DoMosaics

Quickstart guide


This document is meant as an entry point for new users. An example dataset is included in the program (available under the main menu -> "Help" -> "Load example dataset"). For more detailed information, consult the main documentation.
1 - Import new data

First, create a new project by following the steps illustrated in Figure 1. In DoMosaics, data files are uniquely associated with a project. There are three types of data: sequence data, domain data and phylogenetic (tree) data. Each of these data types can be loaded into the program separately. A typical analysis starts with a set of amino acid sequences in FASTA format. To load a FASTA file, follow the path illustrated in Figures 1 and 2 below. Once the file is loaded, the new view is selected and displayed.

Figure 1: Create a new project
Click on button 1. In dialog (2), choose an existing project or create a new one (3). In the dialog that appears (4), enter a name (5) and click 'Finish'. Select the newly created project (shown as a new folder in dialog 2). Then click 'Next'.
Figure 2: Add sequence data to new project
Choose 'Sequence' from the three possible data types (1) and click 'Next'. In the Open File dialog of the wizard (2), click on 'Browse' and choose your FASTA file (4). Click 'Open' to select the file. Finally, click 'Finish', give the view a name and click 'Finish' again.
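
Before importing a FASTA file, it can help to check that it is well-formed. The following Python sketch is not part of DoMosaics; it only illustrates the FASTA layout (a '>' header line followed by one or more sequence lines). The function names and the example records are made up for illustration, and treating the first word of the header as the sequence identifier is an assumption.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def duplicate_ids(records):
    """Return identifiers (assumed: first word of the header) seen more than once."""
    seen, dupes = set(), set()
    for header, _ in records:
        parts = header.split()
        seq_id = parts[0] if parts else ""
        if seq_id in seen:
            dupes.add(seq_id)
        seen.add(seq_id)
    return sorted(dupes)


# Made-up example records, for illustration only.
example = """>protA hypothetical protein
MKTAYIAKQR
QISFVKSHFS
>protB
MSDNELLK
""".splitlines()

records = read_fasta(example)
print(len(records), "sequences;", "duplicate IDs:", duplicate_ids(records))
```

To check a real file, pass an open file handle instead of the example lines, e.g. `read_fasta(open("my_sequences.fasta"))`, where the file name is a placeholder.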


2 - Sequence view

You can also add sequences to an existing project. Once the file is loaded, you will see a sequence view. Note: sequence views provide a way to keep associated data together and serve as an entry point for running annotations or creating dotplots. They do not provide any context-menu actions.

Figure 3
Sequences can also be loaded into a project using the shortcut (see the note on the button bar icons above). Clicking on the icon at position 3 (a large F) opens a dialog which requires a project selection (i.e. the project must already exist) and a view name. Note that the view name has to be unique (3).


3 - Domain annotation

Once we have a sequence view, we can annotate domains. Using the shortcut in the button bar, we can launch a local hmmscan job (see Figure 4). To run hmmscan, we first have to download the appropriate HMMER binaries. Next, extract the zip or gzip archive and remember the location of the folder. Provide DoMosaics with the hmmscan binary (found in the binaries folder of the extracted HMMER folder) and the hmmpress binary (Figure 4:2). Then provide the model file in the appropriate format (Figure 4:2). In this example, we used Pfam models, which can be downloaded from the Pfam FTP site; choose the Pfam-A.hmm file. We can pass extra parameters to the scan (Figure 4:3); see the main documentation for more information on running context-dependent annotation. Finally, we can start the scan.


Figure 4
Download and extract the HMMER 3.0 suite; the hmmscan and hmmpress binaries can be found in the subdirectory 'binaries'. In DoMosaics, add the hmmscan binary (HMMER3 scan bin) and the hmmpress binary (HMMER3 press bin), and provide an HMM library. Finally, choose a sequence view or a FASTA file and start the job.
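
For orientation, the two HMMER steps described above correspond roughly to the following command lines. This Python sketch only builds the argument lists; it does not run anything. That DoMosaics issues exactly these commands is an assumption, and the folder and file names are placeholders. `hmmpress` and `hmmscan --domtblout` are standard HMMER3 usage.

```python
from pathlib import Path


def hmmer_commands(hmmer_bin_dir, hmm_library, fasta_file, output_table):
    """Build the hmmpress and hmmscan command lines as argument lists.

    hmmer_bin_dir: folder containing the hmmpress and hmmscan binaries
    hmm_library:   HMM model file, e.g. Pfam-A.hmm
    fasta_file:    protein sequences to annotate
    output_table:  file for the per-domain hit table (--domtblout)
    """
    bin_dir = Path(hmmer_bin_dir)
    # hmmpress indexes the HMM library once so that hmmscan can read it.
    press = [str(bin_dir / "hmmpress"), str(hmm_library)]
    # hmmscan searches each sequence against the pressed library.
    scan = [
        str(bin_dir / "hmmscan"),
        "--domtblout", str(output_table),
        str(hmm_library),
        str(fasta_file),
    ]
    return press, scan


# Placeholder paths, for illustration only.
press_cmd, scan_cmd = hmmer_commands(
    "hmmer/binaries", "Pfam-A.hmm", "sequences.fasta", "domains.tbl"
)
print(" ".join(press_cmd))
print(" ".join(scan_cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` to run the step outside DoMosaics, which can be a quick way to confirm that the binaries and the model file work before configuring them in the program.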

Alternatively, we can run InterProScan by pressing the IPR icon in the button bar (Figure 5:1) or by choosing 'Actions' -> 'Domain View' -> 'InterProScan'. In this dialog (see Figure 5), we select the sequence view created earlier, 'sequence data' (Figure 5:2). Note that we could also choose a FASTA file from our local filesystem. Next, in order to use the EBI web services, we have to enter a valid email address (Figure 5:2). We can then choose the scan method from the drop-down menu (Figure 5:3); in this example, we search against Pfam-defined models (hmmpfam). We are now ready to start the annotation process: click 'Submit Job' (Figure 5:3). The console displays information while the scan is underway.


Figure 5
InterProScan annotation requires an internet connection and offers a choice of several domain databases. However, scan parameters cannot be adjusted.

Once we have created a domain view (which might correspond to a protein family of interest), we can find similar arrangements using RADS (Rapid Alignment using Domain Strings); see the main documentation for more information on RADS. To run a RADS scan, click the RADS icon in the button bar (Figure 6a:1). Run with default parameters by clicking on 'Submit job' (6a:2). Once the scan is complete, select arrangements in the results table and click 'Import selection' (at least one arrangement must be selected). A RADS scan can also be started by right-clicking on the arrangement that is to be used as a query (not on a domain) and selecting 'RadScan this arrangement' (Figure 6b). When RADS is opened from the context of an arrangement, the arrangement is displayed at the top of the panel and the data selection controls are disabled. Note that RADS takes a single query (sequence or arrangement data); if you provide a view with multiple arrangements, RADS will select only one of them. If a sequence is provided, RADS will first annotate the arrangement of the query and then perform the search.

As a rule of thumb, use the context menu to call RADS when you have a domain view with different multi-domain proteins. Use the main menu when you have a file or a view with a single arrangement or with multiple identical arrangements (e.g. members of a domain family with identical arrangements).

(Figure 6: clicking on 1 opens frame 2; after the job has been submitted, the "show result" button is enabled and opens frame 3, where you can select which sequences to integrate into a new domain view.)

Figure 6a
RADS/RAMPAGE domain-based alignments for homology search against UniProt proteins. Once the scan is complete, results can be selectively imported. We can see the raw scan log by pressing 'Show scan log' (2). We can browse the RADS results online by pressing 'Browse online' (3). Select the results of interest (or 'Select all') and 'Import selection'.
Figure 6b
Because the RadScan panel was opened from an arrangement, we cannot select a sequence view or a FASTA file. If the arrangement is associated with sequences, we can choose between RADS and RADS/RAMPAGE. For more information on RADS, consult the main documentation.


4 - Arrangement views

Once the search is complete (be it HMMER, InterProScan or RADS) and a view name has been entered and a project selected, we will see a new domain view. To find out more about a given domain, hover over it with the mouse. A tooltip displays positional information, the E-value, the source database and, if available, GO annotation. Right-click on a domain to open the context menu. Here, you can change visual parameters (shape / color) for all domains of this type in the current project, hide the domain, apply domain sequence comparison, etc. (Figure 7). When manually aligning domains, consider using the function 'Domain sequence comparison' to determine the column to which a given domain should be added.

Note that domain views which are associated with sequences carry a small 's' in their icon. Some tools or context operations require the presence of sequences, such as dotplots or domain sequence comparison.

Figure 7

Domains can be drawn with various shapes and colors, and can be displayed non-proportionally and/or aligned. All these actions are triggered from the domain view menu. A right-click on a protein opens a context menu which provides access to various operations (e.g. edit arrangement, open FASTA, etc.).

Figure 8


5 - Arrangement tools

The tool item in the domain view menu offers a variety of analysis tools such as the domain graph, the domain dotplot or a tool which allows you to play with e-value thresholds, domain overlaps and co-occurring domains. For more information on these tools, visit the main documentation (section tools).

Figure 9


6 - Build or load trees

As mentioned above, the third data type (and view type) is the tree view. A tree view can be created from existing views or from an external file which is imported into DoMosaics. DoMosaics can create trees based on sequences (using either a sequence view or the sequences which underlie a domain view), or based on the distance between arrangements (see Figure 10). In the latter case, a distance matrix is constructed using the Jaccard index or the domain edit distance (which is in essence equivalent to the Levenshtein distance). Sequence-based tree construction requires an internet connection, as the sequences are aligned remotely using an EBI service for ClustalW. For more information, visit the main documentation.
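
The two arrangement-based measures can be sketched in a few lines of Python. This is an illustration of the general definitions, not the DoMosaics implementation: turning the Jaccard index into a distance as 1 minus the index, and charging a cost of 1 for each insertion, deletion or substitution of a domain, are assumptions. The domain names in the example are only sample input.

```python
def jaccard_distance(arrangement_a, arrangement_b):
    """Return 1 - Jaccard index of the two domain sets (order is ignored)."""
    set_a, set_b = set(arrangement_a), set(arrangement_b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty arrangements are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)


def domain_edit_distance(arrangement_a, arrangement_b):
    """Levenshtein distance where each domain is one 'character'."""
    # previous[j] = distance between the processed prefix of a and b[:j]
    previous = list(range(len(arrangement_b) + 1))
    for i, dom_a in enumerate(arrangement_a, start=1):
        current = [i]
        for j, dom_b in enumerate(arrangement_b, start=1):
            cost = 0 if dom_a == dom_b else 1
            current.append(min(
                previous[j] + 1,         # delete dom_a
                current[j - 1] + 1,      # insert dom_b
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


# Sample arrangements, written from N- to C-terminus.
a = ["SH3", "SH2", "Pkinase"]
b = ["SH2", "Pkinase"]
c = ["PH", "Pkinase"]

for name, other in (("b", b), ("c", c)):
    print("a vs", name,
          "jaccard:", round(jaccard_distance(a, other), 3),
          "edit:", domain_edit_distance(a, other))
```

The Jaccard-based measure only looks at which domains are present, whereas the edit distance is sensitive to domain order and repeats, so the two can rank the same pair of proteins differently.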

We strongly recommend that you use a dedicated program to create a tree (e.g. PhyML or RAxML) and load the resulting Newick tree into DoMosaics.
Figure 10
Create a tree by clicking on the single-colored tree in the button bar (1). In the dialog (2), you can create a tree using either domains (3) or sequences (4). The latter requires an internet connection and uses a ClustalW webservice for alignment (5).


7 - Tree view

Once a tree is created, you can manipulate it in the tree view using contextual menus, which are available by right-clicking on parts of the tree. Note that nodes have a context menu. A number of general settings (such as the fonts used, the values displayed, etc.) are available through the main tree view menu (under 'View', see Figure 11).

Figure 11
Tree view showing bootstrap values, the context menu (color and label customisation) and scaling.


8 - Domain-Tree view

The domain-tree view is a composite view which is created by merging a tree view and a corresponding domain view. Although a domain-tree view is a separate view type, the contextual menus specific to each of the two underlying views continue to work, and the main menu is divided into sections that include both menus. A number of operations are specific to this view type, such as computing possible insertions and deletions for domains in the view (available under "View", "Show Insertions/Deletions").

Figure 12
Figure 13


9 - Configuration Panel and help

The DoMosaics settings are available by clicking on the cog-wheel icon in the button bar. Here, you can define which email address is used for scans, set default binaries and models for local HMMER scans, set up URLs which are used for contextual look-up operations, define your workspace and set default options for closing, saving, etc. (see Figure 14).

Figure 14


10 - Store the day's work and continue later

When you close DoMosaics, it stores all of your projects. When you start DoMosaics again, it restores the state it was last in. You can save projects and views in DoMosaics format (XML-based), or export your datasets to FASTA, xdom or Newick via the file menu of each view.