First, create a new project by following the steps illustrated in Figure 1. In DoMosaics, data files are uniquely associated with a project. There are three types of data: sequence data, domain data, and phylogenetic (tree) data. Each of these can be loaded into the program separately. A typical analysis starts with a set of amino acid sequences in FASTA format. To load a FASTA file, follow the path illustrated in Figures 1 and 2 below. Once the file is loaded, the new view is selected and displayed.
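For readers unfamiliar with the FASTA format mentioned above, the following is a minimal Python sketch of how such a file is structured and read. It is not DoMosaics code; the function name `parse_fasta` and the example sequences are made up for illustration, and the sketch assumes the sequence identifier is the first whitespace-delimited token of each header line.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {sequence id: sequence} from FASTA-formatted lines.

    Illustrative helper only (hypothetical name, not part of DoMosaics).
    """
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: take the first token after '>' as the id.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence line: append to the record opened by the last header.
            records[current_id] += line
    return records


# Made-up example input with two short amino acid sequences.
example = """>protA example protein
MKTAYIAKQR
QISFVKSHFS
>protB
MSDNE
"""
sequences = parse_fasta(example.splitlines())
```

Each entry in `sequences` corresponds to one protein that would appear in the sequence view after loading.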
You can also add sequences to an existing project. Once the file is loaded, you will see a sequence view. Note: sequence views provide a way to keep associated data together, and serve as an entry point for running annotation or creating dotplots. They do not provide any context actions.
Once we have the sequence view, we can annotate domains. Using the shortcut in the button bar, we can launch a local hmmscan job (see Figure 4). In order to run hmmscan, we have to download the appropriate binaries from here. Next, extract the zip or gzip archive and remember the location of the folder. Provide DoMosaics with the hmmscan binary (found in the binaries folder of the downloaded, extracted hmmer folder) and the hmmpress binary (Figure 4:2). Then we have to provide the model files in the appropriate format (Figure 4:2). In this example, we used Pfam models (which can be downloaded from the Pfam FTP site); choose the Pfam-A.hmm file. We can provide extra parameters to the scan (Figure 4:3); see the main documentation for more information on running context-dependent annotation. Finally, we can start the scan.
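The local scan described above roughly corresponds to two HMMER command-line calls: `hmmpress` to prepare the model file and `hmmscan` to search the sequences against it. The Python sketch below only assembles these command lines; it does not reproduce how DoMosaics invokes HMMER internally. The function name `build_hmmer_commands` and all paths are hypothetical, while `--domtblout` and `-E` are standard hmmscan options.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def build_hmmer_commands(
    bin_dir: str,
    model_file: str,
    fasta_file: str,
    domtbl_out: str,
    evalue: Optional[float] = None,
) -> Tuple[List[str], List[str]]:
    """Assemble hmmpress and hmmscan command lines (hypothetical helper)."""
    bin_path = Path(bin_dir)
    # hmmpress builds the binary index files that hmmscan needs next to the model file.
    press = [str(bin_path / "hmmpress"), str(model_file)]
    # hmmscan writes a per-domain hit table to the file given via --domtblout.
    scan = [str(bin_path / "hmmscan"), "--domtblout", str(domtbl_out)]
    if evalue is not None:
        # Optional reporting threshold on the E-value.
        scan += ["-E", str(evalue)]
    # Positional arguments: model database first, then the query sequences.
    scan += [str(model_file), str(fasta_file)]
    return press, scan


# Assumed example locations; adjust to wherever the files were extracted.
press_cmd, scan_cmd = build_hmmer_commands(
    "hmmer/binaries", "Pfam-A.hmm", "sequences.fasta", "hits.domtblout", evalue=0.001
)
```

The resulting lists can be passed to `subprocess.run(cmd, check=True)`, first `press_cmd` and then `scan_cmd`, provided the binaries and files exist at the given locations.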
Alternatively, we can run InterProScan by pressing the IPR icon in the button bar (Figure 5:1) or by choosing 'Actions' -> 'Domain View' -> 'InterProScan'. In this dialog (see Figure 5), we will select the created sequence view 'sequence data' (Figure 5:2). Note that we could also choose a FASTA file from our local filesystem. Next, in order to use EBI webservices, we have to enter a valid email address (Figure 5:2). We can then choose the scan method from the drop-down menu (Figure 5:3); in this example, we will search against Pfam-defined models (hmmpfam). We are now ready to start the annotation process: click 'Submit Job' (Figure 5:3). The console will display information while the scan is underway.
Once we have created a domain view (which might correspond to a protein family of interest), we can find similar arrangements using RADS (Rapid Alignment using Domain Strings); see the main documentation for more information on RADS. To run a RADS scan, choose the RADS icon in the button bar (Figure 6a:1). Run with default parameters by clicking on 'Submit job' (Figure 6a:2). Once the scan is complete, select from the result list and click on import selection (which requires at least one selected arrangement from the results table). A RADS scan can also be started by right-clicking on the arrangement that is to be used as a query (not on a domain) and selecting 'RadScan this arrangement' (Figure 6b). When RADS is opened from the context of an arrangement, the arrangement is displayed at the top of the panel, and the data selection controls are disabled. Note that RADS takes a single query (sequence or arrangement data); providing a view with multiple arrangements will result in RADS selecting only one of them. If a sequence is provided, RADS will first annotate the arrangement of the query, and then perform the search.
(Figure 6: clicking on 1 opens frame 2; after the job is submitted, the "show result" button is enabled and opens frame 3, where you can select which sequences to integrate into a new domain view.)
Once the search is complete (be it HMMER, InterProScan or RADS) and a view name has been entered and a project selected, we will see a new domain view. To find out more about a given domain, hover over it with the mouse. A tooltip will display positional information, E-value, and source DB, as well as GO annotation if available. Right-click on a domain to open the context menu. Here, you can change visual parameters (shape / color) for all domains of this type in the current project, hide the domain, apply domain sequence comparison, etc. (Figure 7). When manually aligning domains, consider using the function 'Domain sequence comparison' to determine to which column a given domain should be added.
Domains can be displayed in various shapes and colors, in an un-proportional representation, and/or aligned. All these actions are triggered from the domain view menu. A right-click on a protein opens a context menu which provides access to various operations (e.g. edit arrangement, open FASTA, etc.).
The tool item in the domain view menu offers a variety of analysis tools such as the domain graph, the domain dotplot or a tool which allows you to play with e-value thresholds, domain overlaps and co-occurring domains. For more information on these tools, visit the main documentation (section tools).
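To illustrate what adjusting e-value thresholds and resolving domain overlaps means in practice, here is a small Python sketch. It is not the algorithm DoMosaics uses; it assumes a simple greedy strategy (keep the best-scoring domain, drop anything that overlaps it), and the `Domain` class, its field names, and the example values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Domain:
    """Hypothetical minimal domain record: name, 1-based inclusive coordinates, E-value."""
    name: str
    start: int
    end: int
    evalue: float


def filter_by_evalue(domains: List[Domain], threshold: float) -> List[Domain]:
    """Keep only domains whose E-value is at or below the threshold."""
    return [d for d in domains if d.evalue <= threshold]


def resolve_overlaps(domains: List[Domain]) -> List[Domain]:
    """Greedy overlap resolution: visit domains from best to worst E-value
    and keep a domain only if it overlaps none of those already kept."""
    kept: List[Domain] = []
    for d in sorted(domains, key=lambda x: x.evalue):
        if all(d.end < k.start or d.start > k.end for k in kept):
            kept.append(d)
    # Return the surviving domains in order along the protein.
    return sorted(kept, key=lambda x: x.start)


# Made-up arrangement with one overlapping pair and one weak hit.
arrangement = [
    Domain("domA", 10, 100, 1e-30),
    Domain("domB", 80, 150, 1e-5),
    Domain("domC", 200, 260, 0.5),
]
confident = filter_by_evalue(arrangement, 1e-3)
non_overlapping = resolve_overlaps(arrangement)
```

In this example, lowering the threshold to 1e-3 removes `domC`, while overlap resolution removes `domB` because it overlaps the better-scoring `domA`.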
As mentioned above, the third data type (and view type) is the tree view. A tree view can be created based on views, or based on an external file which is imported into DoMosaics. DoMosaics can create trees based on sequence (by either using a sequence view, or the sequences which underlie a domain view), or based on the distance between arrangements (see Figure 10). The latter case constructs a distance matrix based on the Jaccard index or the domain edit distance (which is in essence equivalent to the Levenshtein distance). The sequence-based tree construction requires an internet connection (as it aligns the sequences remotely using an EBI service for ClustalW). For more information visit the main documentation.
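The two arrangement-based distances mentioned above can be sketched in Python as follows. This is an illustration under stated assumptions, not the DoMosaics implementation: an arrangement is represented as a plain list of domain names, the Jaccard index is turned into a distance as one minus the index and computed on the sets of domain names (ignoring order and repeats), and the domain edit distance is computed as a standard Levenshtein distance in which each domain is treated as one symbol. The function names and example arrangements are made up.

```python
from typing import List, Sequence


def jaccard_distance(a: Sequence[str], b: Sequence[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B| over the sets of domain names."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty arrangements are treated as identical
    return 1.0 - len(set_a & set_b) / len(union)


def domain_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance where each domain counts as a single symbol."""
    prev: List[int] = list(range(len(b) + 1))
    for i, dom_a in enumerate(a, start=1):
        curr = [i]
        for j, dom_b in enumerate(b, start=1):
            cost = 0 if dom_a == dom_b else 1
            curr.append(min(
                prev[j] + 1,         # delete a domain from a
                curr[j - 1] + 1,     # insert a domain into a
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


# Made-up arrangements given as ordered lists of domain names.
x = ["SH3", "SH2", "Pkinase"]
y = ["SH2", "Pkinase", "Pkinase"]
jd = jaccard_distance(x, y)
ed = domain_edit_distance(x, y)
```

Computing such a distance for every pair of arrangements in a view yields the distance matrix from which a tree can be built.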
Once a tree is created, you can manipulate it in the tree view using contextual menus, which are available by right-clicking on parts of the tree. Note that nodes have a context menu. A number of general settings (such as the fonts used, displayed values, etc.) are available through the main tree view menu (under view, see Figure 11).
The domain-tree view is a composite view which is created by merging a tree view and a corresponding domain view. Although a domain-tree view is a separate view type, the contextual menus that are specific to each of the two underlying views continue to work, and the main menu is divided into sections that include both menus. A number of operations are specific to this view type, such as the operation for computing possible insertions and deletions for domains in the view (available under "View", "Show Insertions/Deletions").
The DoMosaics settings are available by clicking on the cog-wheel icon in the button bar. Here, you can define which email address is to be used for scans, set default binaries and models for local HMMER scans, set up URLs which are used for contextual look-up operations, define your workspace, and set up default options for closing and saving, etc. (see Figure 14).
When you close DoMosaics, it will store all of your projects. When you start DoMosaics again, it will restore its last state. You can save projects and views in DoMosaics format (XML-based), or export your datasets to FASTA, xdom or Newick via the file menu of each view.