MSS: Whole Genome Alignment using a
Mutation Sensitive Approach

We performed experiments to test the effectiveness of MSS. This page reports the results of those experiments. It covers:

  1. Input Details
  2. Performance of MSS
  3. Comparisons with MUMmer-3 and MaxMinCluster

Input Details

We have used MSS to align different genomes:

  • Human and mouse chromosomes at DNA level.
  • Intra-genus Baculoviridae at translated protein level.
  • Inter-genus Baculoviridae at translated protein level.
For mouse and human chromosomes, we extract only the contigs that contain conserved genes. Details about the conserved genes can be found on the NCBI homepage. Note that the same inputs are used when we test the performance of MUMmer-3 and MaxMinCluster.

For virus genomes, we use the complete genomes for experiments. Details about the genomes can be found in the paper "Herniou, E. A., T. Luque, X. Chen, J. M. Vlak, D. Winstanley, J. S. Cory, and D. R. O'Reilly. Use of whole genome sequence data to infer baculovirus phylogeny. Journal of Virology 75(17), 8117-8126".

Note that when we align human/mouse chromosomes, we require a minimum MUM length of 20 nucleotides, and when we align virus genomes, we require a minimum MUM length of 3 amino acids. We require a longer MUM length for human/mouse because they show a much higher degree of similarity than the virus genomes. As a side effect, although the human and mouse chromosomes are much longer, they yield fewer MUMs than the virus genomes. The same settings are used when testing the other software.
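To make the role of the minimum MUM length concrete, here is a minimal brute-force sketch. It assumes "MUM" is used in the usual MUMmer sense of a maximal unique match: an exact match that cannot be extended in either direction and whose string occurs exactly once in each sequence. The function names `find_mums` and `count_occurrences` are hypothetical, and the quadratic scan is for illustration only; it is not how MSS generates MUMs.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def find_mums(a, b, min_len):
    """Return (pos_in_a, pos_in_b, length) for every maximal unique match
    of at least min_len characters (nucleotides or amino acids)."""
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip starts that could be extended to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the characters agree.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k < min_len:
                continue
            match = a[i:i + k]
            # Keep the match only if it is unique in both sequences.
            if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i, j, k))
    return mums


# Toy input: the only match of length >= 3 is "ATTA" at position 1 in both.
toy_mums = find_mums("GATTACA", "CATTAGA", min_len=3)
```

Raising `min_len` removes short matches that are more likely to occur by chance, which is why a larger threshold suits the more similar human/mouse sequences.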

Details of the inputs are given in the tables below:
Table 1. Human and mouse experiment inputs.
Experiment  Mouse chromosome  Human chromosome  Length (mouse*human)  # of MUMs  # of Conserved Genes
1           m02               h15               51M*53M               9,218      51
2           m07               h19               22M*31M               11,070     192
3           m14               h03               27M*51M               7,280      23
4           m14               h08               39M*18M               3,613      38
5           m15               h12               65M*38M               7,324      80
6           m15               h22               65M*28M               5,368      72
7           m16               h16               62M*25M               4,179      31
8           m16               h21               33M*30M               5,615      64
9           m16               h22               62M*26M               1,502      30
10          m17               h06               40M*61M               9,789      150
11          m17               h16               14M*28M               1,844      46
12          m17               h19               30M*39M               1,427      30
13          m18               h05               73M*57M               12,546     64
14          m19               h09               24M*59M               6,368      22
15          m19               h11               29M*14M               5,905      93

Table 2. Intra-genus Baculoviridae experiment inputs.
Experiment  Virus 1  Virus 2  Length     # of MUMs  # of Conserved Genes
1           AcMNPV   BmNPV    133K*128K  35,166     134
2           AcMNPV   HaSNPV   133K*131K  64,291     98
3           AcMNPV   LdMNPV   133K*161K  65,227     95
4           AcMNPV   OpMNPV   133K*131K  59,949     126
5           AcMNPV   SeMNPV   133K*135K  66,898     100
6           BmNPV    HaSNPV   128K*131K  63,939     98
7           BmNPV    LdMNPV   128K*161K  63,086     93
8           BmNPV    OpMNPV   128K*131K  58,657     122
9           BmNPV    SeMNPV   128K*135K  66,448     99
10          HaSNPV   LdMNPV   131K*161K  57,618     92
11          HaSNPV   OpMNPV   131K*131K  59,125     95
12          HaSNPV   SeMNPV   131K*135K  64,980     101
13          LdMNPV   OpMNPV   161K*131K  75,906     98
14          LdMNPV   SeMNPV   161K*135K  62,545     102
15          OpMNPV   SeMNPV   131K*135K  63,261     101
16          CpGV     PxGV     123K*100K  59,733     97
17          CpGV     XcGV     123K*178K  63,258     107
18          PxGV     XcGV     100K*178K  81,020     99

Table 3. Inter-genus Baculoviridae experiment inputs.
Experiment  Virus 1  Virus 2  Length     # of MUMs  # of Conserved Genes
19          HaSNPV   PxGV     131K*100K  49,146     67
20          HaSNPV   XcGV     131K*178K  83,715     74
21          LdMNPV   PxGV     161K*100K  46,668     68
22          LdMNPV   XcGV     161K*178K  75,350     77
23          OpMNPV   PxGV     131K*100K  47,901     68
24          OpMNPV   XcGV     131K*178K  77,986     75
25          PxGV     SeMNPV   100K*135K  50,253     68
26          SeMNPV   XcGV     135K*178K  84,152     76
27          AcMNPV   CpGV     133K*123K  61,195     72
28          AcMNPV   PxGV     133K*100K  50,093     68
29          AcMNPV   XcGV     133K*178K  85,443     78
30          BmNPV    CpGV     128K*123K  60,708     72
31          BmNPV    PxGV     128K*100K  49,837     68
32          BmNPV    XcGV     128K*178K  84,110     75
33          CpGV     HaSNPV   123K*131K  59,231     71
34          CpGV     LdMNPV   123K*161K  57,045     75
35          CpGV     OpMNPV   123K*131K  59,715     76
36          CpGV     SeMNPV   123K*135K  60,905     75

Performance of MSS

For each experiment, we run MSS with the following commands, replacing the input names with the corresponding sequence files. The number after the "-l" option sets the minimum MUM length, measured in nucleotides.
  • For Human/Mouse, we use: ./mss.sh -D -l 20 human.fasta mouse.fasta
  • For Virus, we use: ./mss.sh -l 9 virus1.fasta virus2.fasta (9 nucleotides correspond to the 3 amino acids required at the translated protein level)
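The two command lines above differ only in the extra "-D" flag and in the "-l" value. As a small sketch of how they could be assembled for a batch of experiments, the hypothetical helper `mss_command` below converts a threshold given in amino acids to nucleotides (3 nucleotides per amino acid) and builds the argument list. It uses only the flags that appear on this page; the "-D" flag is copied verbatim from the Human/Mouse command and its meaning is not assumed here.

```python
CODON_LENGTH = 3  # nucleotides per amino acid


def mss_command(file1, file2, min_len, in_amino_acids=False, extra_flags=()):
    """Build the mss.sh argument list for one experiment.

    min_len is given in nucleotides, or in amino acids when
    in_amino_acids is True; the -l option always receives nucleotides.
    """
    length_nt = min_len * CODON_LENGTH if in_amino_acids else min_len
    return ["./mss.sh", *extra_flags, "-l", str(length_nt), file1, file2]


# Human/Mouse: 20 nucleotides, with the -D flag used on this page.
human_mouse_cmd = mss_command("human.fasta", "mouse.fasta", 20, extra_flags=("-D",))

# Virus: 3 amino acids, which becomes -l 9.
virus_cmd = mss_command("virus1.fasta", "virus2.fasta", 3, in_amino_acids=True)
```

Each list can be passed as-is to `subprocess.run` from a directory that contains mss.sh.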
We measure the performance of MSS using the following two measures: coverage and sensitivity.
  • Coverage is the percentage of published gene pairs for which at least one MUM is reported.
  • Sensitivity is the percentage of reported MUMs that actually reside in a conserved gene pair.
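The two measures defined above can be sketched as follows. This is an illustration under stated assumptions, not the evaluation script used for the tables: a MUM is represented as (position in genome 1, position in genome 2, length), a conserved gene pair as two half-open intervals, and a MUM is taken to "reside in" a gene pair when it lies entirely inside both genes. The names `mum_in_pair` and `coverage_and_sensitivity` are hypothetical.

```python
def mum_in_pair(mum, pair):
    """True if the MUM lies entirely inside both genes of the pair."""
    pos1, pos2, length = mum
    (start1, end1), (start2, end2) = pair
    return (start1 <= pos1 and pos1 + length <= end1
            and start2 <= pos2 and pos2 + length <= end2)


def coverage_and_sensitivity(mums, gene_pairs):
    """Return (coverage, sensitivity) as percentages."""
    # Coverage: gene pairs that contain at least one reported MUM.
    covered = sum(1 for p in gene_pairs if any(mum_in_pair(m, p) for m in mums))
    # Sensitivity: reported MUMs that fall inside some gene pair.
    inside = sum(1 for m in mums if any(mum_in_pair(m, p) for p in gene_pairs))
    coverage = 100.0 * covered / len(gene_pairs) if gene_pairs else 0.0
    sensitivity = 100.0 * inside / len(mums) if mums else 0.0
    return coverage, sensitivity


# Toy data: two gene pairs, four reported MUMs.
toy_pairs = [((0, 100), (0, 100)), ((200, 300), (500, 600))]
toy_reported = [(10, 20, 30), (50, 60, 20), (150, 150, 25), (400, 10, 20)]
toy_coverage, toy_sensitivity = coverage_and_sensitivity(toy_reported, toy_pairs)
```

In the toy data, only the first gene pair contains MUMs (two of them), so one of two pairs is covered and two of four MUMs fall inside a gene pair.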

The results are shown in the table below.
                           Average Coverage  Average Sensitivity
Human/Mouse                91%               29%
Intra-genus Baculoviridae  78%               87%
Inter-genus Baculoviridae  36%               53%
Table 4. Average Coverage and Sensitivity achieved by MSS for human/mouse and virus genomes. The individual coverage and sensitivity for each experiment can be found here: Human/Mouse, Intra-genus Baculoviridae, Inter-genus Baculoviridae.

Because of memory limitations, MSS cannot handle arbitrarily large inputs. Roughly speaking, on a PC with 1 GB of memory, our software supports experiments in which each individual genome is no longer than 70M. The bottleneck is the MUM generation step, which requires building a suffix tree for each individual genome.

Comparisons with MUMmer-3 and MaxMinCluster

We compare the performance of MSS with that of MUMmer-3 and MaxMinCluster, using the same data sets as above.

  • For MUMmer-3, we use the default settings given on its homepage.
  • For MaxMinCluster, we use the settings given in its paper.

We also implement a hybrid approach, which first applies MUMmer-3 or MaxMinCluster to identify clusters that are very likely to be conserved genes; each of these clusters is then treated as a single MUM with a larger weight and processed together with the remaining MUMs using MSS. The average coverage and sensitivity of the five methods are shown in the table below.
                      Human/Mouse  Intra-genus Baculoviridae  Inter-genus Baculoviridae
MUMmer-3              77% (27%)    66% (71%)                  43% (62%)
MaxMinCluster         84% (27%)    69% (75%)                  45% (59%)
MSS                   91% (29%)    78% (87%)                  36% (53%)
MUMmer-3 + MSS        91% (28%)    79% (75%)                  48% (43%)
MaxMinCluster + MSS   91% (27%)    79% (82%)                  51% (53%)
Table 5. Average Coverage (and Sensitivity) achieved by different software for human/mouse and virus inputs. The individual coverage and sensitivity of each experiment can be found here: Human/Mouse, Intra-genus Baculoviridae, Inter-genus Baculoviridae.
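
The preprocessing step of the hybrid approach described above can be read in more than one way; the sketch below shows one possible reading and is not the implementation behind Table 5. It assumes each cluster is a list of MUMs given as (position in genome 1, position in genome 2, length), collapses each cluster into one spanning anchor, and gives it a weight of `boost` times the total MUM length in the cluster, while every remaining MUM keeps a weight equal to its length. The `Anchor` type, the `collapse_clusters` name, the weighting rule, and the default `boost` value are all assumptions made for illustration.

```python
from typing import NamedTuple


class Anchor(NamedTuple):
    pos1: int      # start in genome 1
    pos2: int      # start in genome 2
    len1: int      # span in genome 1
    len2: int      # span in genome 2
    weight: float  # assumed weight handed to the later selection step


def collapse_clusters(mums, clusters, boost=2.0):
    """Turn each cluster into one heavily weighted anchor and keep the
    unclustered MUMs as ordinary anchors (assumed weighting scheme)."""
    clustered = {m for cluster in clusters for m in cluster}
    anchors = []
    for cluster in clusters:
        if not cluster:
            continue  # nothing to collapse
        start1 = min(m[0] for m in cluster)
        start2 = min(m[1] for m in cluster)
        end1 = max(m[0] + m[2] for m in cluster)
        end2 = max(m[1] + m[2] for m in cluster)
        weight = boost * sum(m[2] for m in cluster)
        anchors.append(Anchor(start1, start2, end1 - start1, end2 - start2, weight))
    for m in mums:
        if m not in clustered:
            anchors.append(Anchor(m[0], m[1], m[2], m[2], float(m[2])))
    return anchors


# Toy data: the first two MUMs form one cluster, the third stays on its own.
toy_all_mums = [(0, 0, 10), (15, 16, 5), (100, 200, 8)]
toy_clusters = [[(0, 0, 10), (15, 16, 5)]]
toy_anchors = collapse_clusters(toy_all_mums, toy_clusters)
```

Under these assumptions, the cluster becomes a single anchor that outweighs any of its member MUMs, so a weight-based selection step would tend to keep it.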