We performed experiments to test the effectiveness
of MSS. This page reports the results of
those experiments. It includes
- Input Details
- Performance of MSS
- Comparisons with MUMmer-3 and MaxMinCluster.
We have used MSS to align different genomes:
- Human and mouse chromosomes at DNA level.
- Intra-genus Baculoviridae at translated protein level.
- Inter-genus Baculoviridae at translated protein level.
For mouse and human chromosomes, we extract only the contigs that
contain conserved genes. Details about the conserved
genes can be found on the
NCBI
homepage. Note that the same inputs are used when we test
the performance of MUMmer-3 and MaxMinCluster.
For virus genomes, we use the complete genomes for experiments.
Details about the genomes can be found in the paper
"Herniou, E. A., T. Luque, X. Chen, J. M. Vlak, D. Winstanley, J. S.
Cory, and D. R. O'Reilly. Use of whole genome sequence
data to infer baculovirus phylogeny. Journal of Virology 75(17),
8117-8126".
Note that when we align human/mouse chromosomes, we require
the minimum MUM length to be 20 nucleotides, and when we
align virus genomes, we require the minimum MUM length to
be 3 amino acids. We require a longer MUM length for
human/mouse because they show a much higher degree of
similarity than the virus genomes.
As a side effect, although the human and mouse chromosomes are much
longer, they yield fewer MUMs than the virus genomes.
The same settings are used when testing the other software.
Details of the inputs are given in the tables below:
Table 1. Human and mouse experiment inputs.
Experiment | Mouse chromosome | Human chromosome | Length | # of MUMs | # of Conserved Genes |
1 | m02 | h15 | 51M*53M | 9,218 | 51 |
2 | m07 | h19 | 22M*31M | 11,070 | 192 |
3 | m14 | h03 | 27M*51M | 7,280 | 23 |
4 | m14 | h08 | 39M*18M | 3,613 | 38 |
5 | m15 | h12 | 65M*38M | 7,324 | 80 |
6 | m15 | h22 | 65M*28M | 5,368 | 72 |
7 | m16 | h16 | 62M*25M | 4,179 | 31 |
8 | m16 | h21 | 33M*30M | 5,615 | 64 |
9 | m16 | h22 | 62M*26M | 1,502 | 30 |
10 | m17 | h06 | 40M*61M | 9,789 | 150 |
11 | m17 | h16 | 14M*28M | 1,844 | 46 |
12 | m17 | h19 | 30M*39M | 1,427 | 30 |
13 | m18 | h05 | 73M*57M | 12,546 | 64 |
14 | m19 | h09 | 24M*59M | 6,368 | 22 |
15 | m19 | h11 | 29M*14M | 5,905 | 93 |
Table 2. Intra-genus Baculoviridae experiment inputs.
Experiment | Virus | Virus | Length | # of MUMs | # of Conserved Genes |
1 | AcMNPV | BmNPV | 133K*128K | 35,166 | 134 |
2 | AcMNPV | HaSNPV | 133K*131K | 64,291 | 98 |
3 | AcMNPV | LdMNPV | 133K*161K | 65,227 | 95 |
4 | AcMNPV | OpMNPV | 133K*131K | 59,949 | 126 |
5 | AcMNPV | SeMNPV | 133K*135K | 66,898 | 100 |
6 | BmNPV | HaSNPV | 128K*131K | 63,939 | 98 |
7 | BmNPV | LdMNPV | 128K*161K | 63,086 | 93 |
8 | BmNPV | OpMNPV | 128K*131K | 58,657 | 122 |
9 | BmNPV | SeMNPV | 128K*135K | 66,448 | 99 |
10 | HaSNPV | LdMNPV | 131K*161K | 57,618 | 92 |
11 | HaSNPV | OpMNPV | 131K*131K | 59,125 | 95 |
12 | HaSNPV | SeMNPV | 131K*135K | 64,980 | 101 |
13 | LdMNPV | OpMNPV | 161K*131K | 75,906 | 98 |
14 | LdMNPV | SeMNPV | 161K*135K | 62,545 | 102 |
15 | OpMNPV | SeMNPV | 131K*135K | 63,261 | 101 |
16 | CpGV | PxGV | 123K*100K | 59,733 | 97 |
17 | CpGV | XcGV | 123K*178K | 63,258 | 107 |
18 | PxGV | XcGV | 100K*178K | 81,020 | 99 |
Table 3. Inter-genus Baculoviridae experiment inputs.
Experiment | Virus | Virus | Length | # of MUMs | # of Conserved Genes |
19 | HaSNPV | PxGV | 131K*100K | 49,146 | 67 |
20 | HaSNPV | XcGV | 131K*178K | 83,715 | 74 |
21 | LdMNPV | PxGV | 161K*100K | 46,668 | 68 |
22 | LdMNPV | XcGV | 161K*178K | 75,350 | 77 |
23 | OpMNPV | PxGV | 131K*100K | 47,901 | 68 |
24 | OpMNPV | XcGV | 131K*178K | 77,986 | 75 |
25 | PxGV | SeMNPV | 100K*135K | 50,253 | 68 |
26 | SeMNPV | XcGV | 135K*178K | 84,152 | 76 |
27 | AcMNPV | CpGV | 133K*123K | 61,195 | 72 |
28 | AcMNPV | PxGV | 133K*100K | 50,093 | 68 |
29 | AcMNPV | XcGV | 133K*178K | 85,443 | 78 |
30 | BmNPV | CpGV | 128K*123K | 60,708 | 72 |
31 | BmNPV | PxGV | 128K*100K | 49,837 | 68 |
32 | BmNPV | XcGV | 128K*178K | 84,110 | 75 |
33 | CpGV | HaSNPV | 123K*131K | 59,231 | 71 |
34 | CpGV | LdMNPV | 123K*161K | 57,045 | 75 |
35 | CpGV | OpMNPV | 123K*131K | 59,715 | 76 |
36 | CpGV | SeMNPV | 123K*135K | 60,905 | 75 |
For each experiment, we execute MSS with the following commands
(replacing the input file names with the corresponding
sequence files).
The number after the "-l" option sets the minimum MUM length,
measured in nucleotides.
- For Human/Mouse, we use ./mss.sh -D -l 20 human.fasta mouse.fasta.
- For Virus, we use ./mss.sh -l 9 virus1.fasta virus2.fasta
  (9 nucleotides correspond to the 3 amino acids required above).
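As an illustration only, the two command lines above could be assembled by a small Python helper such as the one below. The function names and the use_d_flag parameter are hypothetical and not part of MSS; only the "-D" and "-l" options quoted above are used, and the sketch does not assume anything about what "-D" does beyond reproducing it.

```python
import shlex

NUCLEOTIDES_PER_AMINO_ACID = 3  # one codon encodes one amino acid


def min_mum_length_nt(amino_acids):
    """Convert a minimum MUM length given in amino acids to nucleotides."""
    return amino_acids * NUCLEOTIDES_PER_AMINO_ACID


def mss_command(seq_a, seq_b, min_len_nt, use_d_flag=False):
    """Build the argument list for one MSS run (hypothetical helper).

    use_d_flag only mirrors whether "-D" appears in the quoted command.
    """
    args = ["./mss.sh"]
    if use_d_flag:
        args.append("-D")
    args += ["-l", str(min_len_nt), seq_a, seq_b]
    return args


human_mouse = mss_command("human.fasta", "mouse.fasta", 20, use_d_flag=True)
virus = mss_command("virus1.fasta", "virus2.fasta", min_mum_length_nt(3))

print(shlex.join(human_mouse))
print(shlex.join(virus))
```

The argument lists can be passed directly to subprocess.run if the runs are to be scripted.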
We measure the performance of MSS using the following two
measures: coverage and sensitivity.
- Coverage is the percentage of published gene pairs for which
at least one MUM is reported.
- Sensitivity is the percentage of reported MUMs that actually
reside in a conserved gene pair.
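The two definitions above can be sketched in Python as follows. This is a minimal illustration, not the evaluation script used for the experiments: the Mum and GenePair records, the inclusive coordinates, and the rule that a MUM "resides in" a gene pair when it lies entirely inside both genes are all assumptions made for the example.

```python
from typing import NamedTuple


class Mum(NamedTuple):
    pos_a: int    # start position on the first genome
    pos_b: int    # start position on the second genome
    length: int


class GenePair(NamedTuple):
    start_a: int  # gene on the first genome, inclusive coordinates
    end_a: int
    start_b: int  # gene on the second genome, inclusive coordinates
    end_b: int


def inside(mum, pair):
    """Assumed rule: the MUM lies entirely within both genes of the pair."""
    return (pair.start_a <= mum.pos_a
            and mum.pos_a + mum.length - 1 <= pair.end_a
            and pair.start_b <= mum.pos_b
            and mum.pos_b + mum.length - 1 <= pair.end_b)


def coverage(mums, pairs):
    """Percentage of gene pairs that contain at least one reported MUM."""
    if not pairs:
        return 0.0
    hit = sum(1 for p in pairs if any(inside(m, p) for m in mums))
    return 100.0 * hit / len(pairs)


def sensitivity(mums, pairs):
    """Percentage of reported MUMs that reside in some gene pair."""
    if not mums:
        return 0.0
    hit = sum(1 for m in mums if any(inside(m, p) for p in pairs))
    return 100.0 * hit / len(mums)


# Toy data: two gene pairs, four reported MUMs.
pairs = [GenePair(100, 199, 500, 599), GenePair(300, 399, 900, 999)]
mums = [Mum(110, 520, 20), Mum(150, 560, 25),
        Mum(180, 570, 20), Mum(250, 700, 20)]

print(coverage(mums, pairs))     # 50.0: only the first pair contains a MUM
print(sensitivity(mums, pairs))  # 75.0: three of the four MUMs are in a gene pair
```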
The results are shown in the table below.
Data set | Average Coverage | Average Sensitivity |
Human/Mouse | 91% | 29% |
Intra-genus Baculoviridae | 78% | 87% |
Inter-genus Baculoviridae | 36% | 53% |
Due to memory limitations, MSS cannot handle arbitrarily
large inputs. Roughly speaking, on a PC with 1 GB of memory,
our software can support experiments in which each individual
genome is no longer than about 70M.
The bottleneck is the MUM generation step, which requires
building the suffix tree for each individual genome.
We compare the performance of MSS with MUMmer-3 and MaxMinCluster,
using the same data sets above.
- For MUMmer-3, we use the default settings given on its homepage.
- For MaxMinCluster, we use the settings given in its paper.
We also implement a hybrid approach that first applies MUMmer-3 or
MaxMinCluster to identify clusters that are very likely to be conserved
genes; each cluster is then treated as a single MUM with a larger weight and
processed together with the remaining MUMs by MSS. The average
coverage and sensitivity of the five approaches are shown
in the table below, with sensitivity given in parentheses.
Method | Human/Mouse | Intra-genus Baculoviridae | Inter-genus Baculoviridae |
MUMmer-3 | 77% (27%) | 66% (71%) | 43% (62%) |
MaxMinCluster | 84% (27%) | 69% (75%) | 45% (59%) |
MSS | 91% (29%) | 78% (87%) | 36% (53%) |
MUMmer-3 + MSS | 91% (28%) | 79% (75%) | 48% (43%) |
MaxMinCluster + MSS | 91% (27%) | 79% (82%) | 51% (53%) |
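The preprocessing step of the hybrid approach described above, in which each cluster becomes a single MUM with a larger weight, might look like the following Python sketch. The data layout, the function name collapse_clusters, and in particular the weighting rule (the sum of the member MUM lengths) are assumptions made for illustration; the actual weights used in the experiments are not specified on this page.

```python
from typing import NamedTuple


class WeightedMum(NamedTuple):
    pos_a: int     # start on the first genome
    pos_b: int     # start on the second genome
    length_a: int  # span on the first genome
    length_b: int  # span on the second genome
    weight: int


def collapse_clusters(mums, clusters):
    """Replace each cluster by one weighted MUM; keep the other MUMs as they are.

    mums:     list of (pos_a, pos_b, length) tuples
    clusters: list of lists of such tuples, e.g. as reported by a clustering tool
    """
    clustered = {m for cluster in clusters for m in cluster}
    result = []
    for cluster in clusters:
        if not cluster:
            continue
        start_a = min(a for a, _, _ in cluster)
        start_b = min(b for _, b, _ in cluster)
        end_a = max(a + n for a, _, n in cluster)
        end_b = max(b + n for _, b, n in cluster)
        # Assumed weighting rule: total length of the member MUMs.
        weight = sum(n for _, _, n in cluster)
        result.append(WeightedMum(start_a, start_b,
                                  end_a - start_a, end_b - start_b, weight))
    for a, b, n in mums:
        if (a, b, n) not in clustered:
            # An unclustered MUM keeps its own length as its weight.
            result.append(WeightedMum(a, b, n, n, n))
    return result


mums = [(100, 500, 20), (130, 530, 25), (400, 50, 22)]
clusters = [[(100, 500, 20), (130, 530, 25)]]
merged = collapse_clusters(mums, clusters)
for item in merged:
    print(item)
```

The resulting list would then be handed to MSS in place of the raw MUM list; how MSS consumes the weights is outside the scope of this sketch.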