1. Quick start
If your machine has nVidia CUDA SDK 4.2 installed, please visit the Tutorial for a package with example data and shell scripts showing how to use SOAP3-dp. If your machine has CUDA SDK 3.2 or 5.0 installed, please download the soap3-dp binaries matching your SDK version from SourceForge and use them in place of the original "soap3-dp" binary in the tutorial.
For Amazon EC2 users:
Amazon EC2 currently offers GPU instances (cg1.4xlarge) in N. Virginia and Ireland. Every instance has two nVidia Tesla M2050 GPUs, each with 3GB of graphics memory. To use both GPUs, users can invoke two soap3-dp processes with the parameters "-c 0" and "-c 1" respectively; if both processes use the same copy of the index, setting "ShareIndex=1" in the "soap3-dp.ini" configuration file lets them share host memory. To help Amazon users deploy soap3-dp faster, please mount Public Snapshot ID "snap-79ec4d78" (N. Virginia) or "snap-3a712f25" (Ireland) as a disk. The snapshot includes all necessary binaries and a tutorial showing how to run SOAP3-dp in batch mode and "per Lane" mode.
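The two-GPU setup above can be scripted. Below is a minimal Python sketch that appends "-c 0" and "-c 1" to two per-lane soap3-dp command lines and starts both processes before waiting on either, so the two GPUs work concurrently. It rests on assumptions: each job is already a complete soap3-dp command line built as in the tutorial (the sketch knows nothing about soap3-dp's other arguments), "-c" may be appended after those arguments, and "ShareIndex=1" has already been set in soap3-dp.ini if both jobs use the same index. The function names `assign_gpus` and `run_concurrently` are hypothetical.

```python
import subprocess
from typing import List, Sequence


def assign_gpus(jobs: Sequence[Sequence[str]],
                gpu_ids: Sequence[int] = (0, 1)) -> List[List[str]]:
    """Append '-c <gpu id>' to each job's command line, one GPU per job.

    Assumption: '-c' can follow the job's other soap3-dp arguments.
    """
    if len(jobs) != len(gpu_ids):
        raise ValueError("need exactly one GPU id per job")
    return [list(job) + ["-c", str(gpu)] for job, gpu in zip(jobs, gpu_ids)]


def run_concurrently(commands: Sequence[Sequence[str]]) -> List[int]:
    """Start every process first, then wait, so all GPUs run at the same time."""
    procs = [subprocess.Popen(list(cmd)) for cmd in commands]
    return [p.wait() for p in procs]  # exit codes, in job order
```

For example, `run_concurrently(assign_gpus([lane1_cmd, lane2_cmd]))` would run the (hypothetical) lane 1 command on GPU 0 and lane 2 on GPU 1. Starting all processes before calling `wait()` matters: waiting inside the same loop that launches them would run the lanes one after another.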
The SOAP3-dp binary was compiled using AMI: amzn-ami-gpu-hvm-2012.09.0.x86_64-ebs (ami-02f54a6b). The AMI provides nVidia CUDA 4.2.
Snapshot for Amazon EC2 configuration:
Mounting snapshot as a device (Latest Snapshot ID: snap-79ec4d78):
SOAP3-dp, like its predecessor SOAP3, is GPU-based software for aligning short reads to a reference sequence. It improves on SOAP3 in both speed and sensitivity by skillfully exploiting whole-genome indexing and dynamic programming on a GPU. SOAP3 can only find alignments with at most 4 mismatches, while SOAP3-dp can find alignments involving mismatches, INDELs, and small gaps. The number of reads aligned, especially for paired-end data, typically increases by 5 to 10 percent from SOAP3 to SOAP3-dp. More interestingly, SOAP3-dp's alignment time is much shorter than SOAP3's, because GPU-based dynamic programming coupled with indexing turns out to be much more efficient. For example, when aligning length-100 single-end reads to the human genome, SOAP3 typically requires tens of seconds per million reads, while SOAP3-dp takes only a few seconds.
The alignment program in this package is optimized to process data sets with multiple millions of short reads by using a multi-core CPU and a GPU concurrently. The hardware requirements and usage of SOAP3-dp are similar to those of SOAP3 (see next section for details). Roughly speaking, SOAP3-dp first aligns reads using SOAP3 with a small number of mismatches only; unaligned reads are then aligned using index-assisted dynamic programming (semi-global alignment with an affine gap penalty). The default setting finds alignments with similarity down to 75%. Users can control the alignment similarity via five dynamic programming parameters in the .INI file: the scores for a match, a mismatch, a gap opening and a gap extension, plus the cutoff threshold (defaults: 1, -2, -3, -1, and 30 for read length 100). SOAP3-dp has an option to disable dynamic programming, which makes SOAP3-dp function exactly like SOAP3 (i.e., aligning with mismatches only).
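To make the dynamic-programming step concrete, here is a minimal Python sketch of semi-global alignment with an affine gap penalty (the Gotoh recurrence) using the default scores listed above. It is an illustration, not SOAP3-dp's GPU implementation, and it assumes: the read must be aligned end to end while reference bases before and after the alignment are free; a gap of length k scores gap_open + k * gap_ext (so a one-base gap costs -4); and a read passes when its score is at least the cutoff. The names `semi_global_affine` and `passes_cutoff` are hypothetical.

```python
from typing import Tuple

NEG_INF = -(10 ** 9)  # stands in for minus infinity while keeping integer scores


def semi_global_affine(read: str, ref: str, match: int = 1, mismatch: int = -2,
                       gap_open: int = -3, gap_ext: int = -1) -> Tuple[int, int]:
    """Best score for aligning the whole read to any substring of ref.

    Assumption: a gap of length k scores gap_open + k * gap_ext.
    Returns (score, end), where end is the 0-based exclusive end in ref.
    """
    m, n = len(read), len(ref)
    # H: best score; E: ends in a deletion (ref base skipped);
    # F: ends in an insertion (read base skipped). Row i = first i read bases.
    H = [[0] * (n + 1) for _ in range(m + 1)]  # row 0 is 0: leading ref is free
    E = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    F = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        # Read bases placed before any reference base can only be insertions.
        H[i][0] = F[i][0] = gap_open + i * gap_ext
        for j in range(1, n + 1):
            E[i][j] = max(H[i][j - 1] + gap_open + gap_ext, E[i][j - 1] + gap_ext)
            F[i][j] = max(H[i - 1][j] + gap_open + gap_ext, F[i - 1][j] + gap_ext)
            s = match if read[i - 1] == ref[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s, E[i][j], F[i][j])
    # Maximizing over the last row makes trailing reference bases free.
    end = max(range(n + 1), key=lambda j: H[m][j])
    return H[m][end], end


def passes_cutoff(read: str, ref: str, cutoff: int = 30) -> bool:
    """Assumption: an alignment is kept when its score reaches the cutoff."""
    return semi_global_affine(read, ref)[0] >= cutoff
```

For example, a length-100 read that matches its reference window exactly scores 100 under these parameters, well above the default cutoff of 30. The tables are O(m·n) per read, so in practice such a step is applied to a short reference window rather than to the whole genome.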
SOAP3-dp version 2.3.172:
- Stable version release.
- 1. Now soft-clips alignments that hang off the end of their reference sequence.
- 2. MD and NM tags are no longer output by default. Use the "-p" parameter to enable the output of MD and NM tags, or use "samtools calmd" to append them.
- 3. Minor bugs fixed.
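Since MD and NM are now optional, it may help to see what they encode. The following minimal Python sketch computes both tags for one alignment from the read, the reference, the 0-based alignment start and the CIGAR string, following the SAM specification's definitions (NM counts mismatches plus inserted and deleted bases; MD lists match-run lengths, mismatched reference bases, and "^"-prefixed deleted reference bases). It is not how SOAP3-dp or "samtools calmd" is implemented; the function name `md_and_nm` is hypothetical, and ambiguous bases such as N are simply compared literally, which other tools may handle differently.

```python
import re
from typing import List, Tuple


def md_and_nm(read: str, ref: str, pos: int, cigar: str) -> Tuple[str, int]:
    """Compute the SAM MD string and NM edit distance for one alignment.

    pos is the 0-based reference offset of the first aligned base; read is the
    full SEQ as it would appear in SAM, including any soft-clipped bases.
    """
    md_parts: List[str] = []
    nm = 0
    run = 0          # matching bases since the last mismatch/deletion token
    r, g = 0, pos    # current offsets into read and ref
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op in "M=X":          # aligned bases: compare one by one
            for _ in range(n):
                if read[r].upper() == ref[g].upper():
                    run += 1
                else:            # mismatch: emit run length, then the ref base
                    md_parts += [str(run), ref[g].upper()]
                    run = 0
                    nm += 1
                r += 1
                g += 1
        elif op == "I":          # inserted read bases: counted in NM, absent from MD
            nm += n
            r += n
        elif op == "D":          # deleted reference bases: "^" followed by the bases
            md_parts += [str(run), "^" + ref[g:g + n].upper()]
            run = 0
            nm += n
            g += n
        elif op == "S":          # soft clip: skip read bases only
            r += n
        elif op == "N":          # skipped reference region
            g += n
        # H and P consume neither the read nor the reference.
    md_parts.append(str(run))    # MD always ends with a (possibly zero) run length
    return "".join(md_parts), nm
```

For example, `md_and_nm("ACGTTCGTAC", "ACGTACGTAC", 0, "10M")` gives `("4A5", 1)`: four matches, a mismatch against reference base A, then five matches.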
SOAP3-dp version 2.3.169:
- Stable version release.
- Several bugs fixed.
SOAP3-dp version 2.3 has the following new features:
- Several bugs fixed.
- A new option (in the soap3-dp.ini file) to output a BWA-like MAPQ score, enabled by default. The scale and range of the scores are similar to those reported by BWA. This is useful for further processing of the alignment results with software such as GATK, which was tuned to the MAPQ scores reported by BWA.
SOAP3-dp version 2.2 has the following new features:
- Improved performance when aligning longer Illumina reads (i.e. length between 150 and 300).
- Increased sensitivity by adding one more step to align each end of the unmapped paired-end reads separately.
SOAP3-dp version 2.1 has the following new features:
- SOAP3-dp now supports up to 65,000 reference sequences (chromosomes).
SOAP3-dp version 2.0 has the following new features:
- A more accurate mapping quality score. The mapping quality score indicates how reliable the resulting alignment is. (See section 2.3 of the manual for more information about the mapping quality score; the chart in the following section shows the accuracy of SOAP3-dp.)
- The alignment results can be output in BAM format.
- Users can specify which GPU card runs SOAP3-dp (useful when a machine has more than one GPU card).
- The index can be shared in host (CPU-side) memory among all running instances of SOAP3-dp.
- The input format for multiple sets of reads has been updated for greater flexibility: for example, each set of reads can have its own insert-size range and its own location for the alignment output.
The algorithms & software in this package were developed by the algorithms research group of the University of Hong Kong (T.W. Lam, L.K. Lee, C.M. Liu, Ruibang Luo, H.F. Ting, Thomas Wong, Edward Wu, S.M. Yiu & Jianqiao Zhu), in data collaboration with BGI (Yingrui Li, Chang Yu and Bingqiang Wang) and Peking University (Ruiqiang Li).
Please contact Ruibang Luo (rbluo at cs dot hku dot hk) for problems and suggestions.
Copyright © 2013, Department of Computer Science, The University of Hong Kong
SOAP3-dp is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
SOAP3-dp is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License V3 along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
Remark: SAM-tools v0.1.18 is included in the SOAP3-dp package to facilitate outputting alignment results in SAM format. We have slightly modified the original SAM-tools code to make it compilable under g++. Please see http://samtools.sourceforge.net/ for details of this package.