COSMOS detects somatic structural variations (SVs) from whole-genome short-read sequences. It can also be applied to de novo SV detection in a family trio. This is collaborative work with Takeda Laboratory, Osaka University.
- Python (>= 2.7) and Pysam (>= 0.7.4)
- Paired-end reads from normal and tumor samples. We tested on Illumina paired-end libraries with ~500 bp insert length and SD ~30 bp.
- Sorted BAM files (normal.bam, tumor.bam and their indices (.bam.bai)) containing mapped results of the reads on your reference genome. Any mapping software, such as BWA and Bowtie, can be used.
- Install Pysam (e.g. "pip install pysam")
- Type the following commands
%wget http://seselab.org/cosmos/cosmos.zip # Download COSMOS scripts
%unzip cosmos.zip
%cd cosmos
%wget https://sesejunstrg.blob.core.windows.net/public/cosmos_example.zip # Download sample dataset
%unzip cosmos_example.zip
%bash cosmos.sh cosmos_example/tumor.bam cosmos_example/normal.bam
For de novo SV detection in a family trio, pass the proband first, followed by the parents:
%bash cosmos.sh proband.bam mother.bam father.bam
How to Check the Result
Each line in "sv_list.tsv" shows one SV detected by COSMOS. The columns are:
- id: ID for the detected SV.
- chrn1: Chromosome ID of the left breakpoint.
- pos1: Position of the left breakpoint.
- rev1: Direction of the left breakpoint. 0 is left, and 1 is right.
- chrn2: Chromosome ID of the right breakpoint.
- pos2: Position of the right breakpoint.
- rev2: Direction of the right breakpoint. 0 is right, and 1 is left.
- indel: Length of the SV.
- size: Number of reads in this discordant cluster.
- type: DEL(deletion), INV(inversion), TRA(translocation), DUP(duplication)
- mapq: Average mapping quality of this discordant group.
- gap1: Estimated error in the left breakpoint
- gap2: Estimated error in the right breakpoint
- score: Confidence score of this SV
- filter: "not_filtered" when the confidence score is high enough.
- depth: Debug information
id chrn1 pos1 rev1 chrn2 pos2 rev2 indel size type mapq gap1 gap2 score filter
10340 chr22 33755427 0 = 33757619 1 2150 34 DEL 37 16 0 0.038016294 not_filtered [[[81, 117, 34...
This line means that a deletion between 33,755,427 and 33,757,619 on chr22, whose length is 2,150 bp, was detected.
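The column list above can be read programmatically. The following is a minimal sketch, not part of COSMOS: it assumes that "sv_list.tsv" is tab-separated, that its header line starts with "id", and that the trailing debug field corresponds to the "depth" column. The function names read_sv_list and confident_svs are hypothetical helpers introduced only for this illustration.

```python
import csv
import io

# Column names as listed above; "depth" is assumed to be the trailing debug field.
COLUMNS = ["id", "chrn1", "pos1", "rev1", "chrn2", "pos2", "rev2", "indel",
           "size", "type", "mapq", "gap1", "gap2", "score", "filter", "depth"]


def read_sv_list(handle):
    """Yield one dict per SV from an open sv_list.tsv handle (hypothetical helper)."""
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields or fields[0] == "id":
            continue  # skip blank lines and the header line
        row = dict(zip(COLUMNS, fields))
        # Convert only the fields that are clearly numeric in the example row.
        row["pos1"] = int(row["pos1"])
        row["pos2"] = int(row["pos2"])
        row["score"] = float(row["score"])
        yield row


def confident_svs(handle, sv_type=None):
    """Return SVs marked "not_filtered", optionally restricted to one type."""
    return [sv for sv in read_sv_list(handle)
            if sv["filter"] == "not_filtered"
            and (sv_type is None or sv["type"] == sv_type)]


# Demo on the example row shown above (the depth field is truncated, as in the text).
example = io.StringIO(
    u"10340\tchr22\t33755427\t0\t=\t33757619\t1\t2150\t34\tDEL\t37\t16\t0\t"
    u"0.038016294\tnot_filtered\t[[[81, 117, 34...\n"
)
for sv in confident_svs(example, sv_type="DEL"):
    print(sv["id"], sv["chrn1"], sv["pos1"], sv["pos2"], sv["indel"])
```

To use it on real output, open the file instead of the in-memory example, for instance with open("sv_list.tsv") as handle, and pass the handle to confident_svs.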
Detailed Usage
When you have one "tumor" sample (tumor.bam) and two "normal" samples (normal1.bam and normal2.bam):
%python pre_cosmos.py -b tumor.bam -q 20 -o pre_0.tmp
%python pre_cosmos.py -b normal1.bam -q 20 -o pre_1.tmp
%python pre_cosmos.py -b normal2.bam -q 20 -o pre_2.tmp
"-q 20" means that only reads with a mapping quality (mapq) of 20 or higher are used.
%python stat.py -p pre_0.tmp -q 20 -i 0
%python stat.py -p pre_1.tmp -q 20 -i 1
%python stat.py -p pre_2.tmp -q 20 -i 2
"-i 0" means the tumor sample, and "-i n" means the n-th normal sample (n>0). In this example, two normal samples exist, and hence 1 and 2 are used.
Make discordant clusters
%python make_cluster.py -b tumor.bam -q 0 -i 0 -s 3
Makes clusters of discordant read pairs in tumor.bam. "-s 3" sets the minimum cluster size to 3 [Default value: 3].
Remove false-positive discordant clusters
%python ref_control.py -b normal1.bam -q 0 -i 1
%python ref_control.py -b normal2.bam -q 0 -i 2
Removes discordant clusters close to reads in the normal samples.
Count strand-specific read depth
%python count_term.py -b tumor.bam -q 0 -i 0
%python count_term.py -b normal1.bam -q 0 -i 1
%python count_term.py -b normal2.bam -q 0 -i 2
Filter out unreliable breakpoints using the statistics
%python form.py -b tumor.bam -q 2 -s 3
Final outputs are in "sv_list.tsv".
- COSMOS: accurate detection of complex somatic structural variations through asymmetric comparison between tumor and normal samples. Nucleic Acids Res. Vol. 44, Issue 8. e78, 2016
- Script for synthetic structural variation generation: cosmos_sim.zip. Details are described in "usage.txt" in the archive.
firstname.lastname@example.org and email@example.com