COSMOS detects somatic structural variations (SVs) from whole-genome short-read sequences. It can also be applied to de novo SV detection in a family trio. This is collaborative work with Takeda Laboratory, Osaka University.

Simple Usage

  • Requirements
    • Python (>= 2.7) and Pysam (>= 0.7.4)
    • Paired-end reads from normal and tumor samples. We tested on Illumina paired-end libraries with ~500 bp insert length and an SD of ~30 bp.
    • Sorted BAM files (normal.bam, tumor.bam and their indices (.bam.bai)) containing mapped results of the reads on your reference genome. Any mapping software, such as BWA and Bowtie, can be used.
  • Install Pysam (e.g. "pip install pysam")
  • Type the following commands:
    %wget http://seselab.org/cosmos/cosmos.zip # Download COSMOS scripts
    %unzip cosmos.zip
    %cd cosmos
    %wget https://sesejunstrg.blob.core.windows.net/public/cosmos_example.zip # Download sample dataset
    %unzip cosmos_example.zip 
    %bash cosmos.sh cosmos_example/tumor.bam cosmos_example/normal.bam
    
  • "sv_list.tsv" contains detected SVs.
  • If the reads in your BAM files are not sorted by chromosome position, please run "samtools sort your_tumor.bam sorted_tumor" and "samtools sort your_normal.bam sorted_normal", then generate their indices before running cosmos.sh.
  • If you do not have BAM index files, please generate them (e.g. "samtools index sorted_tumor.bam" and "samtools index sorted_normal.bam") before running cosmos.sh.
  • To detect SVs from trio data (proband, mother and father), use the following command. It reports SVs that are present in the proband but in neither the maternal nor the paternal genome.
    %bash cosmos.sh proband.bam mother.bam father.bam
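
Before running cosmos.sh, it can help to confirm that every BAM file and its index are in place. The following is a minimal sketch in Python 3, not part of COSMOS; the function name missing_inputs is hypothetical, and it assumes each index sits next to its BAM file with the ".bam.bai" suffix named in the requirements. It only checks that the files exist, not that the BAM is actually sorted.

```python
import os


def missing_inputs(bam_paths):
    """Return the BAM and index paths that do not exist on disk.

    Hypothetical helper, not part of COSMOS. Assumes the index of
    "x.bam" is stored as "x.bam.bai" in the same directory.
    """
    missing = []
    for bam in bam_paths:
        if not os.path.isfile(bam):
            missing.append(bam)
        bai = bam + ".bai"
        if not os.path.isfile(bai):
            missing.append(bai)
    return missing


# Example: report anything missing for a tumor/normal pair.
for path in missing_inputs(["tumor.bam", "normal.bam"]):
    print("missing: " + path)
```

An empty result means all inputs were found; any path that is printed should be created (for example with "samtools index") before starting the run.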
    

How to Check the Result

Each line in "sv_list.tsv" shows one SV detected by COSMOS. The columns are:
  1. id: ID for the detected SV.
  2. chrn1: Chromosome ID of the left breakpoint.
  3. pos1: Position of the left breakpoint.
  4. rev1: Direction of the left breakpoint. 0 is left, and 1 is right.
  5. chrn2: Chromosome ID of the right breakpoint.
  6. pos2: Position of the right breakpoint.
  7. rev2: Direction of the right breakpoint. 0 is right, and 1 is left.
  8. indel: Length of the SV.
  9. size: Number of reads in this discordant cluster.
  10. type: DEL(deletion), INV(inversion), TRA(translocation), DUP(duplication)
  11. mapq: Average mapping quality of this discordant group.
  12. gap1: Estimated error in the left breakpoint
  13. gap2: Estimated error in the right breakpoint
  14. score: Confidence score of this SV
  15. filter: "not_filtered" when the confidence score is high enough.
  16. depth: Debug information
For example,
id	chrn1	pos1	rev1	chrn2	pos2	rev2	indel	size	type	mapq	gap1	gap2	score	filter
10340	chr22	33755427	0	=	33757619	1	2150	34	DEL	37	16	0	0.038016294	not_filtered 
[[[81, 117, 34...
means that a deletion of 2,150 bp between positions 33,755,427 and 33,757,619 on chr22 was detected.
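
To work with the result programmatically, the table can be loaded with a few lines of Python. The sketch below is written for Python 3 and is not part of COSMOS; the names read_svs and passing are hypothetical. It assumes that the file is tab-separated with the columns in the order listed above, that "=" in chrn2 means "same chromosome as chrn1" (as the example row suggests), and that any line with fewer than 15 fields can be skipped.

```python
import csv
import io

# Column names in the order listed above; "depth" is the 16th column.
COLUMNS = ["id", "chrn1", "pos1", "rev1", "chrn2", "pos2", "rev2", "indel",
           "size", "type", "mapq", "gap1", "gap2", "score", "filter", "depth"]


def read_svs(handle):
    """Yield one dict per SV row, skipping the header and short lines."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) < 15 or fields[0] == "id":
            continue
        row = dict(zip(COLUMNS, [f.strip() for f in fields]))
        # Assumption: "=" means the second breakpoint is on chrn1.
        if row["chrn2"] == "=":
            row["chrn2"] = row["chrn1"]
        for key in ("pos1", "pos2", "indel", "size"):
            row[key] = int(row[key])
        row["score"] = float(row["score"])
        yield row


def passing(svs):
    """Keep only SVs whose filter column is "not_filtered"."""
    return [sv for sv in svs if sv["filter"] == "not_filtered"]


# The example row from above (depth column omitted because it is truncated).
SAMPLE = ("10340\tchr22\t33755427\t0\t=\t33757619\t1\t2150\t34\tDEL\t37\t16\t0"
          "\t0.038016294\tnot_filtered \n")

for sv in passing(read_svs(io.StringIO(SAMPLE))):
    print(sv["type"], sv["chrn1"], sv["pos1"], sv["chrn2"], sv["pos2"], sv["indel"])
```

For a real run, replace the io.StringIO(SAMPLE) argument with an open file handle, e.g. open("sv_list.tsv", newline="").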

Detailed Usage

This section assumes that you have one tumor sample (tumor.bam) and two normal samples (normal1.bam and normal2.bam).

Quality filter

%python pre_cosmos.py -b tumor.bam -q 20 -o pre_0.tmp
%python pre_cosmos.py -b normal1.bam -q 20 -o pre_1.tmp
%python pre_cosmos.py -b normal2.bam -q 20 -o pre_2.tmp
"-q 20" means that only reads with a mapq of 20 or higher are used.

Calculate statistics

%python stat.py -p pre_0.tmp -q 20 -i 0
%python stat.py -p pre_1.tmp -q 20 -i 1
%python stat.py -p pre_2.tmp -q 20 -i 2
"-i 0" means the tumor sample, and "-i n" means the n-th normal sample (n>0). In this example, two normal samples exist, and hence 1 and 2 are used.

Make discordant clusters

%python make_cluster.py -b tumor.bam -q 0 -i 0 -s 3
Makes clusters of discordant read pairs in tumor.bam. "-s 3" sets the minimum cluster size to 3 (default: 3).

Remove false-positive discordant clusters

%python ref_control.py -b normal1.bam -q 0 -i 1
%python ref_control.py -b normal2.bam -q 0 -i 2
Removes discordant clusters that are close to reads in the normal samples.

Count strand-specific read depth

%python count_term.py -b tumor.bam -q 0 -i 0
%python count_term.py -b normal1.bam -q 0 -i 1
%python count_term.py -b normal2.bam -q 0 -i 2

Filter out unreliable breakpoints with the statistics

%python form.py -b tumor.bam -q 2 -s 3
The final output is written to "sv_list.tsv".
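
The steps above follow a regular pattern, so they can be generated for any number of normal samples. The sketch below is written for Python 3 and is not part of COSMOS; the names build_commands and run_all are hypothetical. It simply reproduces the script names and flags shown above, and it assumes that the same pattern extends to more than two normal samples and that it is run from the cosmos directory. It is not claimed to match what cosmos.sh does internally.

```python
import subprocess


def build_commands(tumor, normals, mapq=20, min_cluster=3):
    """Return the command lines from the steps above, in order.

    Hypothetical helper. Sample index 0 is the tumor, and 1..n are
    the normal samples, matching the "-i" convention described above.
    """
    bams = [tumor] + list(normals)
    cmds = []
    # Quality filter
    for i, bam in enumerate(bams):
        cmds.append(["python", "pre_cosmos.py", "-b", bam,
                     "-q", str(mapq), "-o", "pre_%d.tmp" % i])
    # Calculate statistics
    for i in range(len(bams)):
        cmds.append(["python", "stat.py", "-p", "pre_%d.tmp" % i,
                     "-q", str(mapq), "-i", str(i)])
    # Make discordant clusters (tumor only)
    cmds.append(["python", "make_cluster.py", "-b", tumor,
                 "-q", "0", "-i", "0", "-s", str(min_cluster)])
    # Remove false-positive discordant clusters (normals only)
    for i, bam in enumerate(normals, start=1):
        cmds.append(["python", "ref_control.py", "-b", bam,
                     "-q", "0", "-i", str(i)])
    # Count strand-specific read depth
    for i, bam in enumerate(bams):
        cmds.append(["python", "count_term.py", "-b", bam,
                     "-q", "0", "-i", str(i)])
    # Filter breakpoints; "-q 2" is copied as written in the step above.
    cmds.append(["python", "form.py", "-b", tumor,
                 "-q", "2", "-s", str(min_cluster)])
    return cmds


def run_all(cmds):
    """Run each command in turn and stop at the first failure."""
    for cmd in cmds:
        subprocess.check_call(cmd)


# Print the commands for the example in this section without running them.
for cmd in build_commands("tumor.bam", ["normal1.bam", "normal2.bam"]):
    print(" ".join(cmd))
```

Printing the commands first makes it easy to compare them with the steps above before passing the same list to run_all.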

Publication

Supplement

  • Script for synthetic structural variation generation: cosmos_sim.zip. Details are described in "usage.txt" in the archive.

Contacts

yamagata.k@aist.go.jp and sese.jun@aist.go.jp

© 2015 SESE Lab.