COSMOS detects somatic structural variations (SVs) from whole-genome short-read sequences. It can also be applied to de novo SV detection in a family trio. This is collaborative work with Takeda Laboratory, Osaka University.

Simple Usage

  • Requirements
    • Python (>= 2.7) and Pysam (>= 0.7.4)
    • Paired-end reads from normal and tumor samples. We tested on Illumina paired-end libraries with an insert length of ~500 bp and a standard deviation of ~30 bp.
    • Sorted BAM files (normal.bam, tumor.bam and their indices (.bam.bai)) containing mapped results of the reads on your reference genome. Any mapping software, such as BWA and Bowtie, can be used.
  • Install Pysam (e.g. "pip install pysam")
  • Type the following commands:
  • %wget wget # Download COSMOS scripts
    %cd cosmos
    %wget # Download sample dataset
    %bash cosmos_example/tumor.bam cosmos_example/normal.bam
  • "sv_list.tsv" contains detected SVs.
  • If the reads in your BAM files are not sorted by chromosome position, please run "samtools sort your_tumor.bam sorted_tumor" and "samtools sort your_normal.bam sorted_normal" and generate their indices before running COSMOS.
  • If you do not have BAM index files, please generate them (.bai), e.g. with "samtools index sorted_tumor.bam" and "samtools index sorted_normal.bam", before running COSMOS.
  • If you would like to detect SVs from trio data (proband, mother and father), the following command detects SVs that are present in the proband but in neither the maternal nor the paternal genome.
  • %bash proband.bam mother.bam father.bam
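
Before running COSMOS, it can help to confirm that every BAM file has its index next to it. The following is a minimal Python sketch of such a check and is not part of COSMOS. The helper names "index_path", "missing_indices" and "samtools_commands" are hypothetical, and the samtools command lines are copied from the instructions above (the "samtools sort" form with an output prefix); they are only built as argument lists, not executed.

```python
import os


def index_path(bam_path):
    """Return the index file name expected next to a BAM file (.bam.bai)."""
    return bam_path + ".bai"


def missing_indices(bam_paths):
    """Return the BAM files whose .bam.bai index does not exist."""
    return [p for p in bam_paths if not os.path.exists(index_path(p))]


def samtools_commands(bam_path, sorted_prefix):
    """Build the sort and index commands quoted in the instructions above.

    The commands are returned as argument lists and are not executed here.
    """
    return [
        ["samtools", "sort", bam_path, sorted_prefix],
        ["samtools", "index", sorted_prefix + ".bam"],
    ]


if __name__ == "__main__":
    # File names follow the example above; adjust them to your own data.
    for bam in missing_indices(["tumor.bam", "normal.bam"]):
        print("index not found for " + bam)
```

The argument lists returned by "samtools_commands" could be passed to "subprocess.check_call" if samtools is installed and accepts this command form.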

How to Check the Result

Each line in "sv_list.tsv" describes one SV detected by COSMOS. The columns are:
  1. id: ID for the detected SV.
  2. chrn1: Chromosome ID of the left breakpoint.
  3. pos1: Position of the left breakpoint.
  4. rev1: Direction of the left breakpoint. 0 is left, and 1 is right.
  5. chrn2: Chromosome ID of the right breakpoint.
  6. pos2: Position of the right breakpoint.
  7. rev2: Direction of the right breakpoint. 0 is right, and 1 is left.
  8. indel: Length of the SV.
  9. size: Number of reads in this discordant cluster.
  10. type: DEL(deletion), INV(inversion), TRA(translocation), DUP(duplication)
  11. mapq: Average mapping quality of this discordant group.
  12. gap1: Estimated error in the left breakpoint
  13. gap2: Estimated error in the right breakpoint
  14. score: Confidence score of this SV
  15. filter: "not_filtered" when the confidence score is high enough.
  16. depth: Debug information
For example,
id	chrn1	pos1	rev1	chrn2	pos2	rev2	indel	size	type	mapq	gap1	gap2	score	filter
10340	chr22	33755427	0	=	33757619	1	2150	34	DEL	37	16	0	0.038016294	not_filtered 
[[[81, 117, 34...
means "a deletion of 2,150 bp between positions 33,755,427 and 33,757,619 on chr22 is detected".
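
The column layout above can be read with a few lines of Python. This is a minimal sketch written for Python 3, not a COSMOS utility: the function names "read_sv_list" and "passing" are hypothetical, and treating "=" in chrn2 as "same chromosome as chrn1" is an assumption based on the example line.

```python
import csv
import io

COLUMNS = [
    "id", "chrn1", "pos1", "rev1", "chrn2", "pos2", "rev2", "indel",
    "size", "type", "mapq", "gap1", "gap2", "score", "filter", "depth",
]


def read_sv_list(handle):
    """Yield one dict per SV from a tab-separated sv_list.tsv handle."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0] == "id":  # skip blank lines and the header
            continue
        sv = dict(zip(COLUMNS, row))
        # Assumption: "=" in chrn2 means "same chromosome as chrn1".
        if sv["chrn2"] == "=":
            sv["chrn2"] = sv["chrn1"]
        for key in ("pos1", "pos2", "indel", "size"):
            sv[key] = int(sv[key])
        sv["score"] = float(sv["score"])
        yield sv


def passing(svs):
    """Keep only SVs whose filter column is "not_filtered"."""
    return [sv for sv in svs if sv["filter"].strip() == "not_filtered"]


# The example line shown above, without the depth column.
EXAMPLE = (
    "id\tchrn1\tpos1\trev1\tchrn2\tpos2\trev2\tindel\tsize\ttype\tmapq"
    "\tgap1\tgap2\tscore\tfilter\n"
    "10340\tchr22\t33755427\t0\t=\t33757619\t1\t2150\t34\tDEL\t37\t16\t0"
    "\t0.038016294\tnot_filtered\n"
)

if __name__ == "__main__":
    for sv in passing(read_sv_list(io.StringIO(EXAMPLE))):
        print(sv["type"], sv["chrn1"], sv["pos1"], sv["pos2"], sv["indel"])
```

To read a real result file, pass an open file handle (for example "open('sv_list.tsv', newline='')") instead of the in-memory example.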

Detailed Usage

The following steps assume one tumor sample (tumor.bam) and two normal samples (normal1.bam and normal2.bam).

Quality filter

%python -b tumor.bam -q 20 -o pre_0.tmp
%python -b normal1.bam -q 20 -o pre_1.tmp
%python -b normal2.bam -q 20 -o pre_2.tmp
"-q 20" means that only reads with a mapping quality (mapq) of 20 or higher are used.
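
The effect of the "-q" threshold can be illustrated with a small Python sketch. It is not the COSMOS implementation: "Read" is a hypothetical stand-in for an aligned read, and "filter_by_mapq" is a hypothetical helper that only shows the "20 or higher" comparison.

```python
from collections import namedtuple

# Stand-in for an aligned read; only the mapping quality matters here.
Read = namedtuple("Read", ["name", "mapq"])


def filter_by_mapq(reads, min_mapq=20):
    """Keep reads whose mapping quality is min_mapq or higher."""
    return [r for r in reads if r.mapq >= min_mapq]


if __name__ == "__main__":
    reads = [Read("a", 19), Read("b", 20), Read("c", 37)]
    for read in filter_by_mapq(reads, min_mapq=20):
        print(read.name, read.mapq)
```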

Calculate statistics

%python -p pre_0.tmp -q 20 -i 0
%python -p pre_1.tmp -q 20 -i 1
%python -p pre_2.tmp -q 20 -i 2
"-i 0" denotes the tumor sample, and "-i n" denotes the n-th normal sample (n > 0). In this example, two normal samples exist, so 1 and 2 are used.
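
The index convention can be written down as a short Python sketch, which may be handy when scripting the steps for more than two normal samples. The helper names "assign_sample_indices" and "pre_file" are hypothetical; the "pre_<index>.tmp" naming simply follows the file names used in the commands above.

```python
def assign_sample_indices(tumor_bam, normal_bams):
    """Map sample index to BAM file: 0 is the tumor, 1..n are the normals."""
    samples = {0: tumor_bam}
    for n, bam in enumerate(normal_bams, start=1):
        samples[n] = bam
    return samples


def pre_file(index):
    """Name of the quality-filtered file for a sample index (pre_0.tmp, ...)."""
    return "pre_%d.tmp" % index


if __name__ == "__main__":
    samples = assign_sample_indices("tumor.bam", ["normal1.bam", "normal2.bam"])
    for index in sorted(samples):
        print(index, samples[index], pre_file(index))
```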

Make discordant clusters

%python -b tumor.bam -q 0 -i 0 -s 3
This makes clusters of discordant read pairs in tumor.bam. "-s 3" sets the minimum cluster size to 3 (default: 3).
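
The role of the minimum cluster size can be illustrated with a generic Python sketch. The clustering criteria that COSMOS actually uses are not described here, so this is only a simplified illustration on one-dimensional positions: the function "cluster_positions" and its "max_gap" parameter are hypothetical, and only "min_size" corresponds to the "-s" option.

```python
def cluster_positions(positions, max_gap=500, min_size=3):
    """Group positions into clusters and drop clusters that are too small.

    Neighbouring positions at most max_gap apart join the same cluster.
    max_gap is a hypothetical parameter; min_size plays the role of "-s".
    """
    clusters = []
    current = []
    for pos in sorted(positions):
        # Start a new cluster when the gap to the previous position is too large.
        if current and pos - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)
    return [c for c in clusters if len(c) >= min_size]


if __name__ == "__main__":
    positions = [100, 150, 200, 5000, 5100, 9000]
    print(cluster_positions(positions, max_gap=500, min_size=3))
```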

Remove false-positive discordant clusters

%python -b normal1.bam -q 0 -i 1
%python -b normal2.bam -q 0 -i 2
This removes discordant clusters that lie close to reads in the normal samples.
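
As an illustration of this kind of filter, the Python sketch below drops every cluster that has a normal-sample read position within a fixed window. It is a simplified assumption, not the COSMOS algorithm: the function "remove_clusters_near_normal", the (start, end) cluster representation and the "window" parameter are all hypothetical.

```python
import bisect


def remove_clusters_near_normal(clusters, normal_positions, window=500):
    """Keep clusters with no normal-sample read position nearby.

    clusters is a list of (start, end) tuples on one chromosome. A cluster
    is removed when any normal position lies in [start - window, end + window].
    """
    normal_sorted = sorted(normal_positions)
    kept = []
    for start, end in clusters:
        lo = bisect.bisect_left(normal_sorted, start - window)
        hi = bisect.bisect_right(normal_sorted, end + window)
        if lo == hi:  # no normal position falls inside the extended interval
            kept.append((start, end))
    return kept


if __name__ == "__main__":
    clusters = [(1000, 1200), (5000, 5300)]
    print(remove_clusters_near_normal(clusters, [1250, 9000], window=100))
```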

Count strand-specific read depth

%python -b tumor.bam -q 0 -i 0
%python -b normal1.bam -q 0 -i 1
%python -b normal2.bam -q 0 -i 2

Filter out unreliable breakpoints using the statistics

%python -b tumor.bam -q 2 -s 3
The final output is written to "sv_list.tsv".



  • Script for synthetic structural variation generation: Details are described in "usage.txt" in the archive.

Contacts and

© 2015 SESE Lab.