VarBen Manual
User Manual:
Open the PDF directly: View PDF .
Page Count: 6
Download | |
Open PDF In Browser | View PDF |
VarBen Manual Introduction VarBen is a software to add SNV/Indel, CNV and SV to BAM files, used for testing mutation callers and pipelines. Software Dependencies 1. samtools (http://samtools.sourceforge.net/) 2. pysam (http://code.google.com/p/pysam/ or pip install pysam) 3. bwa/tmap/novoalign Known bugs and limitation VarBen is under rapid development driven by suggesting and bug reports from the mutation calling community. 1. Currently, we are working on testing a new version of mutation editor for Ion Torrent plantform. 2. There is a bug in parallel function, the user cannot stop software via ctrl-C during it is running in parallel mode. Function 1. Mutation editor (muteditor.py) usage: muteditor.py [-h] -m MUTFILE -b BAMFILE -r REFFASTA -o OUTDIR --alignerIndex ALIGNERINDEX [-p PROCESS] [--seqer SEQER] [-g] [--aligner ALIGNER] [--haplosize HAPLOSIZE] [--mindepth MINDEPTH] [--minmutreads MINMUTREADS] [--snpfrac SNPFRAC] [--minmapq MINMAPQ] [--multmapfilter] [--diffcover DIFFCOVER] [--floworder FLOWORDER] [--libkey LIBKEY] [--barcode BARCODE] [--tag] Edit bamfile to spike in SNV, Indel, Complex, Substitution Function 2. SV/CNV editor (sveditor.py) usage: sveditor.py [-h] -m SVFILE -b BAMFILE -r REFFASTA -l READLENGTH -o OUTDIR --alignerIndex ALIGNERINDEX [-p PROCESS] [--seqer SEQER] [-g] [--aligner ALIGNER] [--mindepth MINDEPTH] [--minmutreads MINMUTREADS] [--minmapq MINMAPQ] [--multmapfilter] [--floworder FLOWORDER] [--libkey LIBKEY] [--barcode BARCODE] [--tag] Edit bam file to spike in SV 1 Local optional arguments (only used in muteditor.py) -m MUTFILE, --mutfile MUTFILE Target regions to try and spike in a point mutation. There are four types of snv/indel included in the software: snv, ins(insertion), del(deletion), Sub(Complex mutation). The file format is shown as below. #Chrom Start End AlleleFrequency Type AlternativeSequence chr1 899778 899778 0.9 snv T chr1 3712508 3712508 0.9 snv T chr1 1158637 1158638 0.9 ins TAG chr1 3397038 3397039 0.9 ins AGGTAG chr1 6533124 6533126 0.9 del . chr1 7910946 7910956 0.9 del . chr7 55242467 55242481 0.3 Sub TTC ### Complex indel format: EGFR, c.2237_2251>TTC(p.E746_T751>VP) How to determine the start and end position? • For single nucleotide variant, the start and end site should be the same position in the genome. • For short sequence insertion, the start and end site should has 1 base difference. • For short sequence deletion, the start and end site shows the sequence start and end position which will exclude from the sequencing reads. • For Complex mutation, we want to using a short sequence A to instead of a sequence B, we should put the sequence B’s start and end position in our mutation file. Local optional arguments (only used in sveditor.py) -m SVFILE, --svfile SVFILE Target regions to try and spike in a SV or CNV. There are six types of SV included in the software: inv(inversion), del(deletion), dup(duplication), trans_chrom (whole arm translocate chromosome), trans_balance(balanced translocation chromosome), trans_unbalance (insertional translocation chromosome). del & inv format #chrom start end type AF chrX 12994966 12996009 del 0.6 chrX 20172336 20176010 del 0.6 2 chrX 105121310 105134706 del 0.6 chrX 108614726 108616334 del 0.6 chrX 13703890 14134046 inv 0.6 chrX 19975999 20064786 inv 0.6 chrX 32391049 32794255 inv 0.6 chrX 40994338 41012689 inv 0.6 dup format #chrom start end type AF dup_num chr1 15808448 15814030 dup 0.6 3 chr1 16076907 16086182 dup 0.6 4 chr1 23665443 23711586 dup 0.6 3 chr1 28057278 28081157 dup 0.6 3 trans_chrom & trans_balance & trans_unbalance format #CHR1 CHR1_start CHR1_end type AF CHR2 CHR2_start chr10 7059511 7059511 trans_chrom 0.5 chr19 17396810 chr19 17327977 17327977 trans_chrom 0.5 chr3 CHR2_end 17396810 186528041 186528041 chr3 107598967 107598967 trans_chrom 0.5 chr7 38371959 38371959 chr1 31561816 31561816 trans_chrom 0.5 chr6 41297838 41297838 chr2 29754284 29754947 trans_balance 0.5 chr2 chr10 43608984 43609308 trans_unbalance 0.5 chr6 42522695 42523089 117640981 117640982 There are two types of CNV included in the software: gainand loss. The file format is shown as below. #chrom start end type AF cnv_type chrX 66764255 66950650 cnv 2.5 gain chr20 52186265 52200826 cnv 2 loss -l READLENGTH, --readlength READLENGTH The read length of BAM file. 3 Global optional arguments -h, --help To show the help message and exit. -b BAMFILE, --bamfile BAMFILE A BAM file to spike in mutations, the bam file should be sorted and indexed, the user also needs to provide a BAM index .bai file with the same prefix name as BAM file. By default, the software considers the BAM file consists by entirely paired-end reads, if a user needs to spike mutation in a BAM file which consists by single-end reads, they need using the -single option. -r REFFASTA, --reffasta REFFASTA Genome reference, FASTA file with corresponding index .fai file which is generated by Samtools. The target BAM file should be generated by the same reference file used in this option, especially the chromosome names and lengths in the reference FASTA must be the same as in the BAM header. This FASTA file is used to create pseudo reads near the editing poisition. -o OUTDIR, --outdir OUTDIR A output directory name for edited bam file and other information. --alignerIndex ALIGNERINDEX The index database sequences in the FASTA format of aligner. For example if the aligner is bwa, then bwa index should be provided. This FASTA file is called by the external aligner. -p PROCESS, --process PROCESS Parallel mode: process number (default = 1) --seqer SEQER Define the seqer: illumina, life, BGI (default is illumina) -g, --single To declare that the input bam is single-ended (default is False) --aligner ALIGNER Choose an aligner from bwa, novoalign and tmap (default is bwa). 4 --haplosize HAPLOSIZE The size of haplotype block to consider when adding more than 1 proximal mutation. (default = 0) For example, if two SNVs are spiked in 5bp apart and -haplosize is 5 or greater, the two SNVs will be on the same haplotype (i.e. share the same reads for reads covering both positions) --mindepth MINDEPTH The minimum depth of reads position which could be used as a spike in site. (default = 30) For instance, if one spike in position reads depth is 25X (there are only 25 reads covered this position), the VarBen software will drop this spike in position automatically due to the reads depth is not enough to add in any mutation. --minmutreads MINMUTREADS The minimum number of reads to be edited in one position (default = 5). VarBen will calculate the number of mutated reads by the allele frequency and the total number of reads in the position. If the mutation reads number is less than 5, the software will not add in any mutation in this position. --snpfrac SNPFRAC To avoid spike any mutatoin on top of exisiting heterozygous alleles, the heterozygous allele fraction set to 0.2 (default = 1) --minmapq MINMAPQ A read mapping quality less than MINMAPQ will not be considered to edit (default 20). --multmapfilter Any multi-mapped reads will not be considered to edit (default is True). --diffcover DIFFCOVER The coverage difference allowed between the input BAM and output BAM (default 0.9). --floworder FLOWORDER If seqer is life, a flower order of life sequence should be provided. --libkey LIBKEY If seqer is life, a libkey of life sequence should be provided. 5 --barcode BARCODE If seqer is life, a barcode of life sequence should be provided. --tag Add tag to edited reads (default False). 6
Source Exif Data:
File Type : PDF File Type Extension : pdf MIME Type : application/pdf PDF Version : 1.5 Linearized : No Page Count : 6 Page Mode : UseOutlines Author : Title : VarBen Manual Subject : Creator : LaTeX with hyperref package Producer : pdfTeX-1.40.14 Create Date : 2019:06:04 15:33:57+08:00 Modify Date : 2019:06:04 15:33:57+08:00 Trapped : False PTEX Fullbanner : This is pdfTeX, Version 3.1415926-2.5-1.40.14 (TeX Live 2013) kpathsea version 6.1.1EXIF Metadata provided by EXIF.tools