Bam Utils Manual
User Manual: Pdf
Open the PDF directly: View PDF .
Page Count: 14
Download | |
Open PDF In Browser | View PDF |
Package ‘bamUtils’ January 23, 2018 Title Utility functions for manipulating bams Version 0.0.0.9000 Description Utility functions for manipulating BAMs biocViews Depends R (>= 3.1), data.table (>= 1.9), GenomicRanges, GenomicAlignments, Rsamtools (>= 1.18), gUtils Suggests testthat License GPL-2 BugReports http://github.com/mskilab/bamUtils/issues LazyData true RoxygenNote 6.0.1.9000 R topics documented: bam.cov.gr . bam.cov.tile bamflag . . bamtag . . . chunk . . . count.clips . countCigar . get.mate.gr get.pairs.grl hets . . . . is.paired.end mafcount . oneoffs . . read.bam . . splice.cigar varbase . . varcount . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 3 4 4 5 5 6 6 7 7 8 8 9 10 11 12 12 14 1 2 bam.cov.gr bam.cov.gr Get coverage as GRanges from BAM on custom set of GRanges Description Gets coverage from BAM in supplied GRanges using ’countBam()’, returning GRanges with coverage counts in each of the provided GRanges (different from ’bamUtils::bam.cov()’) specified as the columns $file, $records, and $nucleotides in the values field Basically a wrapper for ’Rsamtools::countBam()’ with some standard settings for ’Rsamtools::ScanBamParams()’ Usage bam.cov.gr(bam, bai = NULL, intervals = NULL, all = FALSE, count.all = FALSE, isPaired = TRUE, isProperPair = TRUE, isUnmappedQuery = FALSE, hasUnmappedMate = FALSE, isNotPassingQualityControls = FALSE, isDuplicate = FALSE, mc.cores = 1, chunksize = 10, verbose = FALSE, ...) Arguments bam string Input BAM file. Advisable to make the input BAM a BamFile instance instead of a plain string, so that the index does not have to be reloaded. bai string Input BAM index file intervals GRanges of intervals to retrieve all boolean Flag to read in all of BAM as a GRanges via ‘si2gr(seqinfo())‘ (default = FALSE) isPaired boolean Flag indicates whether unpaired (FALSE), paired (TRUE), or any (NA) read should be returned. See documentation for Rsamtools::scanBamFlag(). (default == NA) isProperPair boolean Flag indicates whether improperly paired (FALSE), properly paired (TRUE), or any (NA) read should be returned. A properly paired read is defined by the alignment algorithm and might, e.g., represent reads aligning to identical reference sequences and with a specified distance. See documentation for Rsamtools::scanBamFlag(). (default == NA) isUnmappedQuery boolean Flag indicates whether unmapped (TRUE), mapped (FALSE), or any (NA) read should be returned. See documentation for Rsamtools::scanBamFlag(). (default == NA) hasUnmappedMate boolean Flag indicates whether reads with mapped (FALSE), unmapped (TRUE), or any (NA) mate should be returned. See documentation for Rsamtools::scanBamFlag(). (default == NA) isNotPassingQualityControls boolean Flag indicates whether reads passing quality controls (FALSE), reads not passing quality controls (TRUE), or any (NA) read should be returned. See documentation for Rsamtools::scanBamFlag(). (default == NA) bam.cov.tile isDuplicate mc.cores chunksize verbose ... 3 boolean Flag indicates that un-duplicated (FALSE), duplicated (TRUE), or any (NA) reads should be returned. ’Duplicated’ reads may represent PCR or optical duplicates. See documentation for Rsamtools::scanBamFlag(). (default == FALSE) integer Number of cores in mclapply call integer How many intervals to process per core (default == 10) boolean "verbose" flag (default == FALSE) futher arguments passed into Rsamtools::scanBamFlag() Value GRanges parallel to input GRanges, but with metadata filled in. bam.cov.tile Get coverage as GRanges from BAM on genome tiles across seqlengths of genome Description Quick way to get tiled coverage via piping to samtools (e.g. ~10 CPU-hours for 100bp tiles, 5e8 read pairs) Gets coverage for window size "window" [bp], pulling "chunksize" records at a time and incrementing bin corresponding to midpoint or overlaps of corresponding (proper pair) fragment (uses TLEN and POS for positive strand reads that are part of a proper pair) Usage bam.cov.tile(bam.file, window = 100, chunksize = 1e+05, min.mapq = 30, verbose = TRUE, max.tlen = 10000, st.flag = "-f 0x02 -F 0x10", fragments = TRUE, region = NULL, do.gc = FALSE, midpoint = TRUE) Arguments bam.file window chunksize min.mapq verbose max.tlen st.flag fragments region do.gc midpoint string Input BAM file integer Window size (in bp) (default == 1e2) integer Size of window (default == 1e5) integer Minimim map quality reads to consider for counts (default == 30) boolean "verbose" flag (default == TRUE) max paired-read insert size to consider boolean Samtools flag to filter reads on (default == ’-f 0x02 -F 0x10’) boolean Flag (default == FALSE) um? (default == NULL) boolean Flag to execute garbage collection via ’gc()’ (default == FALSE) boolean Flag if TRUE will only use the fragment midpoint, if FALSE will count all bins that overlap the fragment Value GRanges of "window" bp tiles across seqlengths of bam.file with meta data field $counts specifying fragment counts centered in the given bin. 4 bamtag Returns matrix of bits from BAM flags bamflag Description Shortcut function: assumes reads are GappedAlignments with flag variable or actual integers representing BAM flag Usage bamflag(reads) Arguments reads GenomicRanges or ’GappedAlignments’ or data.table holding the reads Value matrix of bits from BAM flags bamtag Outputs a tag to identify duplicate reads in GRanges input Description Outputs a tag that cats ’qname’, first vs first second mate +/- secondary alignment +/- gr.string to give an identifier for determine duplicates in a read pile Usage bamtag(reads, secondary = FALSE, gr.string = FALSE) Arguments reads GenomicRanges or GappedAlignments or data.frame holding the reads secondary boolean including secondary alignment(s) (default == FALSE) gr.string boolean input reads into gr.string() (default == FALSE) chunk chunk 5 chunk Description Internal function takes same input as seq (from, to, by, length.out) and outputs a 2 column matrix of indices corresponding to "chunks" Usage chunk(from, to = NULL, by = 1, length.out = NULL) Arguments from integer where to begin sequence to integer to end sequence by interval to space sequence length.out number of desired chunks, i.e. nrows of output matrix Value 2 column matrix of indices, each row representing chunk Author(s) Marcin Imielinski count.clips Return data.frame with fields of "right" soft clips and "left" soft clips Description Takes GRanges or GappedAlignments object and uses cigar field (or takes character vector of cigar strings) and returns data.frame with fields (for character input) $right.clips number of "right" soft clips (e.g. cigar 89M12S) $left.clips number of "left" soft clips (e.g. cigar 12S89M), or appends these fields to the reads object Usage count.clips(reads) Arguments reads GenomicRanges or GappedAlignments or data.frame or data.table holding the reads Value GRanges with ’right.clips’ and ’left.clips’ columns added 6 get.mate.gr Count bases in cigar string countCigar Description Counts the total number of bases, per cigar, that fall into D, I, M, S categories. countCigar makes no distinction between, for instance 1S2M2S, 2S2M1S, or 3S2M Usage countCigar(cigar) Arguments cigar character vector of cigar strings Value matrix of dimensions (4-column, length(cigar)) with the total counts for each type get.mate.gr returns GRanges corresponding to mates of reads Description Inputs GRanges or data.frame/data.table of reads. Outputs GRanges corresponding to mates of reads. Usage get.mate.gr(reads) Arguments reads GRanges or data.table/data.frame Input reads Value GRanges corresponding to mates of reads get.pairs.grl get.pairs.grl 7 Get coverage as GRanges from BAM on custom set of GRanges Description Takes reads object and returns GRangesList with each read and its mate (if exists) Usage get.pairs.grl(reads, pairs.grl.split = TRUE, verbose = FALSE) Arguments reads GRanges holding reads pairs.grl.split boolean returns as GRangesList if TRUE (default == TRUE) verbose hets boolean verbose flag (default == FALSE) Simple het "caller" meant to be used at validated het SNP sites for tumor / normal pairs Description hets dumps a tsv file of alt ($alt.count.t, $alt.count.n) and ref ($ref.count.t, $ref.count.n) read counts to out.file for a tumor / normal pair across a set of sites specified by an input VCF Usage hets(tum.bam, norm.bam = NULL, out.file, vcf.file, chunk.size1 = 1000, chunk.size2 = 100, mc.cores = 14, verbose = T, na.rm = TRUE, filt.norm = T) Arguments tum.bam character scalar or BamFile of tumor bam norm.bam character scalar or BamFile of normal bam (optional) out.file path to TSV output file to be generated vcf.file VCF file of sites (eg hapmap or 1000G) at which to compute read counts chunk.size1 number of variants to process from VCF file at a time chunk.size2 number of variants to access from BAM file in a single iteration mc.cores how many cores to parallelize verbose verbose logical flag na.rm logical flag to remove rows with NA counts filt.norm logical flag remove any sites that have allele fraction of 0 or 1 or NA in MAF 8 mafcount Value nil Author(s) Marcin Imielinski is.paired.end Check if BAM file is paired end by using 0x1 flag Description Check if BAM file is paired-end by using 0x1 flag, pipes to ’samtools’ via command line Create GRanges of read mates from reads Usage is.paired.end(bams) get.mate.gr(reads) Arguments bams vector of input BAMs Value boolean returns TRUE if BAM file is paired-end, returns FALSE if BAM not paired-end GRanges corresponding to mates of reads mafcount Wrapper around varcount adapted to tumor and normal "paired" bams Description mafcount Returns base counts for reference and alternative allele for an input tum and norm bam and maf data frame or GRAnges specifying substitutions maf is a single width GRanges describing variants and field ’ref’ (or ’Reference_Allele’), ’alt’ (or ’Tum_Seq_Allele1’) specifying reference and alt allele. maf is assumed to have width 1 and strand is ignored. Usage mafcount(tum.bam, norm.bam = NULL, maf, chunk.size = 100, verbose = T, mc.cores = 1, ...) oneoffs 9 Arguments tum.bam character scalar or BamFile specifying path to tumor sample norm.bam optional character scalar or BamFile specifying path to normal sample maf GRanges or data.frame or data.table of imported maf (i.e. output of read.delim or fread) chunk.size integer number of variants to extract from bam file at each iteration verbose logical flag whether to print verbose output mc.cores number of cores to parallelize ... additional params to pass to varcount Value GRanges of maf annotated with fields $alt.count.t, $ref.count.t, $alt.count.n, $ref.count.n Author(s) Marcin Imielinski oneoffs Calls samtools mpileup to dump tsv of "one off" variants / sites (i.e. that are present in exactly one read per site) Description Calls samtools mpileup to dump tsv of "one off" variants / sites (i.e. that are present in exactly one read per site) Usage oneoffs(out.file, bam, ref, min.bq = 30, min.mq = 60, indel = FALSE, chunksize = 10000, verbose = TRUE) Arguments out.file file to dump tsv to bam bam file path ref fasta path min.bq integer minimum base quality min.mq integer minimum mapping quality indel logical flag whether to collect one off indels (default is substitution) chunksize number of mpileup lines to put into memory verbose logical flag Note The denominator (ie total reads) is just the sum of counts$records 10 read.bam read.bam Read BAM file into GRanges or data.table Description Wrapper around Rsamtools bam scanning functions, by default, returns GRangesList of read pairs for whichread lies in the supplied interval Usage read.bam(bam, intervals = NULL, gr = intervals, all = FALSE, bai = NULL, pairs.grl = TRUE, stripstrand = TRUE, what = scanBamWhat(), unpack.flag = FALSE, verbose = FALSE, tag = NULL, isPaired = NA, isProperPair = NA, isUnmappedQuery = NA, hasUnmappedMate = NA, isNotPassingQualityControls = NA, isDuplicate = F, isValidVendorRead = TRUE, pairs.grl.split = TRUE, as.data.table = FALSE, ignore.indels = FALSE, ...) Arguments Input bam file. Advisable to make "bam" a BamFile instance instead of a plain string, so that the index does not have to be reloaded. intervals GRanges of intervals to retrieve gr GRanges of intervals to retrieve stripstrand Flag to ignore strand information on the query intervals. Default TRUE what What fields to pull down from BAM. Default scanBamWhat() unpack.flag Add features corresponding to read flags. Default FALSE verbose Increase verbosity tag Additional tags to pull down from the BAM (e.g. ’R2’) isPaired See documentation for scanBamFlag. Default NA isProperPair See documentation for scanBamFlag. Default NA isUnmappedQuery See documentation for scanBamFlag. Default NA hasUnmappedMate See documentation for scanBamFlag. Default NA isNotPassingQualityControls See documentation for scanBamFlag. Default NA isDuplicate See documentation for scanBamFlag. Default FALSE isValidVendorRead See documentation for scanBamFlag. Default TRUE as.data.table Return reads in the form of a data.table rather than GRanges/GRangesList ignore.indels messes with cigar to read BAM with indels removed. Useful for breakpoint mapping on contigs ... passed to scanBamFlag bami Input bam index file. as.grl Return reads as GRangesList. Controls whether get.pairs.grl does split. Default TRUE bam splice.cigar 11 Value Reads in one of GRanges, GRangesList or data.table splice.cigar Get coverage as GRanges from BAM on custom set of GRanges Description Takes GRanges or GappedAlignments object "reads" and parses cigar fields to return GRanges or GRangesList corresponding to spliced alignments on the genome, which correspond to portions of the cigar i.e. each outputted GRanges/GRangesList element contains the granges corresponding to all non-N portions of cigar string If GRangesList provided as input (e.g. paired reads) then all of the spliced ranges resulting from each input GRangesList element will be put into the corresponding output GRangesList element NOTE: does not update MD tag If use.D = TRUE, then will treat "D" flags (deletion) in addition to "N" flags as indicative of deletion event. Usage splice.cigar(reads, verbose = TRUE, fast = TRUE, use.D = TRUE, rem.soft = TRUE, get.seq = FALSE, return.grl = TRUE) Arguments reads GenomicRanges or GappedAlignments or data.frame input reads verbose boolean verbose flag (default == TRUE) fast boolean Flag to use ’GenomicAlignments::cigarRangesAlongReferenceSpace()’ to translate CIGAR to GRanges (default == TRUE) use.D boolean Treats "D" tags as deletions, along with "N" tags (default == TRUE) rem.soft boolean Pick up splice ’S’, soft-clipped (default == TRUE) get.seq boolean Get InDels (default == TRUE) return.grl boolean Return as GRangesList (default == TRUE) 12 varcount varbase Returns variant bases and ranges from GRanges or GappedAlignments input Description Takes GRanges or GappedAlignments object "reads" and uses cigar, MD, seq fields to return variant bases and ranges Teturns GRangesList (of same length as input) of variant base positions with character vector $varbase field populated with variant bases for each GRanges item in grl[[k]], with the following handling for insertions, deletions, and substitution GRange’s: Substitutions: nchar(gr$varbase) = width(gr) of the corresponding var Insertions: nchar(gr$varbase)>=1, width(gr) ==0 Deletions: gr$varbase = ”, width(gr)>=1 Each GRanges also has $type flag which shows the cigar string code for the event i.e. S = soft clip –> varbase represents clipped bases I = insertion –> varbase represents inserted bases D = deletion –> varbase is empty X = mismatch –> varbase represents mismatched bases Usage varbase(reads, soft = TRUE, verbose = TRUE) Arguments reads GenomicRanges or GRangesList or GappedAlignments or data.frame/data.table reads to extract variants from soft boolean Flag to include soft-clipped matches (default == TRUE) verbose boolean verbose flag (default == TRUE) varcount Wrapper around applyPileups Description takes in vector of bam paths, GRanges corresponding to sites / territories to query, and outputs a list with fields $counts = 3D matrix of base counts (A, C, G, T, N) x sites x bams subject to mapq and baseq thresholds #’ (uses varbase) ... = other args go to read.bam Usage varcount(bams, gr, min.mapq = 0, min.baseq = 20, max.depth = 500, indel = F, ...) varcount 13 Arguments bams character vector of paths to bam files gr granges of (width 1) sites ie intervals at which to compute base coujnts min.mapq minimal mapping quality at which to compute bases max.depth max read depth to consider indel logical flag whether to consider indels (default FALSE) max.baseq minimal base qualitya t which to compute bases Value input GRanges gr annotated with fields $alt.count.t, $ref.count.t, $alt.count.n, $ref.count.n Author(s) Marcin Imielinski Index bam.cov.gr, 2 bam.cov.tile, 3 bamflag, 4 bamtag, 4 chunk, 5 count.clips, 5 countCigar, 6 get.mate.gr, 6 get.mate.gr (is.paired.end), 8 get.pairs.grl, 7 hets, 7 is.paired.end, 8 mafcount, 8 oneoffs, 9 read.bam, 10 splice.cigar, 11 varbase, 12 varcount, 12 14
Source Exif Data:
File Type : PDF File Type Extension : pdf MIME Type : application/pdf PDF Version : 1.5 Linearized : No Page Count : 14 Page Mode : UseOutlines Author : Title : Subject : Creator : LaTeX with hyperref package Producer : pdfTeX-1.40.14 Create Date : 2018:01:23 14:21:23-05:00 Modify Date : 2018:01:23 14:21:23-05:00 Trapped : False PTEX Fullbanner : This is pdfTeX, Version 3.1415926-2.5-1.40.14 (TeX Live 2013) kpathsea version 6.1.1EXIF Metadata provided by EXIF.tools