Vsearch Manual

vsearch(1)                    USER COMMANDS                    vsearch(1)
version 2.10.4, January 4, 2019

NAME
vsearch - chimera detection, clustering, dereplication and rereplication, FASTA/FASTQ file processing, masking, pairwise alignment, searching, shuffling, sorting, subsampling, and taxonomic classification of amplicons for metagenomics, genomics, and population genetics.

SYNOPSIS
Chimera detection:
    vsearch (--uchime_denovo | --uchime2_denovo | --uchime3_denovo) fastafile (--chimeras | --nonchimeras | --uchimealns | --uchimeout) outputfile [options]
    vsearch --uchime_ref fastafile (--chimeras | --nonchimeras | --uchimealns | --uchimeout) outputfile --db fastafile [options]

Clustering:
    vsearch (--cluster_fast | --cluster_size | --cluster_smallmem | --cluster_unoise) fastafile (--alnout | --biomout | --blast6out | --centroids | --clusters | --mothur_shared_out | --msaout | --otutabout | --profile | --samout | --uc | --userout) outputfile --id real [options]

Dereplication and rereplication:
    vsearch (--derep_fulllength | --derep_prefix) fastafile (--output | --uc) outputfile [options]
    vsearch --rereplicate fastafile --output outputfile [options]

FASTA/FASTQ file processing:
    vsearch --fastq_chars fastqfile [options]
    vsearch --fastq_convert fastqfile --fastqout outputfile [options]
    vsearch (--fastq_eestats | --fastq_eestats2) fastqfile --output outputfile [options]
    vsearch --fastq_filter fastqfile (--fastaout | --fastaout_discarded | --fastqout | --fastqout_discarded) outputfile [options]
    vsearch --fastq_join fastqfile --reverse fastqfile (--fastaout | --fastqout) outputfile [options]
    vsearch --fastq_mergepairs fastqfile --reverse fastqfile (--fastaout | --fastqout | --fastaout_notmerged_fwd | --fastaout_notmerged_rev | --fastqout_notmerged_fwd | --fastqout_notmerged_rev | --eetabbedout) outputfile [options]
    vsearch --fastq_stats fastqfile [--log logfile] [options]
    vsearch --fastx_revcomp fastxfile (--fastaout | --fastqout) outputfile [options]
    vsearch --sff_convert sff-file --fastqout outputfile [options]

Masking:
    vsearch --fastx_mask fastxfile (--fastaout | --fastqout) outputfile [options]
    vsearch --maskfasta fastafile --output outputfile [options]

Pairwise alignment:
    vsearch --allpairs_global fastafile (--alnout | --blast6out | --matched | --notmatched | --samout | --uc | --userout) outputfile (--acceptall | --id real) [options]

Searching:
    vsearch --search_exact fastafile --db fastafile (--alnout | --biomout | --blast6out | --mothur_shared_out | --otutabout | --samout | --uc | --userout) outputfile [options]
    vsearch --usearch_global fastafile --db fastafile (--alnout | --biomout | --blast6out | --mothur_shared_out | --otutabout | --samout | --uc | --userout) outputfile --id real [options]

Shuffling and sorting:
    vsearch (--shuffle | --sortbylength | --sortbysize) fastafile --output outputfile [options]

Subsampling:
    vsearch --fastx_subsample fastafile (--fastaout | --fastqout) outputfile (--sample_pct real | --sample_size positive integer) [options]

Taxonomic classification:
    vsearch --sintax fastafile --db fastafile --tabbedout outputfile [--sintax_cutoff real] [options]

UDB database handling:
    vsearch --makeudb_usearch fastafile --output outputfile [options]
    vsearch --udb2fasta udbfile --output outputfile [options]
    vsearch (--udbinfo | --udbstats) udbfile [options]

DESCRIPTION
Environmental or clinical molecular diversity studies generate large volumes of amplicons (e.g., SSU-rRNA sequences) that need to be checked for chimeras, dereplicated, masked, sorted, searched, clustered or compared to reference sequences. The aim of vsearch is to offer an all-in-one open source tool to perform these tasks, using optimized algorithm implementations and harnessing the full potential of modern computers, thus providing fast and accurate data processing.

Comparing nucleotide sequences is at the core of vsearch. To speed up comparisons, vsearch implements an extremely fast Needleman-Wunsch algorithm, making use of the Streaming SIMD Extensions (SSE2) of post-2003 x86-64 CPUs. If SSE2 instructions are not available, vsearch exits with an error message. On Power8 CPUs it will use AltiVec/VSX/VMX instructions. Memory usage increases rapidly with sequence length: for example, comparing two sequences of length 1 kb requires 8 MB of memory per thread, and comparing two 10 kb sequences requires 800 MB of memory per thread. For comparisons involving sequences with a length product greater than 25 million (for example two sequences of length 5 kb), vsearch uses a slower alignment method described by Hirschberg (1975) and Myers and Miller (1988), with much smaller memory requirements.

Input
vsearch accepts as input fasta or fastq files containing one or several nucleotide entries. In fasta files, each nucleotide entry is made of a header and a sequence. The header is defined as the string comprised between the '>' symbol and the first space, tab or the end of the line, whichever comes first.

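The header rule above can be illustrated with a short Python sketch. This is not vsearch's own code; the function name fasta_identifier is hypothetical and chosen only for this example. It assumes the input is a single fasta header line.

```python
def fasta_identifier(header_line: str) -> str:
    """Return the text between '>' and the first space, tab or end of line.

    Hypothetical helper illustrating the header definition; not part of vsearch.
    """
    if not header_line.startswith(">"):
        raise ValueError("a fasta header line must start with '>'")
    # Drop the leading '>' and any trailing line terminator.
    body = header_line[1:].rstrip("\r\n")
    # Stop at the first space or tab, whichever comes first.
    for i, ch in enumerate(body):
        if ch in " \t":
            return body[:i]
    return body


if __name__ == "__main__":
    print(fasta_identifier(">seq1 Homo sapiens 16S\n"))
    print(fasta_identifier(">seq2\tsome description"))
```

Anything after the first space or tab is treated as a free-text description and is not part of the header under this definition.
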
Additionally, if the header matches an abundance annotation pattern (for example '>label;size=integer'), vsearch interprets integer as the number of occurrences (or abundance) of the sequence in the study. That abundance information is used or created during chimera detection, clustering, dereplication, sorting and searching.

The sequence is defined as a string of IUPAC symbols (ACGTURYSWKMDBHVN), starting after the end of the identifier line and ending before the next identifier line, or the file end. vsearch silently ignores ascii characters 9 to 13, and exits with an error message if ascii characters 0 to 8, 14 to 31, '.' or '-' are present. All other ascii or non-ascii characters are stripped and complained about in a warning message.

In fastq files, each entry is made of a sequence header starting with a symbol '@', a nucleotide sequence (same rules as for fasta sequences), a quality header starting with a symbol '+' and a string of ASCII characters (offset 33 or 64), each one encoding the quality value of the corresponding position in the nucleotide sequence.

vsearch operations are case insensitive, except when soft masking is activated. Masking is automatically applied during chimera detection, clustering, masking, pairwise alignment and searching. Soft masking is specified with the options '--dbmask soft' (for searching and chimera detection with a reference) or '--qmask soft' (for searching, de novo chimera detection, clustering and masking). When using soft masking, lower case letters indicate masked symbols, while upper case letters indicate regular symbols. Masked symbols are never included in the unique index words used for sequence comparisons; otherwise they are treated as normal symbols.

When comparing sequences during chimera detection, dereplication, searching and clustering, T and U are considered identical, regardless of their case. If two symbols are not identical, their alignment results in a negative mismatch score (default -4), except if one or both of the symbols are ambiguous (RYSWKMDBHVN), in which case the score is zero.
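
The symbol comparison rules can be sketched in Python as follows. This is an illustration, not vsearch's implementation: the function name symbol_score is hypothetical, and the match score of 2 is an assumption, since the text here only states the default mismatch score of -4. The sketch also scores a pair of identical ambiguous symbols as zero.

```python
# Ambiguous IUPAC nucleotide symbols, as listed in the text.
AMBIGUOUS = set("RYSWKMDBHVN")


def symbol_score(a: str, b: str, match: int = 2, mismatch: int = -4) -> int:
    """Score the alignment of two nucleotide symbols.

    Hypothetical helper; match=2 is an assumed value, mismatch=-4 is the
    default stated in the text.
    """
    # Comparison is case insensitive, and U is treated as T.
    a = a.upper().replace("U", "T")
    b = b.upper().replace("U", "T")
    # Any pair involving an ambiguous symbol scores zero,
    # including two identical ambiguous symbols (R vs R).
    if a in AMBIGUOUS or b in AMBIGUOUS:
        return 0
    return match if a == b else mismatch


if __name__ == "__main__":
    for pair in [("t", "U"), ("A", "C"), ("A", "N"), ("R", "R")]:
        print(pair, symbol_score(*pair))
```

Checking for ambiguity before checking for identity keeps the zero score for ambiguous pairs from being overridden by the match score.
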
Alignment of two identical ambiguous symbols (for example, R vs R) also receives a score of zero.

vsearch can read data from standard files and write to standard files, but it can also read from pipes and write to pipes! For example, multiple fasta files can be piped into vsearch for dereplication. To do so, file
