Vsearch Manual

vsearch(1)                       USER COMMANDS                      vsearch(1)

version 2.10.4, January 4, 2019

NAME
       vsearch - chimera detection, clustering, dereplication and
       rereplication, FASTA/FASTQ file processing, masking, pairwise
       alignment, searching, shuffling, sorting, subsampling, and taxonomic
       classification of amplicons for metagenomics, genomics, and
       population genetics.

SYNOPSIS
       Chimera detection:
              vsearch (--uchime_denovo | --uchime2_denovo | --uchime3_denovo)
              fastafile (--chimeras | --nonchimeras | --uchimealns |
              --uchimeout) outputfile [options]

              vsearch --uchime_ref fastafile (--chimeras | --nonchimeras |
              --uchimealns | --uchimeout) outputfile --db fastafile [options]

       Clustering:
              vsearch (--cluster_fast | --cluster_size | --cluster_smallmem |
              --cluster_unoise) fastafile (--alnout | --biomout | --blast6out |
              --centroids | --clusters | --mothur_shared_out | --msaout |
              --otutabout | --profile | --samout | --uc | --userout)
              outputfile --id real [options]

       Dereplication and rereplication:
              vsearch (--derep_fulllength | --derep_prefix) fastafile
              (--output | --uc) outputfile [options]

              vsearch --rereplicate fastafile --output outputfile [options]

       FASTA/FASTQ file processing:
              vsearch --fastq_chars fastqfile [options]

              vsearch --fastq_convert fastqfile --fastqout outputfile [options]

              vsearch (--fastq_eestats | --fastq_eestats2) fastqfile --output
              outputfile [options]

              vsearch --fastq_filter fastqfile (--fastaout |
              --fastaout_discarded | --fastqout | --fastqout_discarded)
              outputfile [options]

              vsearch --fastq_join fastqfile --reverse fastqfile (--fastaout |
              --fastqout) outputfile [options]

              vsearch --fastq_mergepairs fastqfile --reverse fastqfile
              (--fastaout | --fastqout | --fastaout_notmerged_fwd |
              --fastaout_notmerged_rev | --fastqout_notmerged_fwd |
              --fastqout_notmerged_rev | --eetabbedout) outputfile [options]

              vsearch --fastq_stats fastqfile [--log logfile] [options]

              vsearch --fastx_revcomp fastxfile (--fastaout | --fastqout)
              outputfile [options]

              vsearch --sff_convert sff-file --fastqout outputfile [options]

       Masking:
              vsearch --fastx_mask fastxfile (--fastaout | --fastqout)
              outputfile [options]

              vsearch --maskfasta fastafile --output outputfile [options]

       Pairwise alignment:
              vsearch --allpairs_global fastafile (--alnout | --blast6out |
              --matched | --notmatched | --samout | --uc | --userout)
              outputfile (--acceptall | --id real) [options]

       Searching:
              vsearch --search_exact fastafile --db fastafile (--alnout |
              --biomout | --blast6out | --mothur_shared_out | --otutabout |
              --samout | --uc | --userout) outputfile [options]

              vsearch --usearch_global fastafile --db fastafile (--alnout |
              --biomout | --blast6out | --mothur_shared_out | --otutabout |
              --samout | --uc | --userout) outputfile --id real [options]

       Shuffling and sorting:
              vsearch (--shuffle | --sortbylength | --sortbysize) fastafile
              --output outputfile [options]

       Subsampling:
              vsearch --fastx_subsample fastafile (--fastaout | --fastqout)
              outputfile (--sample_pct real | --sample_size positive integer)
              [options]

       Taxonomic classification:
              vsearch --sintax fastafile --db fastafile --tabbedout outputfile
              [--sintax_cutoff real] [options]

       UDB database handling:
              vsearch --makeudb_usearch fastafile --output outputfile [options]

              vsearch --udb2fasta udbfile --output outputfile [options]

              vsearch (--udbinfo | --udbstats) udbfile [options]

DESCRIPTION
       Environmental or clinical molecular diversity studies generate large
       volumes of amplicons (e.g., SSU-rRNA sequences) that need to be
       checked for chimeras, dereplicated, masked, sorted, searched,
       clustered or compared to reference sequences. The aim of vsearch is
       to offer an all-in-one open source tool to perform these tasks, using
       optimized algorithm implementations and harnessing the full potential
       of modern computers, thus providing fast and accurate data
       processing.

       Comparing nucleotide sequences is at the core of vsearch. To speed up
       comparisons, vsearch implements an extremely fast Needleman-Wunsch
       algorithm, making use of the Streaming SIMD Extensions (SSE2) of
       post-2003 x86-64 CPUs. If SSE2 instructions are not available,
       vsearch exits with an error message. On Power8 CPUs it will use
       AltiVec/VSX/VMX instructions. Memory usage increases rapidly with
       sequence length: for example, comparing two sequences of length 1 kb
       requires 8 MB of memory per thread, and comparing two 10 kb sequences
       requires 800 MB of memory per thread. For comparisons involving
       sequences with a length product greater than 25 million (for example
       two sequences of length 5 kb), vsearch uses a slower alignment method
       described by Hirschberg (1975) and Myers and Miller (1988), with much
       smaller memory requirements.

   Input
       vsearch accepts as input fasta or fastq files containing one or
       several nucleotide entries. In fasta files, each nucleotide entry is
       made of a header and a sequence. The header is defined as the string
       between the '>' symbol and the first space, tab or the end of the
       line, whichever comes first.
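
The header rule stated above can be sketched in a few lines of Python. This is only an illustration of the rule, not vsearch's own parser; the function name `fasta_label` is hypothetical.

```python
def fasta_label(header_line: str) -> str:
    """Return the header of a FASTA identifier line.

    The header is the text between '>' and the first space, tab,
    or end of line, whichever comes first.
    """
    if not header_line.startswith(">"):
        raise ValueError("a FASTA identifier line must start with '>'")
    # Drop the leading '>' and any trailing line terminator.
    body = header_line[1:].rstrip("\r\n")
    for i, ch in enumerate(body):
        if ch in " \t":
            return body[:i]
    return body
```

For instance, `fasta_label(">seq1 some description")` keeps only `seq1`; anything after the first space or tab is not part of the header.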
       Additionally, if the header contains an abundance annotation of the
       form 'size=integer', vsearch interprets integer as the number of
       occurrences (or abundance) of the sequence in the study. That
       abundance information is used or created during chimera detection,
       clustering, dereplication, sorting and searching.

       The sequence is defined as a string of IUPAC symbols
       (ACGTURYSWKMDBHVN), starting after the end of the identifier line and
       ending before the next identifier line, or the file end. vsearch
       silently ignores ASCII characters 9 to 13, and exits with an error
       message if ASCII characters 0 to 8, 14 to 31, '.' or '-' are present.
       All other ASCII or non-ASCII characters are stripped and reported in
       a warning message.

       In fastq files, each entry is made of a sequence header starting with
       the symbol '@', a nucleotide sequence (same rules as for fasta
       sequences), a quality header starting with the symbol '+', and a
       string of ASCII characters (offset 33 or 64), each one encoding the
       quality value of the corresponding position in the nucleotide
       sequence.

       vsearch operations are case insensitive, except when soft masking is
       activated. Masking is automatically applied during chimera detection,
       clustering, masking, pairwise alignment and searching. Soft masking
       is specified with the options '--dbmask soft' (for searching and
       chimera detection with a reference) or '--qmask soft' (for searching,
       de novo chimera detection, clustering and masking). When using soft
       masking, lower case letters indicate masked symbols, while upper case
       letters indicate regular symbols. Masked symbols are never included
       in the unique index words used for sequence comparisons; otherwise
       they are treated as normal symbols.

       When comparing sequences during chimera detection, dereplication,
       searching and clustering, T and U are considered identical,
       regardless of their case. If two symbols are not identical, their
       alignment results in a negative mismatch score (default -4), except
       if one or both of the symbols are ambiguous (RYSWKMDBHVN), in which
       case the score is zero.
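
A minimal Python sketch of this per-symbol scoring rule follows. The function name `pair_score` is hypothetical, and the match reward of 2 is an assumption made for illustration: this passage states only the mismatch default of -4. Any pair that involves an ambiguous symbol scores zero, whether or not the two symbols are identical.

```python
AMBIGUOUS = set("RYSWKMDBHVN")

def pair_score(a: str, b: str, match: int = 2, mismatch: int = -4) -> int:
    """Score one aligned pair of nucleotide symbols.

    `match` is an assumed placeholder value; `mismatch` follows the
    default quoted in the text.
    """
    # Comparison ignores case, and T and U count as the same symbol.
    a = a.upper().replace("U", "T")
    b = b.upper().replace("U", "T")
    # Ambiguous symbols never contribute to the score.
    if a in AMBIGUOUS or b in AMBIGUOUS:
        return 0
    return match if a == b else mismatch
```

With these assumptions, `pair_score("T", "u")` is scored as a match, `pair_score("A", "C")` as a mismatch, and `pair_score("R", "A")` as zero.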
       Alignment of two identical ambiguous symbols (for example, R vs R)
       also receives a score of zero.

       vsearch can read data from standard files and write to standard
       files, but it can also read from pipes and write to pipes! For
       example, multiple fasta files can be piped into vsearch for
       dereplication. To do so, file
