A toolkit for DNA sequence analysis and manipulation

D. Pratas (pratas@ua.pt)
J. R. Almeida (joao.rafael.almeida@ua.pt)
A. J. Pinho (ap@ua.pt)

IEETA/DETI, University of Aveiro, Portugal

Version 1.7.17

Contents

1 Introduction
    1.1 Installation
    1.2 License
2 FASTQ tools
3 FASTA tools
    3.1 Program goose-fasta2seq
    3.2 Program goose-fastaextract
    3.3 Program goose-fastaextractbyread
    3.4 Program goose-fastainfo
    3.5 Program goose-mutatefasta
    3.6 Program goose-randfastaextrachars
4 Genomic sequence tools
    4.1 Program goose-mutatedna
    4.2 Program goose-randseqextrachars
5 Amino acid sequence tools
    5.1 Program goose-AminoAcidToGroup
    5.2 Program goose-ProteinToPseudoDNA
6 General purpose tools
Bibliography

Chapter 1 Introduction

Recent advances in DNA sequencing have revolutionized the field of genomics, making it possible for research groups to generate large amounts of sequenced data very rapidly and at substantially lower cost. These data are stored in specific file formats, such as FASTQ and FASTA. Therefore, their analysis and manipulation are crucial [?].
Several frameworks for analysis and manipulation have emerged, namely GATK [?], HTSeq [?], MEGA [?] and GALAXY [?], among others. Most of these frameworks require licenses and do not provide low-level access to the information, since they are commonly accessed through scripting or interfaces.

We describe GOOSE, a (free) novel toolkit for analyzing and manipulating FASTA-FASTQ formats and sequences (DNA, amino acids, text), with many complementary tools. The toolkit is for Linux-based systems and is built for fast processing. GOOSE supports pipes for easy integration. It includes tools for information display, randomizing, edition, conversion, extraction, searching, calculation and visualization. GOOSE is prepared to deal with very large datasets, typically on the scale of gigabytes or terabytes.

The toolkit is a command line version, using the prefix goose- followed by a suffix with the respective name of the program. GOOSE is implemented in the C language and is available, under GPLv3, at:

    https://pratas.github.io/goose

1.1 Installation

For GOOSE installation, run:

    git clone https://github.com/pratas/goose.git
    cd goose/src/
    make

1.2 License

The license is GPLv3. In summary, everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. For details on the license, consult: http://www.gnu.org/licenses/gpl-3.0.html.

Chapter 2 FASTQ tools

Currently available tools for FASTQ format analysis and manipulation include:

1. goose-fastq2fasta
2. goose-fastq2mfasta
3. goose-fastqclustreads
4. goose-FastqExcludeN
5. goose-FastqExtractQualityScores
6. goose-FastqInfo
7. goose-FastqMaximumReadSize
8. goose-FastqMinimumLocalQualityScoreForward
9. goose-FastqMinimumLocalQualityScoreReverse
10. goose-FastqMinimumQualityScore
11. goose-FastqMinimumReadSize
12. goose-count
13. goose-extractreadbypattern
14. goose-fastqpack
15. goose-fastqsimulation
16. goose-FastqSplit
17. goose-FastqTrimm
18. goose-fastqunpack
19. goose-filter
20. goose-findnpos
21. goose-genrandomdna
22. goose-getunique
23. goose-info
24. goose-mfmotifcoords
25. goose-mutatefastq
26. goose-newlineonnewx
27. goose-period
28. goose-permuteseqbyblocks
29. goose-randfastqextrachars
30. goose-real2binthreshold
31. goose-reducematrixbythreshold
32. goose-renamehumanheaders
33. goose-searchphash
34. goose-seq2fasta
35. goose-seq2fastq
36. goose-SequenceToGroupSequence
37. goose-splitreads
38. goose-wsearch

Chapter 3 FASTA tools

Currently available FASTA tools, for analysis and manipulation, are:

1. goose-fasta2seq: it converts a FASTA or Multi-FASTA file format to a seq.
2. goose-fastaextract: it extracts sequences from a FASTA file, in a range defined by the user in the parameters.
3. goose-fastaextractbyread: it extracts sequences from each read in a Multi-FASTA file (split by \n), in a range defined by the user in the parameters.
4. goose-fastainfo: it shows the read information of a FASTA or Multi-FASTA file format.
5. goose-mutatefasta: it creates a synthetic mutation of a FASTA file given specific rates of editions, deletions and additions.
6. goose-randfastaextrachars: it substitutes the characters outside the ACGT alphabet in the DNA sequence with random ACGT symbols.
7. goose-geco
8. goose-gede
9. goose-reverse

3.1 Program goose-fasta2seq

The goose-fasta2seq program converts a FASTA or Multi-FASTA file format to a seq. For help type:

    ./goose-fasta2seq -h

In the following subsections, we explain the input and output parameters.

Input parameters

The goose-fasta2seq program needs two streams for the computation, namely the standard input and the standard output. The input stream is a FASTA or Multi-FASTA file. The usage is as follows:

    Usage: ./goose-fasta2seq [options] [[--] args]
       or: ./goose-fasta2seq [options]

    It converts a FASTA or Multi-FASTA file format to a seq.

        -h, --help        show this help message and exit

    Basic options
        < input.fasta     Input FASTA or Multi-FASTA file format (stdin)
        > output.seq      Output sequence file (stdout)

    Example: ./goose-fasta2seq < input.fasta > output.seq

An example of such an input file is:

    >AB000264|acc=AB000264|descr=Homo sapiens mRNA
    ACAAGACGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCCTGGAGGGTCCACCGCTGCCCTGCTGCCATTGTCCCC
    GGCCCCACCTAAGGAAAAGCAGCCTCCTGACTTTCCTCGCTTGGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAA
    GTGGTTTGAGTGGACCTCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGCAGGAAGCAGGCCAGTGCC
    GCGAATCCGCGCGCCGGGACAGAATCTCCTGCAAAGCCCTGCAGGAACTTCTTCTGGAAGACCTTCTCCACCCCCCCAGC
    TAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAA
    >AB000263|acc=AB000263|descr=Homo sapiens mRNA
    ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCCCTGGAGGGT
    GGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCCTCCTGACTTTCCTCGCTTG
    GTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGCAGGAAG
    GCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCCTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAA
    TAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAA

Output

The output of the goose-fasta2seq program is a sequence.
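The conversion can be pictured as dropping every header line (those starting with ">") and joining the remaining sequence lines. The following Python fragment is only a minimal sketch of that idea, not the GOOSE implementation (which is written in C); the function name fasta_to_seq and the toy records are hypothetical.

```python
def fasta_to_seq(lines):
    """Drop FASTA header lines and join the sequence lines into one string."""
    parts = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and header lines, which start with ">"
        if not line or line.startswith(">"):
            continue
        parts.append(line)
    return "".join(parts)


# Toy Multi-FASTA input with two short, made-up records
example = [">read1 demo", "ACGT", "TTGA", ">read2 demo", "GGCC"]
print(fasta_to_seq(example))  # ACGTTTGAGGCC
```

In this sketch the reads are simply concatenated in file order, which is consistent with the output example given in this section.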
An example, for the input above, is:

    ACAAGACGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCCTGGAGGGTCCACCGCTGCCCTGCTGCCATTGTCCCC
    GGCCCCACCTAAGGAAAAGCAGCCTCCTGACTTTCCTCGCTTGGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAA
    GTGGTTTGAGTGGACCTCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGCAGGAAGCAGGCCAGTGCC
    GCGAATCCGCGCGCCGGGACAGAATCTCCTGCAAAGCCCTGCAGGAACTTCTTCTGGAAGACCTTCTCCACCCCCCCAGC
    TAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAAACAAGATGCCATTGTCCCCCGGCCTCCTGCTG
    CTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCCCTGGAGGGTGGCCCCACCGGCCGAGACAGCGAGCATATGCA
    GGAAGCGGCAGGAATAAGGAAAAGCAGCCTCCTGACTTTCCTCGCTTGGTGGTTTGAGTGGACCTCCCAGGCCAGTGCCG
    GGCCCCTCATAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGCAGGAAGGCGCACCCCCCCAGCAATCCGCGCGCCGGGAC
    AGAATGCCCTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAATAAAACCTCACCCATGAATGCTCACGCAAGTT
    TAATTACAGACCTGAA

3.2 Program goose-fastaextract

The goose-fastaextract program extracts sequences from a FASTA file, in a range defined by the user in the parameters. For help type:

    ./goose-fastaextract -h

In the following subsections, we explain the input and output parameters.

Input parameters

The goose-fastaextract program needs two parameters, which define the beginning and the end of the extraction, and two streams for the computation, namely the standard input and the standard output. The input stream is a FASTA file. The usage is as follows:

    Usage: ./goose-fastaextract [options] [[--] args]
       or: ./goose-fastaextract [options]

    It extracts sequences from a FASTA file.

        -h, --help            show this help message and exit

    Basic options
        -i, --init=<int>      The first position to start the extraction (default 0)
        -e, --end=<int>       The last extract position (default 100)
        < input.fasta         Input FASTA or Multi-FASTA file format (stdin)
        > output.seq          Output sequence file (stdout)

    Example: ./goose-fastaextract -i <init> -e <end> < input.fasta > output.seq

An example of such an input file is:

    >AB000264|acc=AB000264|descr=Homo sapiens mRNA
    ACAAGACGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCCTGGAGGGTCCACCGCTGCCCTGCTGCCATTGTCCCC
    GGCCCCACCTAAGGAAAAGCAGCCTCCTGACTTTCCTCGCTTGGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAA
    GTGGTTTGAGTGGACCTCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGCAGGAAGCAGGCCAGTGCC
    GCGAATCCGCGCGCCGGGACAGAATCTCCTGCAAAGCCCTGCAGGAACTTCTTCTGGAAGACCTTCTCCACCCCCCCAGC
    TAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAA

Output

The output of the goose-fastaextract program is a sequence. An example, using 0 as the extraction starting point and 50 as the end, for the provided input, is:

    ACAAGACGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCCTGGAGG

3.3 Program goose-fastaextractbyread

The goose-fastaextractbyread program extracts sequences from each read of a FASTA or Multi-FASTA file, in a range defined by the user in the parameters. For help type:

    ./goose-fastaextractbyread -h

In the following subsections, we explain the input and output parameters.

Input parameters

The goose-fastaextractbyread program needs two parameters, which define the beginning and the end of the extraction, and two streams for the computation, namely the standard input and the standard output. The input stream is a FASTA or Multi-FASTA file. The usage is as follows:

    Usage: ./goose-fastaextractbyread [options] [[--] args]
       or: ./goose-fastaextractbyread [options]

    It extracts sequences from each read in a Multi-FASTA file (splited by \n)

        -h, --help            show this help message and exit

    Basic options
        -i, --init=<int>      The first position to start the extraction (default 0)
        -e, --end=<int>       The last extract position (default 100)
        < input.fasta         Input FASTA or Multi-FASTA file format (stdin)
        > output.fasta        Output FASTA or Multi-FASTA file format (stdout)

    Example: ./goose-fastaextractbyread -i <init> -e <end> < input.fasta > output.fasta

An example of such an input file is:

    >AB000264|acc=AB000264|descr=Homo sapiens mRNA
    ACAAGACGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCCTGGAGGGTCCACCGCTGCCCTGCTGCCATTGTCCCC
    GGCCCCACCTAAGGAAAAGCAGCCTCCTGACTTTCCTCGCTTGGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAA
    GTGGTTTGAGTGGACCTCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGCAGGAAGCAGGCCAGTGCC
    GCGAATCCGCGCGCCGGGACAGAATCTCCTGCAAAGCCCTGCAGGAACTTCTTCTGGAAGACCTTCTCCACCCCCCCAGC
    TAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAA
    >AB000263|acc=AB000263|descr=Homo sapiens mRNA
    ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCCCTGGAGGGT
    GGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCCTCCTGACTTTCCTCGCTTG
    GTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGCAGGAAG
    GCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCCTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAA
    TAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAA

Output

The output of the goose-fastaextractbyread program is a FASTA or Multi-FASTA file with the extracted sequences. An example, using 0 as the extraction starting point and 50 as the end, for the provided input, is:

    >AB000264|acc=AB000264|descr=Homo sapiens mRNA
    ACAAGACGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCCTGGAGG
    >AB000263|acc=AB000263|descr=Homo sapiens mRNA
    ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCC

3.4 Program goose-fastainfo

The goose-fastainfo program shows the read information of a FASTA or Multi-FASTA file format. For help type:

    ./goose-fastainfo -h

In the following subsections, we explain the input and output parameters.

Input parameters

The goose-fastainfo program needs two streams for the computation, namely the standard input and the standard output. The input stream is a FASTA or Multi-FASTA file. The usage is as follows:

    Usage: ./goose-fastainfo [options] [[--] args]
       or: ./goose-fastainfo [options]

    It shows read information of a FASTA or Multi-FASTA file format.

        -h, --help        show this help message and exit

    Basic options
        < input.fasta     Input FASTA or Multi-FASTA file format (stdin)
        > output          Output read information (stdout)

    Example: ./goose-fastainfo < input.fasta > output

An example of such an input file is:

    >AB000264|acc=AB000264|descr=Homo sapiens mRNA
    ACAAGACGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCCTGGAGGGTCCACCGCTGCCCTGCTGCCATTGTCCCC
    GGCCCCACCTAAGGAAAAGCAGCCTCCTGACTTTCCTCGCTTGGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAA
    GTGGTTTGAGTGGACCTCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGCAGGAAGCAGGCCAGTGCC
    GCGAATCCGCGCGCCGGGACAGAATCTCCTGCAAAGCCCTGCAGGAACTTCTTCTGGAAGACCTTCTCCACCCCCCCAGC
    TAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAA
    >AB000263|acc=AB000263|descr=Homo sapiens mRNA
    ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCCCTGGAGGGT
    GGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCCTCCTGACTTTCCTCGCTTG
    GTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGCAGGAAG
    GCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCCTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAA
    TAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAA

Output

The output of the goose-fastainfo program is a set of information about the file that was read. An example, for the input above, is:

    Number of reads      : 2
    Number of bases      : 736
    MIN of bases in read : 368
    MAX of bases in read : 368
    AVG of bases in read : 368.0000

3.5 Program goose-mutatefasta

The goose-mutatefasta program creates a synthetic mutation of a FASTA file given specific rates of editions, deletions and additions. All these parameters are defined by the user and are optional. For help type:

    ./goose-mutatefasta -h

In the following subsections, we explain the input and output parameters.

Input parameters

The goose-mutatefasta program needs two streams for the computation, namely the standard input and the standard output. However, optional settings can also be supplied, such as the starting point of the random generator and the edition, deletion and insertion rates.
Also, the user can choose to use the ACGTN alphabet in the synthetic mutation. The input stream is a FASTA or Multi-FASTA file. The usage is as follows:

    Usage: ./goose-mutatefasta [options] [[--] args]
       or: ./goose-mutatefasta [options]

    Creates a synthetic mutation of a fasta file given specific rates of editions, deletions and additions

        -h, --help                    show this help message and exit

    Basic options
        < input.fasta                 Input FASTA or Multi-FASTA file format (stdin)
        > output.fasta                Output FASTA or Multi-FASTA file format (stdout)

    Optional
        -s, --seed=<int>              Starting point to the random generator
        -e, --edit-rate=<dbl>         Defines the edition rate (default 0.0)
        -d, --deletion-rate=<dbl>     Defines the deletion rate (default 0.0)
        -i, --insertion-rate=<dbl>    Defines the insertion rate (default 0.0)
        -a, --ACGTN-alphabet          When active, the application uses the ACGTN alphabet

    Example: ./goose-mutatefasta -s <seed> -e <edit rate> -d <deletion rate> -i <insertion rate> -a < input.fast

An example of such an input file is:

    >AB000264|acc=AB000264|descr=Homo sapiens mRNA
    ACAAGACGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCCTGGAGGGTCCACCGCTGCCCTGCTGCCATTGTCCCC
    GGCCCCACCTAAGGAAAAGCAGCCTCCTGACTTTCCTCGCTTGGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAA
    GTGGTTTGAGTGGACCTCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGCAGGAAGCAGGCCAGTGCC
    GCGAATCCGCGCGCCGGGACAGAATCTCCTGCAAAGCCCTGCAGGAACTTCTTCTGGAAGACCTTCTCCACCCCCCCAGC
    TAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAA
    >AB000263|acc=AB000263|descr=Homo sapiens mRNA
    ACAAGATGCCATTGTCCCCCGGCCTCCTGCTGCTGCTGCTCTCCGGGGCCACGGCCACCGCTGCCCTGCCCCTGGAGGGT
    GGCCCCACCGGCCGAGACAGCGAGCATATGCAGGAAGCGGCAGGAATAAGGAAAAGCAGCCTCCTGACTTTCCTCGCTTG
    GTGGTTTGAGTGGACCTCCCAGGCCAGTGCCGGGCCCCTCATAGGAGAGGAAGCTCGGGAGGTGGCCAGGCGGCAGGAAG
    GCGCACCCCCCCAGCAATCCGCGCGCCGGGACAGAATGCCCTGCAGGAACTTCTTCTGGAAGACCTTCTCCTCCTGCAAA
    TAAAACCTCACCCATGAATGCTCACGCAAGTTTAATTACAGACCTGAA

Output

The output of the goose-mutatefasta program is a FASTA or Multi-FASTA file with the synthetic mutation of the input file.
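One way to picture how the seed and the three rates could interact is a simple per-base model, sketched below in Python. This is an illustrative assumption only: the manual does not describe the algorithm or the random generator that GOOSE uses, so this sketch will not reproduce the output shown in this section. The function name mutate and the order of the three steps are hypothetical; the parameter names merely mirror the command line options.

```python
import random


def mutate(seq, edit_rate=0.0, deletion_rate=0.0, insertion_rate=0.0,
           seed=0, alphabet="ACGT"):
    """Apply per-base deletions, edits and insertions with the given rates.

    Assumed model (not taken from GOOSE): each base is visited once and each
    event is drawn independently from a generator seeded with `seed`.
    """
    rng = random.Random(seed)  # same seed, same mutated sequence
    out = []
    for base in seq:
        if rng.random() < deletion_rate:
            continue  # deletion: drop this base
        if rng.random() < edit_rate:
            # edit: redraw the symbol (it may equal the original by chance)
            base = rng.choice(alphabet)
        out.append(base)
        if rng.random() < insertion_rate:
            out.append(rng.choice(alphabet))  # insertion: add a random symbol
    return "".join(out)


original = "ACGT" * 10
# Passing alphabet="ACGTN" is a rough analogue of the -a option
mutated = mutate(original, edit_rate=0.5, seed=1, alphabet="ACGTN")
print(len(original), len(mutated))  # edits alone never change the length
```

With all rates left at 0.0 the sequence is returned unchanged, which matches the documented defaults of the three rate options.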
Using the seed value 1 and the edition rate 0.5, an example for this input is:

    >AB000264|acc=AB000264|descr=Homo sapiens mRNA
    ACGCAACGNATTCCTGCTGATCATANTGTNCCGCNCCCCNGCGACGGGGNCTCNCNNGCACACATNGTACCATTGTCCAC
    NCTTNCANGTNANCGCTAGCAGGCTACNGTTTNTCCTCNCCTANNCCAANCNGGCGTNNNTACACTGGCACGTGCAGGCA
    TNGGTCGGCNGGNNCCTCCGGNAACGGCACCGGAGACGAAGCTCGGNGGNTATACAGGTGTCANGAAACATCCCCGCGNC
    GNGTGNCCNNGAANCCANAGAGTATCTCACTCACAACCCTGCGTGCACNTCTAGAGNANGACCTTACNCACCNTCCCNTT
    NNGTACCACACCAATGAACGCTGCAGAAAGTCTGTTTNNAGGNGNGCA
    >AB000263|acc=AB000263|descr=Homo sapiens mRNA
    ATTTGAAGGCAANCGGNCCAGNAATNCGGNGGGTGCNGCTCNTGTNGGCTACGGNCATCGCGGCCCTGCTNTANTAAGCN
    TGAACCACCGNTCGNNGCACTTAGCAATNGCGNAANCCGTCGGCACGGCGGAGACNAANCCGCTANTNNTTTCCCGCTNA
    ATGGNTGTACAAGACCNACTANACCANCCTCCGTCACCACACTGGAGCGCANGATGGNNCGCTGNCTAGNAGNCNNTGAG
    GCGCTCCNTCCTANAAANCCGTGGNCGAGCNCCCTATGGNAGNGTGGGGGTTTTACCGGAAGACCNTCGNGCCCTATGGG
    AGCAATCANAANCTAGAAAGCTTACNGATGGTGANGAANTAGACTANG

3.6 Program goose-randfastaextrachars

The goose-randfastaextrachars program substitutes the characters outside the ACGT alphabet in the DNA sequence with random ACGT symbols. It works with both FASTA and Multi-FASTA file formats. For help type:

    ./goose-randfastaextrachars -h

In the following subsections, we explain the input and output parameters.

Input parameters

The goose-randfastaextrachars program needs two streams for the computation, namely the standard input and the standard output. The input stream is a FASTA or Multi-FASTA file. The usage is as follows:

    Usage: ./goose-randfastaextrachars [options] [[--] args]
       or: ./goose-randfastaextrachars [options]

    It substitues in the DNA sequence the outside ACGT chars by random ACGT symbols.
    It works both in FASTA and Multi-FASTA file formats

        -h, --help         show this help message and exit

    Basic options
        < input.fasta      Input FASTA or Multi-FASTA file format (stdin)
        > output.fasta     Output FASTA or Multi-FASTA file format (stdout)

    Example: ./goose-randfastaextrachars < input.fasta > output.fasta

An example of such an input file is:

    to do

Output

The output of the goose-randfastaextrachars program is a FASTA or Multi-FASTA file. An example, for the input, is:

    to do

Chapter 4 Genomic sequence tools

Currently available genomic sequence tools, for analysis and manipulation, are:

1. goose-mutatedna
2. goose-randseqextrachars

4.1 Program goose-mutatedna

The goose-mutatedna ... For help type:

    ./goose-mutatedna -h

In the following subsections, we explain the input and output parameters.

Input parameters

The goose-mutatedna program needs ... The usage is as follows:

    TO DO

An example of such an input file is:

    TO DO

Output

The output of the goose-mutatedna program ... An example, for the input, is:

    TO DO

4.2 Program goose-randseqextrachars

The goose-randseqextrachars ... For help type:

    ./goose-randseqextrachars -h

In the following subsections, we explain the input and output parameters.

Input parameters

The goose-randseqextrachars program needs ... The usage is as follows:

    TO DO

An example of such an input file is:

    TO DO

Output

The output of the goose-randseqextrachars program ... An example, for the input, is:

    TO DO

Chapter 5 Amino acid sequence tools

Currently available amino acid sequence tools, for analysis and manipulation, are:

1. goose-AminoAcidToGroup: it converts an amino acid sequence to a group sequence.
2. goose-ProteinToPseudoDNA: it converts an amino acid (protein) sequence to a pseudo DNA sequence.

5.1 Program goose-AminoAcidToGroup

The goose-AminoAcidToGroup program converts an amino acid sequence to a group sequence. For help type:

    ./goose-AminoAcidToGroup -h

In the following subsections, we explain the input and output parameters.

Input parameters

The goose-AminoAcidToGroup program needs two streams for the computation, namely the standard input and the standard output. The input stream is an amino acid sequence.
The usage is as follows:

    Usage: ./goose-AminoAcidToGroup [options] [[--] args]
       or: ./goose-AminoAcidToGroup [options]

    It converts a amino acid sequence to a group sequence.

        -h, --help         show this help message and exit

    Basic options
        < input.prot       Input amino acid sequence file (stdin)
        > output.group     Output group sequence file (stdout)

    Example: ./goose-AminoAcidToGroup < input.prot > output.group

    Table:
    Prot    Group
    R       P       Amino acids with electric charged side chains: POSITIVE
    H       P
    K       P
    D       N       Amino acids with electric charged side chains: NEGATIVE
    E       N
    S       U       Amino acids with electric UNCHARGED side chains
    T       U
    N       U
    Q       U
    C       S       Special cases
    U       S
    G       S
    P       S
    A       H       Amino acids with hydrophobic side chains
    V       H
    I       H
    L       H
    M       H
    F       H
    Y       H
    W       H
    *       *       Others
    X       X       Unknown

It can be used to group amino acids by properties, such as electric charge (positive and negative), uncharged side chains, hydrophobic side chains and special cases. An example of such an input file is:

    IPFLLKKQFALADKLVLSKLRQLLGGRIKMMPCGGAKLEPAIGLFFHAIGINIKLGYGMTETTATVSCWHDFQFNPNSIG
    TLMPKAEVKIGENNEILVRGGMVMKGYYKKPEETAQAFTEDGFLKTGDAGEFDEQGNLFITDRIKELMKTSNGKYIAPQY
    IESKIGKDKFIEQIAIIADAKKYVSALIVPCFDSLEEYAKQLNIKYHDRLELLKNSDILKMFE

Output

The output of the goose-AminoAcidToGroup program is a group sequence. An example, for the input above, is:

    HSHHHPPUHHHHNPHHHUPHPUHHSSPHPHHSSSSHPHNSHHSHHHPHHSHUHPHSHSHUNUUHUHUSHPNHUHUSUUHS
    UHHSPHNHPHSNUUNHHHPSSHHHPSHHPPSNNUHUHHUNNSHHPUSNHSNHNNUSUHHHUNPHPNHHPUUUSPHHHSUH
    HNUPHSPNPHHNUHHHHHNHPPHHUHHHHSSHNUHNNHHPUHUHPHPNPHNHHPUUNHHPHHN

5.2 Program goose-ProteinToPseudoDNA

The goose-ProteinToPseudoDNA program converts an amino acid (protein) sequence to a pseudo DNA sequence. For help type:

    ./goose-ProteinToPseudoDNA -h

In the following subsections, we explain the input and output parameters.

Input parameters

The goose-ProteinToPseudoDNA program needs two streams for the computation, namely the standard input and the standard output. The input stream is an amino acid sequence.
The usage is as follows:

    Usage: ./goose-ProteinToPseudoDNA [options] [[--] args]
       or: ./goose-ProteinToPseudoDNA [options]

    It converts a protein sequence to a pseudo DNA sequence.

        -h, --help        show this help message and exit

    Basic options
        < input.prot      Input amino acid sequence file (stdin)
        > output.dna      Output DNA sequence file (stdout)

    Example: ./goose-ProteinToPseudoDNA < input.prot > output.dna

    Table:
    Prot    DNA
    A       GCA
    C       TGC
    D       GAC
    E       GAG
    F       TTT
    G       GGC
    H       CAT
    I       ATC
    K       AAA
    L       CTG
    M       ATG
    N       AAC
    P       CCG
    Q       CAG
    R       CGT
    S       TCT
    T       ACG
    V       GTA
    W       TGG
    Y       TAC
    *       TAG
    X       GGG

It can be used to generate pseudo-DNA that carries the characteristics of the amino acid (protein) sequence. An example of such an input file is:

    IPFLLKKQFALADKLVLSKLRQLLGGRIKMMPCGGAKLEPAIGLFFHAIGINIKLGYGMTETTATVSCWHDFQFNPNSIG
    TLMPKAEVKIGENNEILVRGGMVMKGYYKKPEETAQAFTEDGFLKTGDAGEFDEQGNLFITDRIKELMKTSNGKYIAPQY
    IESKIGKDKFIEQIAIIADAKKYVSALIVPCFDSLEEYAKQLNIKYHDRLELLKNSDILKMFE

Output

The output of the goose-ProteinToPseudoDNA program is a DNA sequence. An example, for the input above, is:

    ATCCCGTTTCTGCTGAAAAAACAGTTTGCACTGGCAGACAAACTGGTACTGTCTAAACTGCGTCAGCTGCTGGGCGGCCG
    TATCAAAATGATGCCGTGCGGCGGCGCAAAACTGGAGCCGGCAATCGGCCTGTTTTTTCATGCAATCGGCATCAACATCA
    AACTGGGCTACGGCATGACGGAGACGACGGCAACGGTATCTTGCTGGCATGACTTTCAGTTTAACCCGAACTCTATCGGC
    ACGCTGATGCCGAAAGCAGAGGTAAAAATCGGCGAGAACAACGAGATCCTGGTACGTGGCGGCATGGTAATGAAAGGCTA
    CTACAAAAAACCGGAGGAGACGGCACAGGCATTTACGGAGGACGGCTTTCTGAAAACGGGCGACGCAGGCGAGTTTGACG
    AGCAGGGCAACCTGTTTATCACGGACCGTATCAAAGAGCTGATGAAAACGTCTAACGGCAAATACATCGCACCGCAGTAC
    ATCGAGTCTAAAATCGGCAAAGACAAATTTATCGAGCAGATCGCAATCATCGCAGACGCAAAAAAATACGTATCTGCACT
    GATCGTACCGTGCTTTGACTCTCTGGAGGAGTACGCAAAACAGCTGAACATCAAATACCATGACCGTCTGGAGCTGCTGA
    AAAACTCTGACATCCTGAAAATGTTTGAG

Chapter 6 General purpose tools

1. goose-comparativemap: visualisation of comparative maps. It builds an image given specific patterns between two sequences.
2. goose-BruteForceString: it generates, line by line, multiple combinations of strings up to a certain size.
3. goose-char2line: it places each character on a separate line.
4. goose-sum: it adds the second column value to the first column value.
5. goose-min: it finds the minimum value between two column values.
6. goose-minus: it subtracts the second column value from the first column value.
7. goose-max: it finds the maximum value between two column values.
8. goose-extract: it extracts a subsequence of a sequence by coordinates.
9. goose-segment: it segments a sequence given a certain threshold.
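The four column tools in the list above (goose-sum, goose-min, goose-minus and goose-max) can be pictured with a short Python sketch. It assumes that the input holds two whitespace-separated numeric columns per line and that goose-minus computes the first column minus the second; neither detail is confirmed by this manual. The helper column_op is hypothetical and is not part of GOOSE.

```python
def column_op(lines, op):
    """Apply op(first, second) to every line that holds two numeric columns."""
    results = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        first, second = float(fields[0]), float(fields[1])
        results.append(op(first, second))
    return results


rows = ["1 2", "5.5 3", "-1 4"]
print(column_op(rows, lambda a, b: a + b))  # in the spirit of goose-sum
print(column_op(rows, min))                 # in the spirit of goose-min
print(column_op(rows, lambda a, b: a - b))  # in the spirit of goose-minus
print(column_op(rows, max))                 # in the spirit of goose-max
```

Since GOOSE supports pipes, tools of this kind would typically read such lines from the standard input and write one result per line to the standard output.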