LTR_retriever User Manual

Shujun Ou and Ning Jiang

oushujun@msu.edu & jiangn@msu.edu

Department of Horticulture, Michigan State University, East Lansing, MI, 48824, USA

In any research documents using LTR_retriever please cite the following paper:

Shujun Ou and Ning Jiang (2017) LTR_retriever: a highly accurate and sensitive program for

identification of LTR retrotransposons (in preparation)

LTR_retriever is licensed under GNU GPLv3.

Questions and Issues Please See: https://github.com/oushujun/LTR_retriever

Sept. 21, 2017

1 Introduction

LTR_retriever is a command line program (in Perl) for accurate identification of LTR

retrotransposons (LTR-RTs) from outputs of LTRharvest (1), LTR_FINDER (2), and MGEScan-

LTR (3, 4) and generation of a non-redundant LTR-RT library for genome annotations.

As one of the most prevalent classes of transposable elements (TEs), LTR-RTs comprise the largest portion of most plant genomes (5). Because LTR-RT sequences are highly diverse, identifying such elements by sequence homology is inefficient. Their structure, however, is conserved across species, and several programs have been developed to search for LTR-RTs using relevant structural characteristics. These programs are very sensitive, but they are not very accurate or specific in identifying LTR-RTs. LTR_retriever was developed to address the need for accuracy and specificity, with several new functions to facilitate genome annotation and other downstream studies.

LTR_retriever aims to identify high-quality LTR-RT exemplars (Figure 1A) that are intact and non-redundant from a variety of LTR-RT candidates. To retain sensitivity, sequences of nested LTRs and truncated LTRs (Figure 1C, D) that are not represented by intact LTR-RTs are also included in the exemplars. This package excludes the vast majority of non-LTR false positives. The most common false positives arise from two adjacent repeats, such as SINEs, LINEs, DNA TEs, or solo-LTRs derived from different elements (Figure 3). In addition, LTR_retriever excludes open reading frames derived from LINEs, DNA TEs, or plant coding sequences so that non-LTR coding sequences are not misannotated as LTR elements. LTR_retriever also identifies and removes nested insertions within the identified intact LTR-RTs, which further reduces library redundancy. The program can also accurately identify rare non-canonical LTR-RTs whose terminal motifs differ from the canonical 5'-TG..CA-3' motif. It is built from a variety of Perl scripts that can be reused for downstream analyses.

1.1 Main features of LTR_retriever

- A command line Perl program;

- Supports multi-threading;

- Identifies intact LTR-RTs with accurate boundaries;

- Identifies rare LTR-RTs with non-canonical (non-'TGCA') motifs;

- Supports multiple inputs: LTRharvest, LTR_FINDER, and/or MGEScan_LTR;

- Sequence input: FASTA format (contigs, scaffolds, genomes, corrected PacBio reads, etc.);

- Output: a non-redundant LTR-RT library (FASTA), GFF3 for all intact LTR-RTs, whole-

genome LTR-RT annotation (GFF), and a comprehensive table.

2 The Structure and Characteristics of LTR-RTs

The structure of an LTR retrotransposon (LTR-RT) is characterized by a pair of long terminal repeats (LTRs) ranging from 75 bp to 5000 bp (Figure 1A). The region between the 5' LTR and the 3' LTR is termed the internal region, which encodes proteins for transposition. At the very termini of the LTRs are dinucleotide motifs, which are 5'-TG..CA-3' in most cases. However, various other motifs were detected in the sacred lotus (Nelumbo nucifera) and rice (Oryza sativa) genomes during our manual annotation, and have also been found in other studies (e.g., Tos17 (6), AtRE1 (7), and TARE1 (8)). Flanking the terminal motifs are the target site duplications (TSDs), which are generated by staggered cuts from integrase activity (Figure 2) during LTR-RT insertion. TSDs are typically 5 bp in plants but can range from 3 to 6 bp, and the 5' and 3' TSDs should be identical because of the mechanism of their formation (Figure 2). A recently inserted LTR-RT has highly similar LTRs that are recognizable by sequence alignment, which is the primary search scheme of LTR search programs (1, 2, 4, 9). However, if two highly similar repetitive elements other than LTRs (e.g., DNA TEs, LINEs, SINEs, solo-LTRs, tandem repeats, etc.) are located close to each other (Figure 3), search tools may mistake them for a pair of LTRs and report them as an LTR-RT candidate. These are the most frequent false positives in de novo searches for LTR-RTs. Because LTR-RTs follow a "copy-and-paste" duplication scheme, the regions flanking a newly inserted LTR-RT are unlikely to be identical to the termini of the internal region. For example, in an intact LTR-RT (Figure 1A), region "a" is not identical to region "c", and region "b" is not identical to region "d". Thus, by aligning the flanking regions of the two LTR fragments (Figure 3), LTR_retriever can obtain the boundary information for the candidate.
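The structural features described above (terminal motif, identical TSDs, dissimilar flanking regions) can be illustrated with a minimal Python sketch. This is not LTR_retriever's implementation, which is written in Perl and relies on sequence alignment; the function names, the toy sequence, and the ungapped identity measure are assumptions made purely for illustration.

```python
def terminal_motif(element):
    """Return the first two and last two bases of an element, e.g. 'TGCA'."""
    element = element.upper()
    return element[:2] + element[-2:]


def tsd_pair(genome, start, end, tsd_len=5):
    """Return the sequences immediately flanking an element located at
    1-based, inclusive coordinates start..end. A true TSD pair is identical."""
    left = genome[max(0, start - 1 - tsd_len):start - 1]
    right = genome[end:end + tsd_len]
    return left.upper(), right.upper()


def ungapped_identity(seq_a, seq_b):
    """Fraction of matching positions between two equally long sequences
    (a crude stand-in for a real alignment of regions such as 'a' and 'c')."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return matches / len(seq_a)


# Toy example: flank + TSD + element (TG...CA) + TSD + flank
genome = "AAAA" + "GTACT" + "TGCCCCCCCA" + "GTACT" + "TTTT"
start, end = 10, 19
element = genome[start - 1:end]
left_tsd, right_tsd = tsd_pair(genome, start, end)
print(terminal_motif(element), left_tsd, right_tsd, left_tsd == right_tsd)
```

In this toy case the motif is canonical and the two TSDs match. For a false positive of the kind shown in Figure 3, the identity between the outer flank and the inner edge would be expected to be high on one or both sides.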

Figure 1. The structures of LTR retrotransposons (LTR-RT). Gray boxes: target site

duplications (TSD); red triangles: LTR motifs; blue boxes: long terminal repeat (LTR);

sequence between 5' LTR and 3' LTR denotes the internal region. (A) The structure of an

intact LTR-RT. Regions a, b, c, and d are main targets analyzed by LTR_retriever. (B)

The structure of a solo-LTR. (C) The structures of nest-inserted LTR-RTs. (D) The structures

of truncated LTR-RTs. Drawings are not to scale.

Figure 2. Formation of target site duplications (TSD). Integrases coded by LTR-RTs

generate staggered cuts (in this case 5'-GTACT-3') on the sequence before new LTR-RT

insertions. By gap filling and sequence ligation, a pair of TSD is formed flanking the newly

inserted LTR.

Figure 3. The most common false positives in de novo searches for LTR-RTs. Gray boxes:

two closely positioned SINE/LINE/DNA elements/solo-LTRs/tandem repeats. Light blue boxes:

false LTR regions reported by de novo searches. The false positive also has TSD-like structure

but commonly has extended sequence identity on one or both termini (orange and brown

boxes).

3 Workflow of LTR_retriever

LTR_retriever uses eight modules to screen and filter out false positives and to construct non-redundant LTR exemplars (Figure 4). More details can be found in our article.

Figure 4. Workflow of LTR_retriever. Modules 1-8 are indicated in parentheses.

4 Installation

LTR_retriever is a command line Perl program that incorporates several programs for

analysis and runs in UNIX-like systems. These programs include:

- BLAST+ (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/),

- CDHIT (http://weizhongli-lab.org/cd-hit/) OR BLAST (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy/2.2.25/),

- HMMER (http://hmmer.org/), and

- RepeatMasker (http://www.repeatmasker.org/).

To run LTR_retriever, you need to provide the paths to the following dependent programs:

- makeblastdb, blastn, and blastx in the BLAST+ package,

- cd-hit-est in the CDHIT package OR blastclust in the BLAST package,

- hmmsearch in the HMMER package, and

- RepeatMasker

If the above programs are all accessible through the PATH environment variable of the UNIX-like system (e.g., their paths are exported in .bashrc), no installation is needed. Otherwise, users need to modify the paths file under the your_path_to/LTR_retriever/ directory. If a path is specified, the required program(s) must be located directly in that directory, not in any of its subdirectories.

For example:

Edit the paths file using vi/vim (you may use other text editors such as emacs)

vi /your_path_to/LTR_retriever/paths

Modify the following lines

BLAST+=/your_path_to/BLAST+2.2.30/bin/

RepeatMasker=/your_path_to/RepeatMasker4.0.0/

HMMER=/your_path_to/HMMER3.1b2/bin/

CDHIT=/your_path_to/CDHIT4.6.1/

BLAST=/your_path_to/BLAST2.2.26/bin/ #not required if CDHIT provided

Save changes to paths and exit. The installation is done.
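Before editing the paths file, it can help to see which of the dependent programs are already reachable through PATH. The following Python sketch is one way to do that; it is a convenience check and not part of LTR_retriever. The program names are those listed above, and the helper name missing_programs is hypothetical.

```python
import shutil

# Programs named in this manual. Clustering needs cd-hit-est OR blastclust.
REQUIRED = ["makeblastdb", "blastn", "blastx", "hmmsearch", "RepeatMasker"]
CLUSTERING = ["cd-hit-est", "blastclust"]


def missing_programs(which=shutil.which):
    """Return the names of dependencies that cannot be found on PATH."""
    missing = [name for name in REQUIRED if which(name) is None]
    if all(which(name) is None for name in CLUSTERING):
        missing.append(" or ".join(CLUSTERING))
    return missing


if __name__ == "__main__":
    absent = missing_programs()
    if absent:
        print("Not found on PATH:", ", ".join(absent))
    else:
        print("All dependencies were found on PATH.")
```

Any program reported as missing would need its directory added to the paths file as shown above.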

5 Inputs

Two types of inputs are needed for LTR_retriever:

1. Genomic sequence

2. LTR-RT candidates

5.1 Genomic sequence

The sequence must be in FASTA format. Users should back up the original input FASTA file because LTR_retriever will modify sequence names that are longer than 20 characters to fit the naming space of RepeatMasker. Thus, FASTA sequence names are recommended to be shorter than 20 characters and free of spaces and special punctuation marks other than dots (.) and underscores (_). For long sequencing reads (e.g., PacBio), self-corrected reads are needed unless the sequencing error rate is lower than 10%.
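The naming recommendation can be checked before running the program. The Python sketch below flags FASTA names that are longer than 20 characters or that contain characters other than letters, digits, dots, and underscores. The exact rules applied by LTR_retriever are not spelled out in this manual, so the allowed character set and the helper name problematic_names are assumptions.

```python
import re

# Assumption: letters, digits, dots, and underscores are considered safe.
ALLOWED = re.compile(r"^[A-Za-z0-9._]+$")


def problematic_names(fasta_lines, max_len=20):
    """Yield (name, reason) for FASTA headers that do not follow the
    naming recommendation. fasta_lines is any iterable of text lines,
    for example an open file handle."""
    for line in fasta_lines:
        if not line.startswith(">"):
            continue
        name = line[1:].rstrip("\r\n")
        if len(name) > max_len:
            yield name, "longer than %d characters" % max_len
        elif not ALLOWED.match(name):
            yield name, "contains spaces or special characters"


example = [">Chr1\n", "ACGT\n", ">Chr 2 extra\n", "ACGT\n"]
for name, reason in problematic_names(example):
    print(name, "->", reason)
```

To check a real genome, pass an open file handle instead of the example list, e.g. `with open("genome.fa") as handle: ...`.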

5.2 LTR-RT candidates

LTR_retriever takes multiple LTR-RT candidate inputs including the standard output of

LTRharvest, the standard output of LTR_FINDER, and the candidate output of MGEScan-LTR.

Users need to obtain the input file(s) from the aforementioned programs before running

LTR_retriever. Either a single input source or a combination of multiple inputs is acceptable. The

following command lines provide examples and suggestions to obtain inputs.

Input from LTRharvest (a program of GenomeTools (10)). A trailing “\” indicates that the line and the next line belong to the same command; please delete the “\” and join the lines if you run into errors.

gt suffixerator \

-db genome.fa \

-indexname genome.fa \

-tis -suf -lcp -des -ssp -sds -dna

gt ltrharvest \

-index genome.fa \

-similar 90 -vic 10 -seed 20 -seqids yes \

-minlenltr 100 -maxlenltr 7000 -mintsd 4 -maxtsd 6 \

-motif TGCA -motifmis 1 > genome.harvest.scn

Input from LTR_FINDER:

ltr_finder -D 15000 -d 1000 -L 7000 -l 100 \

-p 20 -C -M 0.9 genome.fa > genome.finder.scn

Input from MGEScan_LTR (a modified version obtained from DAWGPAWS (11)):

perl find_ltr_DAWGPAWS.pl \

-seq=genome.fa \

-min-ltr=100 -max-ltr=7000 -min_iden=90

The -nonTGCA input can be obtained without specifying -motif TGCA -motifmis 1 in

LTRharvest:

gt ltrharvest \

-index genome.fa \

-similar 90 -vic 10 -seed 20 -seqids yes \

-minlenltr 100 -maxlenltr 7000 -mintsd 4 -maxtsd 6 \

> genome.harvest.nonTGCA.scn

6 Outputs

The output of LTR_retriever includes:

- A summary table for the identified intact LTR-RTs with coordinate and structural information (*.pass.list)

- A non-redundant LTR-RT library (exemplar) in the FASTA format (*.LTRlib.fa)

- A GFF3 format file for all intact LTR-RTs (*.pass.list.gff3)

- A GFF format file for the whole-genome LTR-RT annotation (*.gff)

Example of an intact LTR-RT list:

#LTR_loc Category Motif TSD 5'_TSD 3'_TSD Internal Similarity Strand Family Superfamily Insertion_Time

Chr10:10211053..10223177 pass motif:TGCC TSD:GGTGG 10211048..10211052 10223178..10223182 IN:10211491..10222739 0.9794 - Gypsy LTR 1030000

Chr10:11328988..11335399 pass motif:TGCA TSD:CAGTC 11328983..11328987 11335400..11335404 IN:11329472..11334915 0.9587 - Copia LTR 2065000

Chr2:16844554..16849613 pass motif:TGCA TSD:GCATG 16844549..16844553 16849614..16849618 IN:16845219..16848945 0.9416 ? unknown NA 2920000

Chr2:17280296..17282788 pass motif:TGCA TSD:TATAC 17280291..17280295 17282789..17282793 IN:17280474..17282616 0.9497 + unknown LTR 2515000

Chr2:17891339..17904598 pass motif:TGCA TSD:CCCTC 17891334..17891338 17904599..17904603 IN:17892615..17903320 0.9867 ? Gypsy LTR 665000
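Each record in the list above has twelve whitespace-separated fields, which makes the file easy to post-process. The Python sketch below parses one record into named fields. The field names are lowercased versions of the header columns, and the helper names are hypothetical. The sketch assumes records are laid out exactly as in the example.

```python
from collections import namedtuple

FIELDS = ["loc", "category", "motif", "tsd", "tsd5", "tsd3", "internal",
          "similarity", "strand", "family", "superfamily", "insertion_time"]
PassRecord = namedtuple("PassRecord", FIELDS)


def parse_pass_line(line):
    """Split one whitespace-delimited record into named fields."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(parts)))
    rec = PassRecord(*parts)
    time = rec.insertion_time
    return rec._replace(
        motif=rec.motif.replace("motif:", "", 1),        # drop the 'motif:' tag
        tsd=rec.tsd.replace("TSD:", "", 1),              # drop the 'TSD:' tag
        internal=rec.internal.replace("IN:", "", 1),     # drop the 'IN:' tag
        similarity=float(rec.similarity),
        insertion_time=int(time) if time.isdigit() else None,
    )


row = ("Chr10:10211053..10223177 pass motif:TGCC TSD:GGTGG 10211048..10211052 "
       "10223178..10223182 IN:10211491..10222739 0.9794 - Gypsy LTR 1030000")
record = parse_pass_line(row)
print(record.loc, record.motif, record.similarity, record.insertion_time)
```

Lines beginning with "#" are header lines and would need to be skipped before calling parse_pass_line.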

Example of an LTR library:

>Chr10:1057194..1057414_LTR#LTR/Copia

TGTTGGCGAACGGCTTCGTCAGACTCTCGCGGCGCGCTCCACGCGCACGACGCGCACCC

CGCGCACGACGCGCAGCGCTCCTCGCTCCGCTCGCACCGCTGCACGTCCGTTAGACCAG

GGGATTAGTTAGGCCCAGGCAACTCCCAAGCCTTGTTGTACATGTATAAATGTAAGCTC

CATTGATCAATGAAAGTTACGGTTGATCCAAATCTCCTTCTACA

>Chr10:12110230..12110468_LTR#LTR/unknown

TGTCATGGGCTTTGGGCCGGGAGTCCTAGGCCCATGAGATAGAATTAGGGTTTGTTAGG

ATTAGATAAGGTTTGTTAGGATTAGATTAAGTAGCCCTCCATCTATATAAGGAGGGATC

CTATCCCAGGTCAGTTAGGCATTAGATCAATATTTATCTTAGTGCCCATCGGCCTGCCT

TCTCAGTGCGACGGAGAGCGTCGCGCCGTTTAGGTTCAGGACCGTATTCCTTGTTCGTG

ACA

>Chr10:10063621..10068275_INT#LTR/Gypsy

AATCCACCCCCCTTACAAGAATTTCGTCCCCGAGATTCGAGGAGGCTAGCATGAAGATA

…

CTACATGCCGCTAGCGGATCCTGCAGTCTTCCGGAGCTTCGGCAAACAATTGGCGACAT

CTTCTTCTTCGCAAAGCTGACCATCTAGTACTAGTTGAAAATCCGAAGGAGGAAGAAGA

CAATAAACATTTTGCAAAT
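The library headers shown above follow a regular pattern: sequence name, coordinates, a part tag (LTR or INT), and a classification after "#". The Python sketch below splits such a header into its parts. The pattern is inferred only from the three example headers, so other header forms may exist; the helper name is hypothetical.

```python
import re

# Pattern inferred from the example headers, e.g.
# >Chr10:1057194..1057414_LTR#LTR/Copia
HEADER = re.compile(
    r"^>(?P<seqid>[^:]+):(?P<start>\d+)\.\.(?P<end>\d+)"
    r"_(?P<part>LTR|INT)#(?P<label>\S+)$"
)


def parse_library_header(header):
    """Return a dict with seqid, start, end, part, and label."""
    match = HEADER.match(header.strip())
    if match is None:
        raise ValueError("unrecognized header: %r" % header)
    info = match.groupdict()
    info["start"] = int(info["start"])
    info["end"] = int(info["end"])
    return info


info = parse_library_header(">Chr10:1057194..1057414_LTR#LTR/Copia")
print(info["seqid"], info["part"], info["label"], info["end"] - info["start"] + 1)
```

For the first example entry, the coordinate span (end - start + 1) is 221 bp, which matches the length of the sequence printed under that header.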

7 Usage

7.1 LTR_retriever is called as follows:

LTR_retriever -genome genomefile -inharvest LTRharvest_input [options]

where -genome specifies the genome sequence and is also used as the root file name of outputs;

the -inharvest parameter specifies the LTR-RT candidate file obtained from LTRharvest.

Multiple candidate sources can be used (see Table 1).

For example,

User provides only one candidate source:

e.g. 1

LTR_retriever -genome genome.fa -infinder genome.finder.scn

e.g. 2

LTR_retriever -genome genome.fa -inharvest genome.harvest.scn

e.g. 3

LTR_retriever -genome genome.fa -inmgescan genome.MGEScan.scn

User provides multiple candidate sources:

e.g. 4

LTR_retriever \

-genome genome.fa \

-inharvest genome.harvest.scn \

-infinder genome.finder.scn

e.g. 5

LTR_retriever \

-genome genome.fa \

-inharvest genome.harvest.scn \

-infinder genome.finder.scn \

-inmgescan genome.MGEScan.scn

To recover non-canonical LTR-RTs, you may use the -nonTGCA option to provide extra

candidates along with other input(s) (either one source or multiple sources).

e.g. 6

LTR_retriever \

-genome genome.fa \

-inharvest genome.harvest.scn \

-nonTGCA genome.harvest.nonTGCA.scn

e.g. 7

LTR_retriever \

-genome genome.fa \

-infinder genome.finder.scn \

-inharvest genome.harvest.scn \

-inmgescan genome.MGEScan.scn \

-nonTGCA genome.harvest.nonTGCA.scn
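When LTR_retriever is launched from a script, the command line in the examples above can be assembled programmatically. The Python sketch below builds the argument list from whichever candidate files are available. Only options documented in this manual are used; the function name and its keyword arguments are hypothetical.

```python
import shlex


def build_command(genome, harvest=None, finder=None, mgescan=None,
                  non_tgca=None, threads=None):
    """Return the LTR_retriever command as a list of arguments."""
    if not any([harvest, finder, mgescan]):
        # -nonTGCA is described as an extra input used along with other input(s)
        raise ValueError("provide at least one of harvest, finder, or mgescan")
    cmd = ["LTR_retriever", "-genome", genome]
    inputs = [("-inharvest", harvest), ("-infinder", finder),
              ("-inmgescan", mgescan), ("-nonTGCA", non_tgca)]
    for flag, path in inputs:
        if path:
            cmd += [flag, path]
    if threads is not None:
        cmd += ["-threads", str(threads)]
    return cmd


cmd = build_command("genome.fa", harvest="genome.harvest.scn",
                    non_tgca="genome.harvest.nonTGCA.scn")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) if the program should be started from Python.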

7.2 WARNINGS

LTR_retriever will alter sequence names longer than 20 characters to fit the naming requirement

of RepeatMasker. Please back up your original genome file before using LTR_retriever.

LTR_retriever can take multiple sources as inputs for one single run, but running multiple

instances of LTR_retriever in the same folder at the same time may cause errors.

7.3 An overview of all parameters

Table 1. All parameters for LTR_retriever.

Input options

-genome [FASTA File]        specify the genome sequence file (in FASTA format)
-inharvest [File]           LTR-RT candidates obtained from the screen output of LTRharvest with -motif TGCA parameters
-infinder [File]            LTR-RT candidates obtained from the screen output of LTR_FINDER
-inmgescan [File]           LTR-RT candidates obtained from the output of MGEScan_LTR (the .ltrloc file)
-nonTGCA [File]             non-canonical LTR-RT candidates obtained from the screen output of LTRharvest with default parameters

Output options

-verbose                    retain intermediate outputs (developer mode)
-noanno                     disable whole-genome LTR-RT annotation (no GFF output)

Filter options

-misschar [CHR]             specify the character for ambiguous sequences in the genome (default N)
-Nscreen                    disable filtering of ambiguous sequences in LTR-RT candidates (filtering is enabled by default, i.e., when this flag is not specified)
-missmax [INT]              specify the maximum number of ambiguous bp allowed in an LTR-RT candidate (default 10)
-missrate [0-1]             specify the maximum fraction of ambiguous bp allowed in an LTR-RT candidate (default 0.8)
-minlen [INT]               specify the minimum length (bp) of the LTR region (default 100)
-max_ratio [FLOAT]          specify the maximum ratio of the internal region length to the LTR region length (default 50)
-minscore [INT]             specify the minimum alignment length (INT/2) to identify and filter out tandem repeats in an LTR-RT candidate (default 1000)
-flankmiss [1-60]           specify the maximum gap length (bp) allowed in 60-bp flanking sequences (default 25); a smaller number indicates higher stringency
-flanksim [0-100]           specify the minimum percentage of identity for flanking sequence alignment (default 60)
-flankaln [0-1]             specify the maximum alignment portion allowed for 60-bp flanking sequences (default 0.6)
-motif [[STRING]]           specify a list of (known) motifs in square brackets as prior knowledge to search for non-canonical LTR-RTs (default -motif [TCCA TGCT TACA TACT TGGA TATA TGTA TGCA])
-notrunc                    discard sequence information from truncated LTR-RTs and nested LTR-RTs (will dampen sensitivity) (retained by default, i.e., when this flag is not specified)
-procovTE [0-1]             specify the maximum portion of an LTR-RT candidate allowed for cumulated alignments to the DNA TE database and the LINE database (default 0.7)
-procovPL [0-1]             specify the maximum portion of an LTR-RT candidate allowed for cumulated alignments to the plant protein database (default 0.7)
-prolensig [INT]            specify the minimum alignment length (bp) to be counted for LINE/DNA transposase/plant protein alignment (default 90)

Library options

-blastclust [[STRING]]      specify blastclust parameters in square brackets (default -blastclust [-L .9 -b T -S 80]). Triggering this flag without any parameters (-blastclust) turns blastclust on (default off) with default parameters. For blastclust settings, refer to http://www.ncbi.nlm.nih.gov/Web/Newsltr/Spring04/blastlab.html
-cdhit [[STRING]]           specify cd-hit-est parameters in square brackets (default -cdhit [-c 0.8 -G 0.8 -s 0.9 -T 20 -aL 0.9 -aS 0.9]). Triggering this flag without any parameters turns cd-hit-est on (default on) with default parameters. For cd-hit-est settings, refer to http://weizhongli-lab.org/cd-hit/wiki/doku.php?id=cd-hit_user_guide
-linelib [FASTA File]       specify a custom LINE transposase database for LINE TE exclusion (default LTR_retriever/database/Tpases020812LINE)
-dnalib [FASTA File]        specify a custom DNA TE transposase database for DNA TE exclusion (default LTR_retriever/database/Tpases020812DNA)
-plantprolib [FASTA File]   specify a custom plant protein database for protein coding sequence exclusion (default LTR_retriever/database/alluniRefprexp082813)
-TEhmm [Pfam File]          specify a custom Pfam database for TE identification (default LTR_retriever/database/TEfam.hmm)

Miscellaneous

-u [FLOAT]                  specify the neutral mutation rate of the target species (per bp per year) (default 1.3e-8 (from rice (12)))
-threads [INT]              specify the number of threads (≤ total available threads, default 4)
--help (-h)                 display the help information
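The -u parameter relates LTR divergence to time. A common way to estimate the insertion time of an LTR-RT is T = d / (2u), where d is the divergence between the two LTRs and u is the neutral mutation rate per bp per year. This manual does not state the exact formula LTR_retriever applies, so the Python sketch below is an assumption-based illustration: it uses the uncorrected divergence d = 1 - Similarity. With u = 1e-8 this reproduces the Insertion_Time values in the example list of Section 6, whereas the documented default of 1.3e-8 would give smaller values.

```python
def insertion_time(similarity, mutation_rate=1.3e-8):
    """Estimate insertion time in years as T = d / (2u), with d = 1 - similarity.

    similarity: LTR identity as a fraction between 0 and 1
    mutation_rate: neutral mutation rate (per bp per year)
    """
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be between 0 and 1")
    if mutation_rate <= 0:
        raise ValueError("mutation_rate must be positive")
    divergence = 1.0 - similarity
    return divergence / (2.0 * mutation_rate)


# Similarity -> Insertion_Time pairs copied from the example list in Section 6
listed = {0.9794: 1030000, 0.9587: 2065000, 0.9416: 2920000,
          0.9497: 2515000, 0.9867: 665000}
for sim, years in listed.items():
    print(sim, round(insertion_time(sim, mutation_rate=1e-8)), years)
```

Corrections for multiple substitutions (e.g., Jukes-Cantor) are deliberately left out of this sketch.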

8 Benchmarks

Run time of LTR_retriever is roughly proportional to total candidate number (input) as shown in

Table 2. The size and the LTR-RT fraction of a genome together determine the number of LTR-RT

candidates identified by prediction programs.

Table 2. Benchmark of LTR_retriever in model genomes.

                    Arabidopsis   Drosophila   Rice (MSU v7)   Sacred lotus   Maize (B73 v4)
Genome size (Mb)    120           144          374             708            2,134
Raw candidates      2335          2642         5436            12011          114048
Intact LTR-RT       232           517          2129            918            43227
Fraction masked     7.4%          12.4%        25.3%           29.6%          70.1%
Library entry       233           359          1529            1467           12360

Run time (-threads 20)*

10 min

42 min

2.1 hr

94.9 hr

*not including the time of whole-genome LTR-RT annotation.

9 Reusable Scripts

LTR_retriever was built on several flexible Perl scripts which are useful for other research

purposes. This section describes some of the most useful ones based on the developer’s experience.

These scripts, along with many others, are located in LTR_retriever/bin/.

Script: annotate_gff.pl

Description: Annotate the GFF file generated by RepeatMasker using the LTR library generated by

LTR_retriever.

Usage: perl annotate_gff.pl genome_LTRlib.fa genome.gff > genome.anno.gff

Options: None.

Script: call_seq_by_list.pl

Description: Extract sequences from the user-provided genome (FASTA format) using a file containing a list of coordinates (one per line) in the MSU locus format (e.g., target1 Chr01:10000..11000). The script can output a sequence in its minus direction: if the locus coordinate is written backward (e.g., Chr1:2000..1000), it is treated as a negative strand request.

Usage: perl call_seq_by_list.pl MSU_format_list -C genome.fa [options]

Options:

itself (default) extract the sequence specified by the coordinate

up_[INT] extract the sequence [INT] bp upstream of the specified coordinate

down_[INT] extract the sequence [INT] bp downstream of the specified coordinate

-rmvoid if this is triggered (default off), skip printing any void sequence (e.g., no such

sequence in the genome)

-ex if this is triggered (default off), execute the exclude function instead of the

extract function

-purge [0/1] use with -ex, if this is triggered (1, default 0), sequence specified by the

list file will be excluded from the provided FASTA file

-cov [0-1] use with -ex, sequence with removal length longer than the specified portion

(default 0.7) will be entirely excluded
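The coordinate convention described for call_seq_by_list.pl, where a backward-written locus requests the minus strand, can be sketched in Python as follows. This is an illustration of the convention only and not a port of the Perl script. It assumes 1-based, inclusive coordinates, and the function name extract_locus is hypothetical.

```python
import re

LOCUS = re.compile(r"^(?P<seqid>[^:\s]+):(?P<first>\d+)\.\.(?P<second>\d+)$")
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def extract_locus(genome, locus):
    """Extract a locus such as 'Chr1:1000..2000' from a dict that maps
    sequence names to sequences. A backward-written locus such as
    'Chr1:2000..1000' returns the reverse complement."""
    match = LOCUS.match(locus)
    if match is None:
        raise ValueError("not in MSU locus format: %r" % locus)
    first, second = int(match.group("first")), int(match.group("second"))
    start, end = min(first, second), max(first, second)
    seq = genome[match.group("seqid")][start - 1:end]  # 1-based, inclusive
    if first > second:
        seq = seq.translate(COMPLEMENT)[::-1]  # minus-strand request
    return seq


genome = {"Chr1": "AAACCCGGGT"}
print(extract_locus(genome, "Chr1:2..5"), extract_locus(genome, "Chr1:5..2"))
```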

Script: purger.pl

Description: Purge the provided FASTA file with BLAST alignment output.

Usage: perl purger.pl -blast blast_outfmt6 -seq FASTA [options]

Options:

-eval [FLOAT] BLAST hits with e-values (e.g. 1e-10) lower than this cutoff (default 0.001)

are considered real alignments

-len [INT] length of alignment hits (bp, default 90) to be considered as a real alignment

-cov [0-1] if the excluded portion of a sequence exceeds the specified value (default 1),

discard the entire sequence

-purge [0/1] if this is set to 1 (default 1), sequence regions identified by BLAST will be excluded from the provided FASTA file. If it is set to 0, the entire sequence will be excluded when it reaches the cutoff defined by -cov; otherwise the matched portion will be retained.

Script: PacBio_processor.pl

Description: Convert fastq files (e.g., PacBio reads) into FASTA files with simple filtering options.

Usage: perl PacBio_processor.pl PacBio.fastq > PacBio.fasta

Options (modified in the script):

minLength minimal read length (bp) (default 500)

maxLength maximal read length (bp) (default 50000)

minRQ minimal read quality (default 0.8)
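The conversion performed by PacBio_processor.pl can be approximated with the short Python sketch below, which converts four-line FASTQ records to FASTA and applies the length limits listed above. It is not a port of the Perl script. It assumes strict four-line records and omits the minRQ filter, because this manual does not describe how read quality is computed. The function name is hypothetical.

```python
def fastq_to_fasta(lines, min_length=500, max_length=50000):
    """Yield FASTA records (as strings) for FASTQ reads whose length lies
    between min_length and max_length. Assumes four lines per record."""
    lines = [line.rstrip("\r\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError("record %d does not start with '@'" % (i // 4 + 1))
        if min_length <= len(seq) <= max_length:
            yield ">%s\n%s\n" % (header[1:], seq)


example = ["@read1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
           "@read2\n", "AC\n", "+\n", "II\n"]
print("".join(fastq_to_fasta(example, min_length=4, max_length=100)), end="")
```

For a real file, pass an open file handle, e.g. `with open("PacBio.fastq") as handle: ...`, and write the yielded records to the output FASTA file.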

Bibliography

1. Ellinghaus D, Kurtz S, Willhoeft U. LTRharvest, an efficient and flexible software for de novo detection

of LTR retrotransposons. BMC bioinformatics. 2008;9(1):18.

2. Xu Z, Wang H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons.

Nucleic acids research. 2007;35(Web Server issue):W265-8.

3. Lee H, Lee M, Mohammed Ismail W, Rho M, Fox GC, Oh S, et al. MGEScan: a Galaxy-based system for

identifying retrotransposons in genomes. Bioinformatics. 2016.

4. Rho M, Choi J-H, Kim S, Lynch M, Tang H. De novo identification of LTR retrotransposons in eukaryotic

genomes. BMC genomics. 2007;8(1):90.

5. Jiang N. Plant Transposable Elements. eLS: John Wiley & Sons, Ltd; 2016.

6. Hirochika H, Sugimoto K, Otsuki Y, Tsugawa H, Kanda M. Retrotransposons of rice involved in

mutations induced by tissue culture. Proceedings of the National Academy of Sciences of the United

States of America. 1996;93(15):7783-8.

7. Kuwahara A, Kato A, Komeda Y. Isolation and characterization of copia-type retrotransposons in

Arabidopsis thaliana. Gene. 2000;244(1-2):127-36.

8. Yin H, Liu J, Xu Y, Liu X, Zhang S, Ma J, et al. TARE1, a mutated Copia-like LTR retrotransposon followed

by recent massive amplification in tomato. PloS one. 2013;8(7):e68587.

9. McCarthy EM, McDonald JF. LTR_STRUC: a novel search and identification program for LTR

retrotransposons. Bioinformatics. 2003;19(3):362-7.

10. Gremme G, Steinbiss S, Kurtz S. GenomeTools: A Comprehensive Software Library for Efficient

Processing of Structured Genome Annotations. IEEE/ACM transactions on computational biology and

bioinformatics / IEEE, ACM. 2013;10(3):645-56.

11. Estill JC, Bennetzen JL. The DAWGPAWS pipeline for the annotation of genes and transposable

elements in plant genomes. Plant Methods. 2009;5(1):1-11.

12. Ma J, Bennetzen JL. Rapid recent growth and divergence of rice nuclear genomes. Proceedings of the

National Academy of Sciences of the United States of America. 2004;101(34):12404-10.
