Episo User Guide
Episo-User Guide-V1.0

1) Quick Reference

Episo needs a working version of Perl and is run from the command line. Bowtie, TopHat and Cufflinks must also be installed on your computer. First, download a transcript annotation file from the Ensembl or NCBI websites. Episo supports reference transcriptome sequence files in FastA format; the allowed file extensions are .fa and .fasta. The following examples use the paired-end files ‘example_1.fastq’ and ‘example_2.fastq’ (2,500,876 reads in FastQ format, 101 bp long, simulated by Fluxsimulator) and the transcript annotation file.

(1) Compiling the program

On UNIX systems (Linux, Mac OS X) you need to compile several programs. You can use gcc or any ANSI C-compatible compiler. The source code is included in the Episo package. The commands are as follows.

    gcc -o contrans contrans.c
    gcc -o compare-paired compare-paired.c -lm
    gcc -o anti-bisulfite anti-bisulfite.c
    gcc -o selsam selsam.c
    gcc -o methylation_ratio methylation_ratio.c -lm
    gcc -o isoform_filter isoform_filter.c

(2) Generating reference transcriptome

The first step is to generate the transcript file from the annotation by using the program Cufflinks. The command is as follows.

    Usage: cufflinks -G annotation.gtf convert.sam

Note. The file annotation.gtf is from the Ensembl or NCBI websites and the file convert.sam is from the Episo package. This will produce one output file: transcripts.gtf.

The second step is to generate the reference transcriptome by using the program contrans. The command is as follows.

    Usage: contrans contrans.ctl

Note. The genome file genome.fa is from the Ensembl or NCBI websites. The format of the control file contrans.ctl is as follows.

    outfile = out                * recording the running information
    gtffile = transcripts.gtf    * gtf file generated by cufflinks
    fafile = genome.fa           * genome file
    transfile = out_trans        * recording the transcript
    seqfile = out_seq.fa         * recording the sequence of each transcript
    seqlength = 50               * the length of sequence in the seqfile

This will produce two output files, named by the parameters “seqfile” and “transfile” in the control file contrans.ctl.

(3) Generating transcriptome indexing

    Usage: bismark_genome_preparation [options]

Note. The output seqfile generated by the program contrans, or a reference transcriptome sequence file downloaded from the Ensembl or NCBI websites, should be put in the transcriptome folder passed to bismark_genome_preparation (/data/transcriptome/ in the example below). A typical transcriptome indexing command could look like this:

    bismark_genome_preparation --path_to_bowtie /usr/local/bowtie --verbose /data/transcriptome/

(4) Calling the methylation site

    Usage: bismark-liu [options] -1 -2

A typical calling example could look like this:

    bismark-liu --path_to_bowtie /usr/local/bowtie --vanilla --sam -n 2 /data/transcriptome/ -1 example_1.fastq -2 example_2.fastq

This will produce three output files:
(a) example_1.fastq_bismark_pe.txt (contains all alignments and methylation call strings)
(b) example_1.fastq_bismark_pe_mul.txt (contains the transcript information, where the alignment belongs)
(c) example_1.fastq_bismark_PE_report.txt (contains alignment and methylation summary)

Note. The options “--vanilla” and “--sam” are required, and the Bowtie version must be bowtie1. The program compare-paired and the transfile generated by contrans must be in the same directory as bismark-liu.

(5) Generating the anti-bisulfite RNA-Seq data

    Usage: anti-bisulfite anti-bisulfite.ctl

Note. The format of the control file anti-bisulfite.ctl is as follows.

    outfile = out                                       * recording the running information
    intxtfile = example_1.fastq_bismark_pe.txt          * bismark_pe.txt file generated by bismark-liu
    outreadfile = anti-example                          * the prefix of the anti-bisulfite RNA-Seq fastq files
    flag = p                                            * p means paired-end; s means single-end
    skipped_number = 1                                  * the number of rows which will be skipped
    inmultxtfile = example_1.fastq_bismark_pe_mul.txt   * bismark_pe_mul.txt file generated by bismark-liu

This will produce three output files according to the control file anti-bisulfite.ctl:
(a) anti-example_1.fastq and anti-example_2.fastq (the anti-bisulfite RNA-Seq paired-end files)
(b) methylation_summary (contains the transcript information, where the methylation alignment belongs)

(6) Estimating the methylation level of each isoform

The first step is to analyse anti-example_1.fastq and anti-example_2.fastq by using the program TopHat. The options “--bowtie1” and “--no-convert-bam” must be chosen.

The second step is to generate the file that contains only the methylation alignments from the sam file generated by TopHat. The command is as follows.

    Usage: selsam <.sam>

Note. The .sam file was generated by TopHat. The skipped_number is the number of rows that include the sign “@” in the .sam file. The output file name is accepted_hits_methylation.sam.

The third step is to use the program Cufflinks to analyse, separately, the sam file generated by TopHat and the file accepted_hits_methylation.sam generated by selsam. The option “-G” must be chosen, and the gtf file is the one named in the contrans.ctl file.

The last step is to estimate the methylation level of each transcript. The command is as follows.

    Usage: methylation_ratio

Note. The files in and are obtained by using the program isoform_filter according to the files in and . The file in is the output file isoforms.fpkm_tracking produced when Cufflinks analyses the sam file generated by TopHat.
The file in is the output file isoforms.fpkm_tracking produced when Cufflinks analyses the file accepted_hits_methylation.sam generated by selsam. The number in is the number of rows of the file ***_bismark_pe_mul.txt generated by bismark-liu. The number in is the number of rows of the file methylation_summary generated by anti-bisulfite.

The filtering command is as follows.

    Usage: isoform_filter

Note. The number in is the FPKM value below which the records in the files and are deleted.

2) Estimating the methylation level of each transcript at a single site

After generating the anti-bisulfite RNA-Seq data, we can estimate the methylation level of each transcript at a single site. This requires the output files that TopHat and selsam generate from the output files of anti-bisulfite. The pipeline is as follows.

(1) Compiling the program

    gcc -o selreads selreads.c -lm
    gcc -o selsam-single-parallel selsam-single-parallel.c

(2) Outputting the alignments which include the assigned single site

    Usage: selreads selreads.ctl

Note. The value of the parameter “length” is the length of the reads in the fastq file generated by anti-bisulfite. The format of the control file selreads.ctl is as follows.

    outfile = out                                * recording the running information
    intxtfile = example_1.fastq_bismark_pe.txt   * bismark_pe.txt file generated by bismark-liu
    intransfile = out_trans                      * trans file generated by contrans
    outreadfile = sel_reads                      * the index of reads file in which the reads include assigned methylated site
    location = 59                                * the methylated site location
    flag = p                                     * p means paired-end; s means single-end
    length = 75                                  * the length of read
    chrom_name = test_chromosome                 * the name for chromosome
    skipped_number = 1                           * the number of rows which will be skipped

This will produce one output file: methylation_summary_sam (contains the names of the alignments which include the single site assigned in the control file selreads.ctl)

(3) Generating the sam files to be analysed by Cufflinks

    Usage: selsam-single-parallel 1
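
Several steps above take a row count as an argument: selsam needs the number of rows that include the sign “@” in the TopHat .sam file, and methylation_ratio needs the number of rows of ***_bismark_pe_mul.txt and of methylation_summary. The following is a minimal shell sketch for obtaining these numbers, not part of the Episo package. The function names are hypothetical, and it assumes that the “@” rows are the SAM header rows, which begin with “@”.

```shell
#!/bin/sh
# Hypothetical helper functions; they are NOT shipped with Episo.

# Print the number of SAM header rows, i.e. lines that begin with "@".
# Assumption: this is the skipped_number that selsam expects.
count_sam_header_rows() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the function from reporting a failure.
    grep -c '^@' "$1" || true
}

# Print the total number of rows in a text file.
# awk counts a final line even when it lacks a trailing newline.
count_rows() {
    awk 'END { print NR }' "$1"
}

# Example calls (the file names are placeholders for your own files):
#   count_sam_header_rows accepted_hits.sam
#   count_rows example_1.fastq_bismark_pe_mul.txt
#   count_rows methylation_summary
```

Counting only lines that begin with “@” avoids miscounting alignment rows whose quality string happens to contain the “@” character.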
Source PDF metadata: author ljf; created 2018-06-05; 6 pages.