GUAVA Manual
Mayur Divate and Edwin Cheung
Version 1
Released on May 1, 2018

Table of Contents

1. About GUAVA
2. How to download GUAVA
3. How to get help and report bugs
4. How to open Terminal
5. Installation of GUAVA
   5.1 Installing dependencies for GUAVA
       5.1.1 Installing R
       5.1.2 Installing other dependencies
       5.1.3 Installing MACS2
6. How to start GUAVA
7. The graphical user interface of GUAVA
   7.1 ATAC-seq data analysis program GUI
       7.1.1 ATAC-seq data analysis program parameters
   7.2 Output interface for GUAVA ATAC-seq data analysis
   7.3 ATAC-seq differential analysis program GUI
       7.3.1 ATAC-seq differential analysis program parameters
   7.4 Output interface of GUAVA ATAC-seq differential analysis
8. How to download a genome fasta file
9. How to create an index of genome fasta file

1. About GUAVA

GUAVA: a GUI tool for the Analysis and Visualization of ATAC-seq data

GUAVA is a standalone GUI tool for the processing, analysis, and visualization of ATAC-seq data, from raw sequencing reads to ATAC-seq signals. GUAVA can compare ATAC-seq signals from two conditions to identify genomic loci with differentially enriched ATAC-seq signals. It also provides gene ontology and pathway analysis results. Because GUAVA requires only a few clicks and has practically no learning curve, it helps novice bioinformatics researchers and biologists with minimal computer skills to analyze ATAC-seq data. We therefore believe that GUAVA is a powerful and time-saving tool for ATAC-seq data analysis.

The GUAVA setup contains a script that configures and installs dependencies, which simplifies the installation. GUAVA works on Linux and Mac OS. This document contains all the information required to install and use GUAVA.

GUAVA was developed in Edwin Cheung’s laboratory at the University of Macau.

2. How to download GUAVA

GUAVA is hosted on GitHub and can be downloaded to your computer with the following steps.

Step 1: Go to the link: https://github.com/MayurDivate/GUAVA/releases.
Step 2: Click on ‘Source code (zip)’.
Step 3: This will save the GUAVA zip package in your computer’s downloads folder.

Figure 1. GUAVA GitHub project page. The picture shows the download page for GUAVA on GitHub. Users click on ‘Source code (zip)’ to download the GUAVA package.

If you would like to download the source code for GUAVA, use the link below.
https://github.com/MayurDivate/GUAVASourceCode

3. How to get help and report bugs

Users sometimes face difficulties in installing or using bioinformatics tools. It is therefore important for such tools to have an active forum where users can report issues and share information with the authors and other users. GUAVA has a forum on GitHub for this purpose:
https://github.com/MayurDivate/GUAVA/issues

4. How to open Terminal

After downloading GUAVA, users need to open the Terminal to install it. Here, we describe how to open the Terminal for non-bioinformatics users.

Mac
1. Open Finder.
2. Click on ‘Go’ in the menu bar and then select the ‘Utilities’ folder.
3. Double-click on the Terminal icon.

Linux
1. Press the ‘Windows’ key on the keyboard.
2. Type “Terminal” in the search box.
3. Click on the Terminal icon.
Alternatively, press “Ctrl + Alt + T” at the same time to open the Terminal.

5. Installation of GUAVA

GUAVA can be installed in the home folder. Before proceeding, copy or move the downloaded GUAVA package to the home folder and unzip it. If the downloaded package is in the ‘Downloads’ folder, type the following commands in Terminal to unzip the package.
    mv ~/Downloads/GUAVA-1.zip ~/
    cd ~/
    unzip GUAVA-1.zip

NOTE: If the downloaded GUAVA package is in a folder other than ‘Downloads’, use the complete path of that folder instead of ~/Downloads/GUAVA-1.zip. To get the path, simply copy the downloaded package and paste it into the Terminal.

5.1 Installing dependencies for GUAVA

GUAVA depends on other tools to process ATAC-seq data (e.g. Bowtie for alignment). If any of the dependencies is not found on the system, GUAVA will not work properly. To help users install the dependencies, we have written a program (configure.sh) that downloads and installs them automatically. However, for technical reasons, users need to install R and MACS2 manually.

5.1.1 Installing R

Mac
1. To download R, follow this link: https://cran.r-project.org/bin/macosx/.
2. Click on the R-X.X.X.pkg file link (e.g. R-3.4.3.pkg).
3. Double-click on the downloaded file and follow the instructions.

Linux
1. Open Terminal.
2. Type the command ‘sudo apt-get install r-base’ and then press enter. This command applies to Debian- and Ubuntu-based systems.

Note: For other Linux distributions, follow the link https://cran.r-project.org/bin/linux/ and choose the appropriate Linux OS type.

5.1.2 Installing other dependencies

To run configure.sh, use the following commands in Terminal.

    cd ~/GUAVA
    sh ./configure.sh

Note: This may take a while to finish. You will need to press ‘enter’ several times to continue, and you should answer all questions with ‘yes’.

5.1.3 Installing MACS2

To install the last dependency, MACS2, use the following commands in Terminal.

    cd ~/GUAVA
    python get-pip.py
    pip install MACS2

Error: Sometimes MACS2 fails to install Numpy. In that case, run `pip install numpy` first and then try to install MACS2 again.

NOTE: If you see the error message ‘permission denied’, type 'sudo' at the beginning of the command (e.g. sudo python get-pip.py). You will then have to enter your password to proceed.

This is the end of the installation.
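Before starting GUAVA, it can help to confirm that the manually installed tools are visible from the Terminal. The following is a minimal sketch, not part of the GUAVA package: the helper name check_tool and the list of executable names (R, java, python, pip, macs2) are assumptions based on the commands used in this manual. Tools that configure.sh installs may live inside the GUAVA folder and are therefore not checked here.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with GUAVA): report whether a
# command can be found on the PATH of the current Terminal session.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "MISSING: $1"
        return 1
    fi
}

# Executable names are assumptions; on some systems Python is
# installed as python3 and pip as pip3.
missing=0
for tool in R java python pip macs2; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "$missing tool(s) missing"
```

If a tool is reported as missing, repeat the corresponding installation step above before starting GUAVA.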
GUAVA is now ready to process ATAC-seq data.

6. How to start GUAVA

After successfully installing GUAVA, users can start it with the following commands in Terminal. This opens the GUAVA home window, where users can choose the program they want to use.

    cd ~/GUAVA
    java -jar GUAVA.jar

Figure 2. GUAVA home window. This interface opens when GUAVA starts and allows users to select the desired GUAVA program.

7. The graphical user interface of GUAVA

As shown in Figure 2, GUAVA consists of the following three programs:
1) ATAC-seq Data Analysis: to process raw ATAC-seq sequencing reads.
2) ATAC-seq Differential Analysis: to compare ATAC-seq signals.
3) Genome Index Builder: to create the Bowtie or Bowtie2 index of a genome.

When the GUAVA GUI is launched, it opens the GUAVA home window (Figure 2). Users choose one of the above programs to proceed. Depending on the selected program, the corresponding input window opens (Figures 3, 13 and 23).

7.1 ATAC-seq data analysis program GUI

The ATAC-seq data analysis program accepts raw ATAC-seq reads as input. Before aligning reads to the genome, the program trims adapter sequences from the reads using cutadapt if the trimming option has been selected. In addition, it filters out unsuitable reads such as duplicate reads. Users can also exclude certain chromosomes from the analysis using the ‘Show Chromosomes’ button (Figure 4). The program uses MACS2 to identify ATAC-seq peaks. Finally, it performs functional annotation of the ATAC-seq peaks.

Figure 3. Input form of the GUAVA ATAC-seq data analysis program. Input window of the ATAC-seq data analysis program, used to upload input files such as fastq files and the genome index, and to set parameters such as insert size, p/q value, etc.

Figure 4. Interface to select chromosomes for alignment filtering. Here, users can add the desired chromosomes to the ‘To be removed’ list to discard reads aligning to those chromosomes.
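For readers who want a rough idea of what the steps in section 7.1 look like on the command line, the following dry run prints one illustrative command per step. It is only a sketch under stated assumptions and not the commands GUAVA actually runs: all file names are placeholders, the use of samtools for mapping-quality filtering is an assumption (this manual does not name the filtering tool), duplicate removal is not shown, and the q value of 0.05 is a placeholder. The numeric values mirror the defaults listed in section 7.1.1. Nothing is executed.

```shell
#!/bin/sh
# Dry run: print (do not execute) commands that roughly correspond to
# the trimming, alignment, filtering, and peak calling steps.
# Every file name below is a placeholder.
R1=sample_R1.fastq
R2=sample_R2.fastq
INDEX=hg19_bowtie2/hg19        # hypothetical Bowtie2 index prefix
ADAPTER=CTGTCTCTTATACACATCT    # Nextera transposase adapter sequence
OUT=guava_out

# Adapter trimming: error rate 0.1, minimum read length 30, maximum 2 Ns.
trim_cmd() {
    echo "cutadapt -a $ADAPTER -A $ADAPTER -e 0.1 -m 30 --max-n 2 -o $OUT/trim_R1.fastq -p $OUT/trim_R2.fastq $R1 $R2"
}

# Paired-end alignment: 1 CPU, maximum insert size 2000 bp.
align_cmd() {
    echo "bowtie2 -p 1 -X 2000 -x $INDEX -1 $OUT/trim_R1.fastq -2 $OUT/trim_R2.fastq -S $OUT/aligned.sam"
}

# Keep alignments with mapping quality of at least 30
# (assumption: samtools is used only for illustration).
filter_cmd() {
    echo "samtools view -b -q 30 -o $OUT/filtered.bam $OUT/aligned.sam"
}

# Peak calling on paired-end alignments with a q value cut off
# (0.05 is a placeholder value).
peak_cmd() {
    echo "macs2 callpeak -t $OUT/filtered.bam -f BAMPE -g hs -q 0.05 -n sample --outdir $OUT"
}

trim_cmd
align_cmd
filter_cmd
peak_cmd
```

Printing the commands instead of running them keeps the sketch safe to try on a machine where the tools or input files are not present.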
7.1.1 ATAC-seq data analysis program parameters

Below is a complete list of the buttons and parameters in the input interface of the ATAC-seq data analysis program, together with a description of their usage.

R1 fastq and R2 fastq: Buttons to upload the read1 and read2 fastq files of ATAC-seq data.

Trim adapter: Check this option if the reads contain adapter sequences.

Maximum Ns: If any read contains more than the specified number of Ns after adapter trimming, that read pair will be discarded (default 2).

Minimum Read Length: If any read is shorter than the specified length after adapter trimming, that read pair will be discarded (default 30).

Error Rate: The allowed number of mismatches as a fraction of the matched length. For example, with an error rate of 0.1, 1 mismatch is allowed for a 10 bp match of the adapter sequence (default 0.1).

Nextera XT Adapter: Select this option if the adapter used for ATAC-seq is a Nextera XT adapter (default adapter).

Adapter sequence: An option to specify a custom adapter sequence when a Nextera XT adapter was not used for library preparation.

Bowtie V1 or Bowtie V2 index: To use Bowtie for read mapping, select “Bowtie index” from the dropdown menu; to use Bowtie2, select “Bowtie2 index”. Then use the ‘browse’ button to upload the appropriate genome index file (Bowtie or Bowtie2 index). See section 9, ‘How to create an index of genome fasta file’, for more information about the genome index and the Genome Index Builder program.

Maximum insert size: The maximum insert size in base pairs allowed for a paired-end alignment (default 2,000 bp).

Maximum genomic hits or Minimum Mapping Quality: The threshold used to discard read pairs that have multiple alignments: the maximum number of genomic hits for Bowtie (default 1) or the minimum mapping quality for Bowtie2 (default 30). A higher mapping quality threshold retains more uniquely mapped reads.

Genome assembly: Select the correct genome build (e.g. hg19) from the dropdown menu. The selected build is also used for peak annotation and functional analysis.

Show chromosomes: A button to exclude reads mapping to specific chromosomes, such as the mitochondrial chromosome. Clicking this button opens a new window (Figure 4) where users can select the chromosome(s) to be excluded.

RAM: RAM in GB to be used by GUAVA (default 1).

CPU units: Number of CPU units to be used by GUAVA (default 1).

p or q value: Select p value or q value from the dropdown menu and specify the cut off value in the box next to it. MACS2 uses this value to filter peaks (default q value).

Output folder: The folder where the GUAVA ATAC-seq data analysis results are saved.

Reset All: The button to reset all parameters to their default values.

Start Analysis: Clicking this button starts the ‘ATAC-seq data analysis’ program if all of the provided options are valid.

7.2 Output interface for ATAC-seq data analysis program

Once GUAVA has finished the analysis, it shows the results in a tabbed output interface (Figures 5-12). GUAVA also facilitates the visualization of ATAC-seq signals in the IGV browser.

7.2.1. Alignment Statistics

This tab provides the read mapping statistics (e.g. the total number of reads mapped and not mapped to the genome) along with a summary of the input files and parameters.

Figure 5. Input summary and alignment statistics.

7.2.2. Alignment Filtering

This tab contains two tables: 1) the alignment filtering statistics (duplicates, useful reads, etc.), and 2) a summary of the MACS2 peak calling results.

Figure 6. Read filtering and peak calling summary.

7.2.3. Annotated Peaks

This window shows the annotated ATAC-seq peaks. It contains information on peak location, nearest gene, and distance to TSS. This window also provides easy access to IGV for visualizing the peaks and the normalized ATAC-seq signals that GUAVA generates automatically.
Users can search for their peak of interest by typing the symbol of the nearest gene in the search box at the bottom.

Figure 7. A table containing peak annotation information.

7.2.4. Visualization of ATAC-seq peaks using the Integrative Genomics Viewer (IGV)

Users can visualize a peak of interest from the table (Figure 7) by selecting it and then clicking the ‘View in IGV’ button. This automatically loads the normalized ATAC-seq signals and peaks into the IGV browser, as shown below.

Figure 8. Visualization of ATAC-seq peaks with IGV.

7.2.5. Fragment Size Distribution

This plot shows the observed fragment size distribution for an ATAC-seq sample.

Figure 9. Graph showing the fragment size distribution.

7.2.6. Plot

This tab contains a bar chart that illustrates the distribution of the annotated peaks across genomic locations such as the promoter, intron, exon, UTR, etc.

Figure 10. Bar chart showing the distribution of annotated ATAC-seq peaks in various genomic regions.

7.2.7. Gene Ontologies

This tab shows the list of over-represented gene ontologies associated with the ATAC-seq peaks.

Figure 11. Over-represented gene ontology terms associated with ATAC-seq peaks.

7.2.8. Pathways

This tab shows the list of over-represented KEGG pathways associated with the ATAC-seq peaks.

Figure 12. Over-represented KEGG pathways associated with ATAC-seq peaks.

NOTE: The above results are stored in the output folder. To access the output folder, click on the ‘Output Folder’ button located at the bottom-right corner.

7.3 ATAC-seq differential analysis program

The ATAC-seq differential analysis program compares ATAC-seq signals from two different conditions. It reports the differentially enriched signals, as well as the peak annotation and functional analysis for the differentially enriched peaks. There are two input windows for this program.
The first window is used to upload the ATAC-seq signals for the two conditions and their replicates (Figure 13). The second window allows users to specify the parameters for the differential analysis, such as fold change and p value (Figure 14).

When users select ATAC-seq Differential Analysis (i.e. the second option) in the home window (Figure 2), the input window shown in Figure 13 is displayed. The steps below describe how to use the ATAC-seq differential analysis program.

Step 1: Load the input files for the differential analysis. Use the ‘add file’ and ‘remove’ buttons to add and delete input files, respectively. After adding the input files, select the appropriate condition and replicate number from the drop-down menu. Then click ‘Next’ to specify the differential analysis parameters.

Figure 13. GUAVA ATAC-seq differential analysis input interface 1.

Step 2: Set the differential analysis parameters. Choose the appropriate genome build (e.g. hg19), log2 (fold change) cut off, p value, number of CPUs, TSS to peak upstream and downstream distance cut offs, and the output folder, as shown in Figure 14. Once all these parameters have been entered, users can start the differential analysis.

Figure 14. GUAVA ATAC-seq differential analysis input interface 2.

7.3.1 ATAC-seq differential analysis program parameters

Below is the complete list of buttons and parameters in the input interface (Figures 13 and 14) of the ATAC-seq differential analysis program, together with a description of their usage.

Analysis method: Select the method for differential analysis (DESeq2).

log2 (Fold Change): The log2 fold change cut off used to define differentially enriched peaks. The default is 2.

P value: The p value cut off used to select the most significant differentially enriched peaks. The default is 0.05.

Upstream of TSS: If a peak lies within the specified distance (in base pairs) upstream of the TSS of a gene, that gene is associated with the peak for functional analysis. The default is 5000 bp.

Downstream of TSS: If a peak lies within the specified distance (in base pairs) downstream of the TSS of a gene, that gene is associated with the peak for functional analysis. The default is 3000 bp.

Output folder: The folder where the GUAVA differential analysis results are saved.

7.4 Output interface of ATAC-seq differential analysis program

Once the differential analysis has finished, GUAVA shows the results in a tabbed output interface (Figures 15-22). GUAVA also facilitates the visualization of ATAC-seq signals in the IGV browser.

7.4.1. Summary

This tab provides a summary of the input parameters and files used to run the differential analysis.

Figure 15. Input summary. Summary of the parameters and input files used in the differential analysis.

7.4.2. Volcano Plot

This graph summarizes the differential analysis. The red and green colors indicate peaks with reduced and increased chromatin accessibility, respectively.

Figure 16. A volcano plot showing the differential ATAC-seq signals.

7.4.3. PCA Plot

This graph shows the principal component analysis (PCA) of the samples used in the differential analysis.

Figure 17. A PCA plot illustrating the variance between the control and treated samples.

7.4.4. Annotated Peaks

This tab provides a table of the differentially enriched peaks and easy access to visualize ATAC-seq signals from the control and treatment samples.

Figure 18. Annotation of the differentially enriched ATAC-seq signals. Differentially enriched peaks associated with a particular gene can be found by typing the gene symbol in the search box provided at the bottom.

7.4.5. Visualization of ATAC-seq signals from the control and treatment samples using IGV

To visualize ATAC-seq signals from the control and treatment samples, select a differentially enriched peak from the ‘Annotated Peaks’ tab and then click on the ‘View in IGV’ button.
This automatically loads the normalized signals from all the input samples into the IGV browser.

Figure 19. ATAC-seq signal visualization using IGV.

7.4.6. Bar Chart

This tab contains a bar chart that illustrates the distribution of the annotated peaks across genomic locations such as the promoter, intron, exon, UTR, etc.

Figure 20. Bar chart showing the distribution of the differentially enriched ATAC-seq peaks in various genomic regions such as promoters, introns, exons, etc.

7.4.7. Gene Ontologies

This tab shows the list of over-represented gene ontologies associated with the differentially enriched peaks.

Figure 21. Over-represented gene ontology terms associated with differentially enriched peaks.

7.4.8. Pathway

This tab shows the list of over-represented KEGG pathways associated with the differentially enriched peaks.

Figure 22. Over-represented KEGG pathways associated with differentially enriched peaks.

NOTE: The above results are stored in the output folder. To access the output folder, click on the ‘Output Folder’ button located at the bottom-left corner.

8. How to download a genome fasta file

Fasta is a flat file format for representing nucleotide or protein sequences. A genome fasta file is a fasta file that contains the nucleotide sequences of all the chromosomes of a particular organism. The genome fasta file is required for mapping reads with any sequence aligner. Users who do not know where to find a genome fasta file can use the UCSC link below and choose the desired organism to download it.

Step 1. Go to the link: http://hgdownload.soe.ucsc.edu/downloads.html.
Step 2. Click on the desired organism (e.g. human).
Step 3. Click on ‘Full data set’ under the appropriate genome build (e.g. hg19).
Step 4. Scroll down and click on chromFa.tar.gz to download the genome sequence.
Step 5. Open Terminal.
Step 6. Type the following commands to extract the chromosome files into a folder named chromFa and merge them into a single file.

    mkdir -p chromFa
    tar -zxvf /path/to/chromFa.tar.gz -C chromFa
    cat chromFa/*.fa > GenomeBuild.fasta

That’s it, your genome fasta file is ready.

9. How to create an index of genome fasta file

Read aligners use a special set of files, called the genome index, that is generated from the genome fasta file. Index files speed up the read mapping process so that the aligner can map millions of reads within a few hours. Users therefore need to create a genome index before read mapping. Please note that the genome index format is different for each aligner. Please also refer to section 8, ‘How to download a genome fasta file’, for more information about downloading the genome fasta file.

If a genome fasta file is already available, use the Genome Index Builder program to create the genome index. To run this program, users need to select an aligner, a genome fasta file, and an output folder. Below are the stepwise instructions for using the ‘Genome Index Builder’ program.

Step 1. Choose the ‘Genome Index Builder’ program from the home window (Figure 2).
Step 2. Select the appropriate aligner (Bowtie or Bowtie2).
Step 3. Click on the ‘Genome Fasta’ button to load the genome fasta file.
Step 4. Click on the ‘output folder’ button to select the folder in which to store the index files.
Step 5. Click on the ‘Start’ button to start the program.

Figure 23. GUAVA Genome Index Builder program. This is the interface of the ‘Genome Index Builder’ program, which allows users to select the aligner, genome fasta file, and output folder. Finally, users click on the ‘Start’ button to run the program.
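To illustrate the merge step of section 8 and the index step of section 9 together, the following self-contained sketch builds two tiny stand-in chromosome files, merges them, and counts the sequences in the merged file. The file contents and folder names are invented for the demonstration. The last line only prints a bowtie2-build command as a possible command-line counterpart of the Genome Index Builder; whether GUAVA calls it in this form is an assumption.

```shell
#!/bin/sh
# Demo with made-up data: create two small per-chromosome fasta files
# in a temporary folder, then merge them into one genome fasta file.
workdir=$(mktemp -d)
mkdir "$workdir/chromFa"
printf '>chr1\nACGTACGT\n' > "$workdir/chromFa/chr1.fa"
printf '>chr2\nTTGGCCAA\n' > "$workdir/chromFa/chr2.fa"

cat "$workdir"/chromFa/*.fa > "$workdir/GenomeBuild.fasta"

# Every sequence starts with a '>' header line, so counting header
# lines counts the chromosomes that ended up in the merged file.
n_seqs=$(grep -c '^>' "$workdir/GenomeBuild.fasta")
echo "sequences in merged fasta: $n_seqs"

# Printed, not executed: index the merged file with Bowtie2.
# The output prefix is a placeholder.
echo "bowtie2-build $workdir/GenomeBuild.fasta $workdir/index/GenomeBuild"
```

Running the same header count on a real GenomeBuild.fasta is a quick way to check that all chromosome files were merged before building the index.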