GUAVA Manual
Mayur Divate and Edwin Cheung

Version 1
Released on May 1, 2018


Table of Contents
1. About GUAVA
2. How to download GUAVA
3. How to get help and report bugs
4. How to open Terminal
5. Installation of GUAVA
   5.1 Installing dependencies for GUAVA
       5.1.1 Installing R
       5.1.2 Installing other dependencies
       5.1.3 Installing MACS2
6. How to start GUAVA
7. The graphical user interface of GUAVA
   7.1 ATAC-seq data analysis program GUI
       7.1.1 ATAC-seq data analysis program parameters
   7.2 Output interface for GUAVA ATAC-seq data analysis
   7.3 ATAC-seq differential analysis program GUI
       7.3.1 ATAC-seq differential analysis program parameters
   7.4 Output interface of GUAVA ATAC-seq differential analysis
8. How to download a genome fasta file
9. How to create an index of genome fasta file

1. About GUAVA
GUAVA: a GUI tool for the Analysis and Visualization of ATAC-seq
data
GUAVA is a standalone GUI tool for the processing, analysis, and visualization of
ATAC-seq data from raw sequencing reads to ATAC-seq signals. GUAVA can
compare ATAC-seq signals from two conditions to identify genomic loci with
differentially enriched ATAC-seq signals. Furthermore, GUAVA provides results on
gene ontology and pathway analysis. Because GUAVA requires only a few clicks and
little prior training, it will help novice bioinformatics researchers and
biologists with minimal computer skills to analyze ATAC-seq data. Therefore, we
believe that GUAVA is a powerful and time-saving tool for ATAC-seq data analysis.
The GUAVA setup contains a script that configures and installs dependencies, which
simplifies the GUAVA installation. GUAVA works on Linux and Mac OS.
This document contains all the information that is required to install and use GUAVA.
GUAVA was developed in Edwin Cheung’s laboratory at the University of Macau.


2. How to download GUAVA
GUAVA is hosted on GitHub and can be downloaded to your computer by performing
the following steps.
Step 1: Go to the link: https://github.com/MayurDivate/GUAVA/releases.
Step 2: Click on ‘Source code (zip)’.
Step 3: This will save the GUAVA zip package in your computer’s downloads
folder.

Figure 1. GUAVA GitHub project page. The picture above shows the download page
for GUAVA on GitHub. Users click on ‘Source code (zip)’ to download the GUAVA
package.
If you would like to download the source code for GUAVA, use the link below.
https://github.com/MayurDivate/GUAVASourceCode.


3. How to get help and report bugs
Users sometimes face difficulties in installing or using bioinformatics tools. It is
therefore important to have an active forum where users can report issues and share
information with the authors and other users. GUAVA has such a forum on GitHub at
the link below.
https://github.com/MayurDivate/GUAVA/issues


4. How to open Terminal
After downloading GUAVA, users will need to open the Terminal to install it. Here, we
describe the procedure to open the Terminal for non-bioinformatics users.
MAC
1. Open Finder
2. Click on ‘Go’ in the menu bar and then select the ‘Utilities’ folder
3. Double-click on the Terminal icon
Linux
1. Press the ‘windows’ key on the keyboard
2. Type “Terminal” in the search box
3. Click on the Terminal icon
Alternatively, press “Ctrl + Alt + T” at the same time to open Terminal.


5. Installation of GUAVA
GUAVA can be installed in the home folder. Before proceeding, first copy/move the
downloaded GUAVA package to the home folder and unzip the package. If the
downloaded package is in the folder ‘Downloads’, then type the following commands
in Terminal to unzip the package.

mv ~/Downloads/GUAVA-1.zip ~/
cd ~/
unzip GUAVA-1.zip
NOTE: If the downloaded GUAVA package is in a folder other than ‘Downloads’, you will
have to use the complete path of the package instead of ~/Downloads/GUAVA-1.zip. To
copy the path, simply copy the downloaded package and paste it in the Terminal.

5.1 Installing dependencies for GUAVA
GUAVA depends on other tools in order to process ATAC-seq data (e.g. Bowtie for
alignment). If any of the dependencies are not found on the system, GUAVA will not
work properly. Therefore, to help users to install the dependencies, we have written
a program (configure.sh) which automatically downloads and installs the
dependencies. However, users need to install R and MACS2 manually due to
technical reasons.

5.1.1 Installing R
MAC
1. To download R, follow this link: https://cran.r-project.org/bin/macosx/.
2. Click on the R-X.X.X.pkg file link (e.g. R-3.4.3.pkg).
3. Double-click on the downloaded file and follow the instructions.
Linux
1. Open Terminal.
2. Type the command ‘sudo apt-get install r-base’ and then press enter (apt-get is
available on Debian- and Ubuntu-based systems).
Note: For other Linux distributions, or to learn more about R, follow the link
https://cran.r-project.org/bin/linux/ and choose the appropriate Linux OS type.


5.1.2 Installing other dependencies
To run configure.sh, use the following commands in Terminal.
cd ~/GUAVA
sh ./configure.sh
Note: This may take a while to finish. You will need to press ‘enter’ several times to continue
and answer all questions with ‘yes’.

5.1.3 Installing MACS2
To install the last dependency, MACS2, use the following commands in Terminal.
cd ~/GUAVA
python get-pip.py
pip install MACS2
NOTE: Sometimes the MACS2 installation fails to install NumPy. In that case, run
`pip install numpy` first, and then try to install MACS2 again.
NOTE: If you see the error message ‘permission denied’, type 'sudo' at the beginning of the
commands (e.g. sudo python get-pip.py). You will then have to enter your password to
proceed.

This is the end of the installation. GUAVA is now ready to process ATAC-seq data.
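
Before starting GUAVA, it can help to confirm that the tools installed above are visible from the Terminal. The following is a minimal sketch, not part of GUAVA itself. The executable names in the list (java, R, macs2, bowtie, bowtie2, cutadapt) are assumptions based on the tools named in this manual, and the function name missing_tools is hypothetical. Adjust the list to match your system.

```shell
# Print every name from the argument list that is NOT found on the PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v is the POSIX way to ask whether a command is available.
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Assumed executable names; edit this list to match your installation.
missing=$(missing_tools java R macs2 bowtie bowtie2 cutadapt)
if [ -z "$missing" ]; then
    echo "All listed tools were found."
else
    echo "Not found on PATH:"
    echo "$missing"
fi
```

If a tool is reported as not found, repeat the corresponding installation step or open a new Terminal window so that the updated PATH is picked up.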


6. How to start GUAVA
After successfully installing GUAVA, users can start GUAVA by using the following
commands in Terminal. This will open the home window of GUAVA where users can
choose the program they want to use.
cd ~/GUAVA
java -jar GUAVA.jar

Figure 2. GUAVA home window. This interface opens when GUAVA starts and
allows users to select the desired GUAVA program.


7. The graphical user interface of GUAVA
As shown in Figure 2, GUAVA consists of the following three programs:
1) ATAC-seq Data Analysis: to process raw ATAC-seq sequencing reads.
2) ATAC-seq Differential Analysis: to compare ATAC-seq signals.
3) Genome Index Builder: to create the Bowtie or Bowtie2 index of a genome.
When GUAVA is started, it opens the GUAVA home window (Figure 2).
Users can choose one of the above programs to proceed. Depending on the
selected program, the corresponding input window will open (Figures 3, 13 and 23).

7.1 ATAC-seq data analysis program GUI
The ATAC-seq data analysis program accepts raw ATAC-seq reads as input.
Before aligning reads to the genome, the program trims adapter sequences from the
reads using cutadapt if the trimming option has been selected. In addition, it filters
unsuitable reads such as duplicate reads. Users can also exclude certain
chromosomes from the analysis using the ‘Show Chromosomes’ button (Figure 4).
The program uses MACS2 to identify ATAC-seq peaks. Finally, it performs functional
annotation of the ATAC-seq peaks.

Figure 3. Input form of GUAVA ATAC-seq data analysis program. Input window
of ATAC-seq data analysis program to upload input files such as fastq, genome
index, and set parameters such as insert size, p/q value, etc.


Figure 4. Interface to select chromosomes for alignment filtering. Here, users
can add the desired chromosomes to the ‘To be removed’ list to discard reads
aligning to those chromosomes.

7.1.1 ATAC-seq data analysis program parameters
Below is a complete list of the buttons and parameters present in the input interface
of the ATAC-seq data analysis program together with a description of their usage.
R1 fastq and R2 fastq: Buttons to upload read1 and read2 fastq files of ATAC-seq data.
Trim adapter: Check this option if the reads contain adapter sequences.
Maximum Ns: If a read contains more than the specified number of Ns after adapter
trimming, that read pair will be discarded (default 2).
Minimum Read Length: If any read is shorter than the specified length after adapter
trimming, that read pair will be discarded (default 30).
Error Rate: The allowed number of mismatches as a fraction of length. For example, if the
error rate is 0.1 then 1 mismatch is allowed for a 10 bp match of adapter sequence (default
0.1).
Nextera XT Adapter: Users can select this option if the adapter used for ATAC-seq is a
Nextera XT adapter (default adapter).


Adapter sequence: An option to specify the custom adapter sequence when Nextera XT
adapter was not used for library preparation.
Bowtie V1 or Bowtie V2 index: To use Bowtie for read mapping, select “Bowtie index” from
the dropdown menu; to use Bowtie2, select “Bowtie2 index”. Then, use the ‘browse’ button
to upload the matching genome index file (Bowtie or Bowtie2 index). See section 9, ‘How to
create an index of genome fasta file’, to learn more about the genome index and the
Genome Index Builder program.
Maximum insert size: The maximum insert size, in base pairs, that is allowed for a
paired-end alignment (default 2,000 bp).
Maximum genomic hits or Minimum Mapping Quality: The maximum number of genomic hits
(Bowtie) or the minimum mapping quality (Bowtie2) used to discard read pairs that have
multiple alignments. The defaults are maximum genomic hits = 1 and mapping quality = 30.
A higher mapping quality cutoff keeps only reads that map more uniquely.
Genome assembly: Select the correct genome build (e.g. hg19) from the dropdown menu.
The selected build is also used for peak annotation and functional analysis.
Show chromosomes: A button to exclude reads mapping to specific chromosomes such as
the mitochondrial chromosome. After clicking this button, it will open a new window
(Figure 4) where users can select the desired chromosome(s) that will be excluded.
RAM: RAM in GB to be used by GUAVA (default 1).
CPU units: Number of CPU units to be used by GUAVA (default 1).
p or q value: Select the appropriate value from the dropdown menu and specify the cut off
value in the box next to it. This will be used by MACS2 to filter peaks (default q value).
Output folder: The folder where GUAVA ATAC-seq data analysis results are saved.
Reset All: The button to set all parameters to the default value.
Start Analysis: Clicking this button will start the ‘ATAC-seq data analysis’ program if all of
the provided options are valid.
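
The ‘Error Rate’ parameter described above is a fraction of the matched adapter length, so the number of allowed mismatches grows with the length of the match. The sketch below illustrates that arithmetic. It assumes that the product is rounded down to a whole number, which is not stated in this manual, and the function name allowed_mismatches is hypothetical.

```shell
# Number of mismatches allowed for an adapter match of a given length.
# Arguments: error rate (fraction), match length (bp).
# Assumption: the product is rounded down to a whole number.
allowed_mismatches() {
    # The tiny offset guards against floating-point results such as 28.9999.
    awk -v rate="$1" -v len="$2" 'BEGIN { printf "%d\n", int(rate * len + 1e-9) }'
}

allowed_mismatches 0.1 10   # the example from the parameter description
allowed_mismatches 0.1 25   # a longer match with the same error rate
```

With the default error rate of 0.1, this gives 1 mismatch for a 10 bp match and, under the rounding-down assumption, 2 mismatches for a 25 bp match.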


7.2 Output interface for ATAC-seq data analysis program
Once GUAVA has finished the analysis, it will show the results in a tabular output
interface (Figures 5-12). GUAVA also facilitates the visualization of ATAC-seq signals
in the IGV browser.
7.2.1. Alignment Statistics
This tab provides the read mapping statistics (e.g. the total number of reads mapped
and not mapped to the genome) along with a summary of the input files and
parameters.

Figure 5. Input summary and alignment statistics.

7.2.2. Alignment Filtering
This tab contains two tables: 1) the alignment filtering statistics (duplicates, useful
reads, etc.), and 2) a summary of the MACS2 peak calling results.

Figure 6. Read filtering and peak calling summary.

7.2.3. Annotated Peaks
This window shows the annotated ATAC-seq peaks. It contains information on peak
location, nearest gene, and distance to the TSS. This window also provides easy access
to IGV for visualizing the peaks and the normalized ATAC-seq signals that GUAVA
generates automatically. Users can search for a peak of interest by typing the symbol of
the nearest gene in the search box at the bottom.

Figure 7. A table containing peak annotation information.

7.2.4. Visualization of ATAC-seq peaks using the Integrative Genomics Viewer (IGV)
Users can visualize a peak of interest from the table (Figure 7) by selecting it and
then clicking the ‘View in IGV’ button. This will automatically load the normalized
ATAC-seq signals and peaks into the IGV browser as shown below.

Figure 8. Visualization of ATAC-seq peaks with IGV.

7.2.5. Fragment Size Distribution
This plot shows the observed fragment size distribution for an ATAC-seq sample.

Figure 9. Graph showing the fragment size distribution.

7.2.6. Plot
This tab contains a bar chart which illustrates the distribution of the annotated peaks
in various genomic locations such as the promoter, intron, exon, UTR, etc.

Figure 10. Bar chart showing the distribution of annotated ATAC-seq peaks
in various genomic regions.

7.2.7. Gene Ontologies
This tab shows the list of the over-represented gene ontologies associated with
ATAC-seq peaks.

Figure 11. Over-represented gene ontology terms associated with ATAC-seq
peaks.

7.2.8. Pathways
This tab shows the list of the over-represented KEGG pathways associated with
ATAC-seq peaks.

Figure 12. Over-represented KEGG pathways associated with ATAC-seq
peaks.

NOTE: The above results are stored in the output folder. To access the output folder,
click on the ‘Output Folder’ button located at the bottom-right corner.


7.3 ATAC-seq differential analysis program
The ATAC-seq differential analysis program compares ATAC-seq signals from two
different conditions. It provides users results for the differentially enriched signals, as
well as the peak annotation and functional analysis for the differentially enriched
peaks. There are two input windows for this program. The first window is used to upload
the ATAC-seq signals for the two different conditions and their replicates (Figure 13).
The second window allows users to specify the differential analysis parameters, such as
the fold change and p value cut offs (Figure 14).
When users select the ATAC-seq differential analysis (i.e. the second option) in the
home window (Figure 2), the first input window is displayed (Figure 13). The steps below
describe how to use the ATAC-seq differential analysis program.
Step 1: Load input files for the differential analysis.
Use the ‘add file’ and ‘remove’ buttons to add and delete input files, respectively.
After adding input files, select the appropriate condition and replicate number from
the drop-down menu. Then, click ‘Next’ to specify differential analysis parameters.

Figure 13. GUAVA ATAC-seq differential analysis input interface 1.
Step 2: Set differential analysis parameters.
Choose the appropriate genome build (e.g. hg19), log2 (fold change) cut off, p value,
number of CPUs, TSS to peak upstream and downstream distance cut off, and the
output folder as shown in Figure 14. Once all these parameters have been entered,
users can start the differential analysis.


Figure 14. GUAVA ATAC-seq differential analysis input interface 2.

7.3.1 ATAC-seq differential analysis program parameters
Below is the complete list of buttons and parameters present in the input interfaces
(Figures 13 and 14) of the ATAC-seq differential analysis program together with a
description of their usage.
Analysis method: Select the method for differential analysis (DESeq2).
log2 (Fold Change): The log2 fold change cut off to define differentially enriched peaks. The
default is 2.
P value: The p value cut off to select the most significant differentially enriched peaks. The
default is 0.05.
Upstream of TSS: If a peak is present within the specified distance (in base pairs) upstream
of the TSS of a gene, then that gene will be associated with the peak for functional
analysis. The default is 5000 bp.
Downstream of TSS: If a peak is present within the specified distance (in base pairs)
downstream of the TSS of a gene, then that gene will be associated with the peak for
functional analysis. The default is 3000 bp.
Output folder: The folder where GUAVA differential analysis results are saved.
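
The ‘Upstream of TSS’ and ‘Downstream of TSS’ cut offs described above define a window around each TSS. The sketch below shows one way such a window test could work. It is an illustration only and not GUAVA's own implementation: it assumes a peak is represented by a single position, that upstream and downstream are interpreted relative to the gene's strand, and the function name peak_near_tss and all coordinates are hypothetical.

```shell
# Succeed (exit status 0) when a peak position lies inside the TSS window.
# Arguments: TSS position, peak position, strand (+ or -),
#            upstream cut off in bp (default 5000),
#            downstream cut off in bp (default 3000).
peak_near_tss() {
    tss=$1
    peak=$2
    strand=$3
    up=${4:-5000}
    down=${5:-3000}
    if [ "$strand" = "-" ]; then
        # On the minus strand, upstream means higher coordinates.
        start=$((tss - down))
        end=$((tss + up))
    else
        start=$((tss - up))
        end=$((tss + down))
    fi
    [ "$peak" -ge "$start" ] && [ "$peak" -le "$end" ]
}

# Hypothetical gene on the plus strand with its TSS at position 10000.
if peak_near_tss 10000 6000 +; then
    echo "peak at 6000: associated with the gene"
fi
if ! peak_near_tss 10000 14000 +; then
    echo "peak at 14000: outside the window"
fi
```

With the default cut offs, the window for this plus-strand example spans positions 5000 to 13000, so the first peak falls inside it and the second does not.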


7.4 Output interface of ATAC-seq differential analysis program
Once differential analysis has finished, GUAVA will show the results as a tabular
output interface (Figures 15-22). GUAVA also facilitates the visualization of ATAC-seq
signals on the IGV browser.
7.4.1. Summary
This tab provides a summary of the input parameters and files used to run differential
analysis.

Figure 15. Input summary. Summary of the parameters and input files used in the
differential analysis.


7.4.2. Volcano Plot
This graph shows the summary of differential analysis. The red and green colors
indicate peaks with reduced and increased chromatin accessibility, respectively.

Figure 16. A volcano plot showing the differential ATAC-seq signals.


7.4.3. PCA Plot
This graph shows the principal component analysis (PCA) of the samples used in the
differential analysis.

Figure 17. A PCA plot illustrating the variance between the control and treated
samples.


7.4.4. Annotated Peaks
This tab provides a table with differentially enriched peaks and easy access to
visualize ATAC-seq signals from control and treatment samples.

Figure 18. Annotation of the differentially enriched ATAC-seq signals.
Differentially enriched peaks associated with a particular gene can be searched by
typing the gene symbol in the search box provided at the bottom.


7.4.5. Visualization of ATAC-seq signals from the control and treatment
samples using IGV
To visualize ATAC-seq signals from the control and treatment samples, select a
differentially enriched peak from the ‘Annotated Peaks’ tab and then click on the
‘View in IGV’ button. This will automatically load the normalized signals from all the
input samples into the IGV browser.

Figure 19. ATAC-seq signal visualization using IGV.


7.4.6. Bar Chart
This tab contains a bar chart which illustrates the distribution of the annotated peaks in
various genomic locations such as the promoter, intron, exon, UTR, etc.

Figure 20. Bar chart showing the distribution of the differentially enriched
ATAC-seq peaks in various genomic regions such as promoters, introns,
exons, etc.


7.4.7. Gene Ontologies
This tab shows the list of the over-represented gene ontologies associated with
differentially enriched peaks.

Figure 21. Over-represented gene ontology terms associated with differentially
enriched peaks.


7.4.8. Pathway
This tab shows the list of the over-represented KEGG pathways associated with
differentially enriched peaks.

Figure 22. Over-represented KEGG pathways associated with differentially
enriched peaks.

NOTE: The above results are stored in the output folder. To access the output folder,
click on the ‘Output Folder’ button located at the bottom-left corner.


8. How to download a genome fasta file
Fasta is a flat file format for representing nucleotide or protein sequences. The
genome fasta file is a fasta file that contains the nucleotide sequences from all of the
chromosomes of a particular organism. The genome fasta file is required for mapping
reads with any sequence aligner tool. Users who do not know where to find the
genome fasta file can use the UCSC link below and choose the desired
organism to download the fasta file.
Step 1. Go to the link: http://hgdownload.soe.ucsc.edu/downloads.html.
Step 2. Click on the desired organism (e.g. human).
Step 3. Click on ‘Full data set’ under the appropriate genome build (e.g. hg19).
Step 4. Scroll down and click on chromFa.tar.gz to download the genome sequence.
Step 5. Open Terminal.
Step 6. Type the following commands to extract the chromosome files into a folder
named chromFa and merge them into a single file.

mkdir chromFa
tar -zxvf /path/to/chromFa.tar.gz -C chromFa
cat chromFa/*.fa > GenomeBuild.fasta

That’s it, your genome fasta file is ready.
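
As an optional sanity check on the merged file, the number of sequences can be counted, since each sequence in a fasta file starts with a header line beginning with ‘>’. This is a minimal sketch. The function name count_fasta_records is hypothetical, and the small demo file only stands in for GenomeBuild.fasta.

```shell
# Print the number of sequences in a fasta file by counting header lines,
# which start with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Small made-up example file standing in for GenomeBuild.fasta.
demo=$(mktemp)
printf '>chr1\nACGT\n>chr2\nGGCC\n' > "$demo"
count_fasta_records "$demo"   # counts the two header lines in the demo file
rm -f "$demo"
```

Running `count_fasta_records GenomeBuild.fasta` on the real file should report one sequence per chromosome file that was merged.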


9. How to create an index of genome fasta file
Read aligners use a special set of files called the genome index, which is generated
from the genome fasta file. Index files speed up the read mapping process so that the
aligner can map millions of reads within a few hours. Therefore, users need to create a
genome index before read mapping. Please note that the genome index format is different
for each aligner. Please also refer to section 8, ‘How to download a genome fasta file’,
for more information about downloading the genome fasta file. If a genome fasta file is
already available, use the Genome Index Builder program to create the genome index. To
run this program, users need to select an aligner, a genome fasta file, and an output
folder. Below are the stepwise instructions for using the ‘Genome Index Builder’ program.

Step 1. Choose the ‘Genome Index Builder’ program from the home window
(Figure 2).
Step 2. Select the appropriate aligner (Bowtie or Bowtie2).
Step 3. Click on the ‘Genome Fasta’ button to load the genome fasta file.
Step 4. Click on the ‘output folder’ button to select the folder to store the index
files.
Step 5. Click on the ‘Start’ button to start the program.

Figure 23. GUAVA Genome Index Builder program. This is the interface of the
‘Genome Index Builder’ program, which allows users to select the aligner, genome fasta
file, and output folder. Finally, users click on the ‘Start’ button to run the program.
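
For users who prefer the Terminal, Bowtie and Bowtie2 each ship their own index builder (bowtie-build and bowtie2-build), which take a fasta file and an output prefix. The sketch below only prints the command for the chosen aligner and does not run it. Whether the Genome Index Builder program calls these tools in exactly this way is an assumption, and the function name index_command and the prefix hg19_index are hypothetical.

```shell
# Print the index-building command for the chosen aligner without running it.
# Arguments: aligner (bowtie or bowtie2), genome fasta file, output prefix.
index_command() {
    aligner=$1
    fasta=$2
    prefix=$3
    case "$aligner" in
        bowtie)  echo "bowtie-build $fasta $prefix" ;;
        bowtie2) echo "bowtie2-build $fasta $prefix" ;;
        *)       echo "unknown aligner: $aligner" >&2; return 1 ;;
    esac
}

index_command bowtie2 GenomeBuild.fasta hg19_index
```

The printed command can be copied into the Terminal and run from the folder that contains the genome fasta file. As noted above, an index built for one aligner cannot be used with the other.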
