GUAVA Manual

GUAVA_manual

User Manual: Pdf

Open the PDF directly: View PDF PDF.
Page Count: 13

DownloadGUAVA Manual
Open PDF In BrowserView PDF
GUAVA Manual
Mayur Divate and Edwin Cheung

1

Index

Quick start

4

Download Software

4

How to open Terminal?

4

Install

5

Install dependencies

5

#1 Install R

5

#2 Install other dependencies and R packages

5

#3 Install MACS2

6

How to start GUAVA?

6

Graphical user interface of GUAVA

6

ATAC-seq data analysis program: Parameters

7

Output interface for GUAVA ATAC-seq data analysis

8

ATAC-seq differential analysis program: parameters

10

Output interface for GUAVA ATAC-seq differential analysis

11

Getting help and reporting issues

11

Download genome fasta file

13

How to create a bowtie index of genome fasta file

13

2

GUAVA: a GUI tool for the Analysis and Visualization of
ATAC-seq data
In nutshell, GUAVA is a standalone GUI tool for processing, analyzing and visualizing
ATAC-seq data. A user can start GUAVA analysis with raw reads to identify ATACseq signals. Then ATAC-seq signals from two or more samples can be compared
using GUAVA to identify genomic loci with differentially enriched ATAC-seq signals.
Furthermore, GUAVA also provides gene ontology and pathways enrichment
analysis. Since to use GUAVA requires only several clicks and no learning curve, it
will help novice bioinformatics researchers and biologist with minimal computer skills
to analyze ATAC-seq data. Therefore, we believe that GUAVA is a powerful and time
saving tool for ATAC-seq data analysis. GUAVA setup contains a script to configure
and install dependencies which facilitates the GUAVA installation. GUAVA works on
Linux and Mac OS.
This document contains all the information that is required to install and use GUAVA.
GUAVA is developed in the Edwin’s laboratory at University of Macau.

3

Quick Start
Download Software
The GUAVA tool is provided as a Java jar file. You can download the latest release
as a zipped GUAVA package from project page on GitHub. The file name will be
“GUAVA-master.zip”. And the source code is available at project source code page
on GitHub.

Figure1: GUAVA - GitHub repository
Go to GUAVA project page on GitHub. Click on the ‘clone or download’ to view option
for downloading GUAVA package ZIP file.

How to open Terminal?
MAC
1. Click on the Finder icon located in your dock
2. Click on the Utilities folder
3. Double-click on the Terminal icon
Linux
1. Press windows key on keyboard
2. Type “terminal” in the search box
3. Click on the Terminal icon
OR press “ Ctrl + Alt + T ” this will open then terminal

4

Install
Open the downloaded GUAVA package and place the folder containing the jar file in
a home directory / folder on your hard drive. It can be placed in any desired folder.
But later in this tutorial it is assumed that it is in home directory. It can be achieved
by the use following command on the terminal.
cp /path/to/GUAVA-master.zip ~/
Once package is copied to home directory, use command below to unzip and rename
it.
cd ~/
unzip GUAVA-master.zip
mv GUAVA-master GUAVA
Then close the terminal

Install dependencies
GUAVA depends on other tools in order to process ATAC-seq data (e.g. bowtie for
alignment). If any of the dependency is not found on system, GUAVA will fail to start.
To help users, we have written a program (configure.sh) which automatically
downloads and installs dependencies. However, the user need to install R and
MACS2 manually.

#1 Install R
MAC
1. Download R follow this link => https://cran.r-project.org/bin/macosx/
2. Click on the R-X.X.X.pkg file link (e.g. R-3.4.3.pkg)
3. Double click on the downloaded file and follow the instructions

Linux
1. Open the terminal
2. Type command ` sudo apt-get install r-base ` and press enter
To know more about it, open the following link
https://cran.r-project.org/bin/linux/
Choose appropriate Linux OS type.

#2 Install other dependencies and R packages
1. Open the terminal
2. Use following commands to run configure.sh.
cd ~/GUAVA
sh ./configure.sh
3. Close the terminal
5

#3 Install MACS2
4. Open the terminal
5. And use following commands
cd ~/GUAVA
python get-pip.py
pip install MACS2
6. Close the terminal

How to start GUAVA?
1. Open the terminal.
2. Use following commands to start GUAVA
cd ~/GUAVA
java -jar GUAVA.jar
NOTE: If you face difficulties in installing GUAVA please report the issue here
https://github.com/MayurDivate/GUAVASourceCode/issues.

Graphical user interface of GUAVA
We demonstrate how to use the GUAVA graphical user interface and show typical
results that are obtained from the program by using the GSE84515 ATAC-seq
dataset.
GUAVA tool has two main programs
1) ATAC-seq data analysis: to process raw ATAC-seq sequencing reads
2) ATAC-seq differential analysis: to compare ATAC-seq signals.
When GUAVA GUI is evoked it open GUAVA home window (Figure 2A). Here you can
choose between above two programs. Then based on the selection of program, the
desire input window will be opened (Figure 2 B and C).

6

A

C

B

D

Figure 2. Design of GUAVA Graphical user interface
(A) GUAVA home windows: allows user to choose between available GUAVA
program. Once the user has chosen desired program, it opens the input interface for
that program. Using input interface user can upload input files such as fastq, bam
etc. and set parameters (B) Input window interface of ATAC-seq data analysis
program and (C and D) ATAC-seq differential analysis.

ATAC-seq data analysis program
This program accepts raw ATAC-seq reads as an input. Before aligning reads to
genome, it trims adapter sequence from reads using cutadapt only if trimming option
is selected. After that it filters unsuitable reads for ATAC-seq analysis such as
duplicate reads. Next, it uses MACS2 to identify ATAC-seq peaks. Finally, it performs
functional annotation on the ATAC-seq peaks.

Parameters
R1 fastq: button to select and upload R1 fastq file ATAC-seq reads
R2 fastq: button to select and upload R2 fastq file ATAC-seq reads
Trim adapter: check this option if reads contains adapter
Maximum Ns: if one of the read in pair contains more than specified number Ns after adapter
trimming, that read pair will be discarded (default 2)
Minimum read length: if one of the read in pair is shorter than specified length after adapter
trimming, that read pair will be discarded (default 30)

7

Error Rate: allowed number of mismatches as a fraction of adapter sequence length. For
example, if error rate is 0.1 then 1 mismatch is allowed for 10bp match of adapter sequence
(default 0.1)
Nextera XT adapter: you can select this option if adapter used for ATAC-seq is Nextera XT
adapter (default true)
Adapter sequence: option to specify custom adapter sequence when Nextera XT adapter is
not used for library preparation.
Bowtie V1 or Bowtie V2 index: If you want to use bowtie for read mapping select “Bowtie
index” from drop down menu else select “Bowtie2 index” to use bowtie2. Then using browse
button upload appropriate genome index (bowtie or bowtie2 index). Please see section ‘how
to create genome index’ to know more about genome index. (default bowtie)
Maximum insert size: Maximum insert size in base pair allowed for paired end alignment
(default 2000)
Maximum genomic hits or Mapping quality: Maximum genomic hit (bowtie) and Minimum
Mapping quality (bowtie2) to discard reads pairs which has multiple alignments (default
Maximum genomic hits =1 and Mapping quality >= 10)
Genome assembly: select the correct genome build from drop down menu e.g. hg19 and
same build will be used for peak annotation and functional analysis.
ChrM: if selected, reads aligning to mitochondrial chromosome will be discarded (default
true)
ChrY: if selected, reads aligning to chromosome Y will be discarded. (default false)
RAM: RAM in GB to be used by GUAVA (default 1)
CPU units: number of CPU units to be used by GUAVA (default 1)
p or q value: select appropriate value from drop down menu and specify the cut off value in
box next to it. This will be used by MACS2 to filter peaks (default q value)
Output folder: select folder to save GUAVA ATAC-seq data analysis results
Reset All: button to set all parameters to default value. select folder to save GUAVA ATACseq data analysis results
Start Analysis: click this button to start ‘ATAC-seq data analysis’ program. If all provided
options are valid then GUAVA will start analysis.

Output interface for GUAVA ATAC-seq data analysis
Once GUAVA finishes analysis it shows results on tabular output interface (figure 3).
Also facilitates the visualization of ATAC-seq signal on IGV browser.

8

A

B

C

D

E

F

G

H

Figure 3: Output interface for GUAVA ATAC-seq data analysis. A) Input summary and
alignment statistics. B) Read filtering and peak calling summary. C) Peak annotation table with
sorting and filtering functionality. Easy access to IGV for visualizing peaks and automatically
generated normalized ATAC-seq signal by GUAVA. D) Visualization of ATAC-seq peaks with
IGV. E) Graph showing the fragment size distribution. F) Pie chart showing the percentage of
peaks in various genomic locations such as promoter, intron, exon, UTR, etc. G) Plot showing
9

the percentage of the peaks upstream and downstream of the TSS of the nearest genes.
Different colors indicate different ranges of distances from the TSS. H) Enriched pathways
obtained using ReactomePA bioconductor package.
The output interface of ‘ATAC-seq data analysis’ program has following five tabs.
1) ‘Alignment statistics’ tab: This tab provides reads mapping statistics (e.g. total
number of reads mapped to genome along) with summary of input files and
parameters (Figure 3A).
2) ‘Alignment Filtering’ tab: It has two tables (Figure 3B). One is to provide figures
for various types reads (e.g. useful reads, which are nothing but reads that have
passed all the filtering criteria and eligible for the downstream analysis). On the
other hand, second table shows summary of MACS2 peak calling.
3) ‘Annotated Peaks’ tab: This tab provides complete list of ATAC-seq peaks along
with annotations such as distance from nearest gene, gene symbol of nearest
gene and overlapping genomic feature e.g exon, intron etc. (Figure 3C). The
search box is provided at bottom can be used to search peaks (Figure 3C). To
view only the list of peaks annotated with a particular gene, type the symbol of
that gene in the search box. To visualize peak in the IGV browser, select a peak
and then click on the ‘view in IGV’ button at the next search box. This will open a
new IGV browser instance and ATAC-seq signals will be loaded automatically on
the browser (Figure 3D).
4) ‘Fragment size distribution’ tab: This tab displays the fragment size distribution
plot for a given ATAC-seq sample (Figure 3E).
5) ‘Plots’ tab: It has three sub tabs one for the pie chart showing distribution of
peaks in the several genomic features (Figure 3H), another for plot that shows
proportion of the peaks upstream and downstream of the TSS of the nearest gene
(Figure 3F), and the last tab provides top enriched pathways (Figure 3G).
Furthermore, these results are stored in the output folder, click the ‘output folder’
button at the bottom-right to open the output folder.

ATAC-seq differential analysis program
This program compares ATAC-seq signals from two conditions and returns the
differentially enriched signals. Additionally, it provides the peak annotation and
functional analysis for differentially enriched peaks. There are two input windows for
this program. First window is to upload the ATAC-seq signals from different and
conditions and replicates (Figure 1C). Use ‘add file’ and ‘remove’ buttons to add and
delete input files respectively. Once you have uploaded bed file containing ATAC-seq
peaks and bam files, specify the condition and replicate number for each file and click
‘Next’. Second window allows you to specify differential analysis related parameters
e.g. fold change (Figure 1B). Once you have added all the required files and
parameters click ‘Start’ button to run differential analysis.

Parameters
Analysis method: currently we have only implemented DESeq2.
log2 (Fold Change): log2 fold change cut off to define differentially enriched peaks. Default 2.
P value: P value cut off to select most significant differentially enriched peaks. Default 0.05.
10

Upstream of TSS: if the peak is present within a specified distance (in base pair) from the TSS
of a gene, to the upstream. Then that gene will be associated with the peak for functional
analysis. Default 25000.
Downstream of TSS: if the peak is present within a specified distance (in base pair) from the
TSS of a gene, to the downstream. Then that gene will be associated with the peak for
functional analysis. Default 10000.
Output folder: select folder to save GUAVA differential analysis results.

Output interface for GUAVA ATAC-seq differential analysis
The output interface of ‘ATAC-seq differential analysis’ program is also tabular like
‘ATAC-seq data analysis’ program.
1) ‘Summary’ tab: This tab provides summary of input parameters e.g. fold
change cut-off, list of input files used for differential analysis (Figure 4A) etc.
2) ‘Differential Table’ tab: This provides the list of differentially enriched ATACseq signals with annotation such as nearest gene to peak and the distance
between them (Figure 4B) etc. Same as output interface of ‘ATAC-seq data
analysis’ program, there is a search box and ‘view in IGV’ button at bottom of
window. Which can be used to sort peaks by gene symbol and view peaks in
IGV from input samples, respectively (Figure 4D).
3) ‘Plot’ tab: This provides volcano plot of differentially enriched peaks (Figure
4C).
4) ‘Go Analysis’ and 5) ‘Pathway Analysis’ tabs: These tabs provide results of
functional analysis i.e. enriched gene ontologies (Figure 4E) and pathways
(Figure 4F) respectively.

11

A

B

C

D

E

F

Figure 4: Output interface for GUAVA ATAC-seq differential analysis. A) Input summary. B)
Differentially enriched peaks with sorting and filtering functionality. Easy access to IGV to
visualize differentially enriched peaks and normalized ATAC-seq signals from each sample. C)
Volcano plot indicating the differentially enriched peaks. Red: peaks with increased chromatin
accessibility, green: peaks with reduced chromatin accessibility and black: peaks with no
significant change in chromatin accessibility. D) Peak visualization in IGV. E) Enriched gene
ontologies and F) enriched pathways.

Getting help and reporting issues
If user seeks help any issue that is not covered in this manual, he can first search on
GitHub for help (https://github.com/MayurDivate/GUAVASourceCode/issues). If he
12

does not find it on GitHub he can start new issue. Similarly, user can find any bug, he
can also report that on the same page.

Download genome fasta file
Fasta is a text file format for representing nucleotide or protein sequences. Genome
fasta file is a fasta file which contains the nucleotide sequences from all of the
chromosomes of a particular organism. Genome fasta file is a required for read
mapping using any aligner tool. Those users who don’t know where they can find
genome fasta file please follow the links given below and subsequent instructions,
Human: http://hgdownload.soe.ucsc.edu/downloads.html#human
Mouse: http://hgdownload.soe.ucsc.edu/downloads.html#mouse
Then, click on the ‘full data set’. This will open a new page, scroll down and click on
the chromFa.tar.gz to download genome sequence.
Use following command to extract chromosome files and merge them into one file.
tar -zxvf -d /path/to/chromFa.tar.gz
cat chromFa/*fa > GenomeBuild.fasta

How to create a bowtie index of genome fasta file
It is true that the genome fasta file is required for the alignment. But the aligners use
special set of files called as genome index, generated using from genome fasta. Index
files are used to speed up the read mapping process so that the aligner can map
millions of reads within few hours of time. Therefore, you need create genome index
file before read mapping. Remember that the index format is different for each aligner.
Please refer to ‘Download genome fasta file’ section to find more information about
downloading genome fasta file. If you already have a genome fasta file, follow the
commands below to create a bowtie genome index.
To create bowtie index:
bowtie-build /path/to/GenomeBuild.fa GenomeBuild.fa
To create bowtie2 index:
bowtie2-build /path/to/GenomeBuild.fa GenomeBuild.fa
For example, suppose the genome fasta file is in ‘genomes’ directory and which is sub directory
of ‘database’ directory under the home directory then command to create bowtie index will be as
follows

bowtie-build ~/database/genomes/hg19.fa hg19.fa
Note: This is a time-consuming step.

13



Source Exif Data:
File Type                       : PDF
File Type Extension             : pdf
MIME Type                       : application/pdf
Linearized                      : No
Page Count                      : 13
PDF Version                     : 1.4
Title                           : Microsoft Word - GUAVA_manual_Plos_v2.docx
Producer                        : Mac OS X 10.13.3 Quartz PDFContext
Creator                         : Word
Create Date                     : 2018:02:26 07:14:36Z
Modify Date                     : 2018:02:26 07:14:36Z
EXIF Metadata provided by EXIF.tools

Navigation menu