GAL
Genome Annotator Light
Version 1.0

User Guide

Authors:
Arijit Panda
Narendrakumar M. Chaudhari
Sucheta Tripathy*
Contact Email: arijpanda@gmail.com and tsucheta@gmail.com
Developed at:
Computational Genomics Lab,
Structural Biology and Bioinformatics Division,
CSIR-Indian Institute of Chemical Biology,
Kolkata, India.

*Principal Investigator

Table of Contents

Introduction
  Getting Started
  System Requirements
  Quick Start
  Additional useful Commands

GAL User Interface (GUI)
  GAL Homepage
  GAL Data Upload Options
  GAL Sample Data
  GAL Genome Browser
  Gene Sequence Page
  BLAST Page

Command Line Options
  How to run GAL in command line mode?
  Accessing host directory
  Running the programs
  Setting up the configuration file
  Data Format
  Sample organism data upload using command line mode
  List of Reference Genomes

GAL: An Integrated Virtual Machine for Genome Analysis and Visualization

Introduction
GAL is a software package for analyzing and visualizing a genome or a group of
genomes. GAL is implemented inside Docker. Docker is becoming popular
throughout the bioinformatics community because it simplifies dependency
handling and makes efficient use of the underlying system resources.
Docker deploys an application in a sandbox (called a container) that runs
locally on the host operating system. Docker must be installed on the host
system (Linux in this case) before proceeding with GAL.

Getting Started
GAL can be installed and initiated through Docker. Docker is available in two
editions: Community Edition (CE) and Enterprise Edition (EE). Docker CE and
EE are available on multiple platforms, on cloud and on-premises.


Docker website: https://www.docker.com/
Docker documentation for beginners: https://docker-curriculum.com/
Docker CE and EE are available at: https://docs.docker.com/engine/installation/#supported-platforms

System Requirements
GAL can be installed on the following operating systems:

CentOS 7.1/7.2 and RHEL 7.0/7.1/7.2/7.3 (YUM-based systems)
Ubuntu 16.04 LTS or higher

Quick Start
1. GAL can be downloaded and installed using the following Docker command:
docker pull rjit17/gal:1.0
On a 100 Mbps network connection, the entire package download takes
approximately 8 minutes.
For later versions, replace ‘1.0’ with the respective version number.

2. To run GAL, use the following command:
docker run -it -p 8080:80 rjit17/gal:1.0
This starts GAL on port 8080 of the local server (localhost). To start
another instance, use a different host port.
[For other Docker utilities, refer to the Docker documentation.]
3. While the GAL instance is running inside the Docker container, the GAL User
Interface (UI) can be accessed through a web browser at the following URL:
http://localhost:port/
In this case, it is
http://localhost:8080/
It can also be:
http://:8080
4. GAL can now be used to upload your data through the browser.
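
As a rough illustration of step 3, the following Python sketch builds the UI address for a chosen host port and polls it until the web server answers. The function names (`gal_url`, `gal_is_reachable`, `wait_for_gal`) and the retry settings are assumptions made for this example; they are not part of GAL.

```python
import time
import urllib.error
import urllib.request

# Talk to the local address directly, ignoring any proxy settings.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def gal_url(port=8080, host="localhost"):
    """Return the GAL UI address for the given host port."""
    return "http://{}:{}/".format(host, port)


def gal_is_reachable(url, timeout=2.0):
    """Return True if an HTTP server answers at url, False otherwise."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, name lookup failure and similar.
        return False


def wait_for_gal(port=8080, attempts=30, delay=2.0):
    """Poll the GAL UI until it answers or the attempts run out."""
    url = gal_url(port)
    for _ in range(attempts):
        if gal_is_reachable(url):
            return True
        time.sleep(delay)
    return False
```

Calling `wait_for_gal(8080)` after `docker run` returns True once the page at http://localhost:8080/ responds, and False if it never does within the allotted attempts.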

Additional useful Commands
List Docker images
To list the Docker images pulled onto the system, use the following
command:
docker images
This lists the images available on the system.

Set instance name
By default, Docker allocates a random name and ID to each running
instance. The user can set the instance name by adding the ‘--name’ option
to the command line, which makes the instance easier to track later.
Example:
docker run --name=test -it -p 8080:80 rjit17/gal:1.0
Here ‘test’ is the name of the running instance.
Find Docker instances
To list all available Docker instances, running or stopped, use the
following command:
docker ps -a
The output lists each instance with its ID, image, status, ports and name.

Exit Docker instance
To exit from a running Docker instance, use the `exit` command.
To detach from the Docker command line and leave the instance running,
press CTRL+p followed by CTRL+q.
Re-enter running instance
To re-enter a running instance, use the following command:
docker exec -it [instance_name] bash
Example:
docker exec -it test bash
Here ‘test’ is the name of the running instance.

Restart Docker instance
To start a stopped instance, use the following command:
docker start -i [instance_name]
Example:
docker start -i test
Here ‘test’ is the name of the stopped instance.

Successful GAL Start
On a successful start of the GAL Docker instance, a status message will
appear.

[ OK ] indicates successful initiation.


GAL User Interface (GUI)
The GAL GUI is required for data visualization and comprises several web
pages, described in the sections that follow.

GAL Homepage

The GUI for GAL can be loaded inside a web browser for Genome Upload and
Genome Browsing, and for downstream analyses such as BLAST Searches,
Annotation Query and Sequence Retrieval, along with analyses of all the
annotated proteins using various EMBOSS tools.

The Homepage lists the genomes only after they are processed. Until
then, no data is available in the list view or tree view.
Processing a ~5 Mb E. coli genome with Genbank Annotation as input took
approximately 28 minutes on a standard Ubuntu desktop with 4 CPUs
and 4 GB of RAM. The same genome at the other annotation levels took
31 minutes (Product Annotation), 30 minutes (Minimal Annotation) and
175 minutes (No Annotation, using the GeneMark annotator + NCBI BLAST).

The Navigation panel on the left gives access to features such as:
o Genome Upload: Upload options at any stage of the annotation
process.
o QUERY: Gene search using gene name, primary annotation,
genomic locus or HMMPfam/SignalP/TMHMM annotations.
o BLAST: Sequence search using NCBI BLAST for protein or gene
sequences within the uploaded dataset.
o Help: Help and documentation.

GAL Data Upload Options

The user can provide data in four ways: type1: Genbank Annotation; type2:
genome FASTA file only; type3: genome FASTA and GFF files; type4: genome
FASTA, GFF and product files.

Genbank Annotation: Data input through an NCBI-annotated
Genbank file (GBFF).
Product Annotation: Data input through a genome FASTA file, a GFF (genome
feature file) and a product information file.
Minimal Annotation: Basic annotation information provided by the user
as a genome FASTA (FNA) file and a GFF file.
No Annotation: Data input through a genome FASTA (FNA) file only,
with annotation by AUGUSTUS (for eukaryotic genomes) or GeneMark (for
prokaryotic genomes) using a related reference genome.

GAL Sample Data

Clicking a genome name directs the browser to the Genome Summary Page for
the respective organism, which provides organism details and links to the
scaffold-wise genome browser.

From the genome browser, each coding and non-coding region can be visualized
in detail with exon-intron boundaries, along with sequence download links and
analysis options.

GAL Genome Browser
The GAL Genome Browser can visualize coding and non-coding regions in a
selected locus range of a selected genome, as shown in the following image.
SINGLE GENOME BROWSER MODE

MULTI GENOME BROWSER MODE


Additionally, GAL can automatically visualize the corresponding regions from
multiple taxonomically related species (if present in the given dataset)
based on LastZ alignments.
Each highlighted region links to the individual gene details page with
annotation details, gene analysis options and sequence download options.

Gene Details Page
All annotated genes, transcripts or proteins can be analyzed individually on
the Gene Details page.
Exon Intron Boundaries for transcripts:

Annotation summary tables for various methods are also displayed on the same
page for more details.
EMBOSS TOOLKIT
Protein analysis supported by various EMBOSS tools is available on each
gene details page. The outputs can be visualized on the same page by
clicking the name of the package. All outputs can be downloaded in image or
text format wherever suitable.


The above screenshot shows various EMBOSS tools incorporated into the GAL
analysis, with the example output of the plotorf tool for the given
transcript. The adjacent tabs, each named after a tool, generate the
standard outputs of that tool. These tools include banana, cpgplot,
eprimer32, sixpack, showpep, tfscan etc.

Gene Sequence Page
The gene sequence page provides the option for retrieving nucleotide sequences
of the genomic region as well as protein sequence of the translated gene. The
green highlighted sequence indicates the exons for easy understanding and
reporting.


BLAST Page
Once the genomes uploaded by the user have been processed and are available
in the database, any nucleotide or protein sequence can be BLASTed
against them. Any or all of the genomes can be selected using the
checkboxes next to the organism names. The genomes are shown
as a tree view for the BLAST options.


The screenshot of the BLAST page shows the various options for sequence
input and parameters, as well as genome selection.


Command Line Options
How to run GAL in command line mode?
GAL can easily be run from a web browser. Optionally, users familiar with the
Docker command line and the Ubuntu terminal can run GAL from the command
line.

Accessing host directory
The host directory can be accessed through the following command:
docker run -it -v [host_directory_path]:[GAL_file_system_path] -p 8080:80 rjit17/gal:[GAL version]
Example:
docker run -it -v /home/arijit/test:/usr/GAL_data -p 8080:80 rjit17/gal:1.0
After running the above command, the host directory is available
to the GAL file system, so the user can process data from the host
directory. The command also places you inside the GAL container, at a
prompt such as:
root@container_id:/#
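
The `docker run` options used in this guide (`--name`, `-v` and `-p`) can be assembled programmatically. The Python sketch below is only an illustration: the helper name `build_gal_run_command` and its parameters are assumptions for this example, while the options themselves and the image name are the ones shown above.

```python
import shlex


def build_gal_run_command(version="1.0", host_port=8080, name=None,
                          host_dir=None, container_dir=None):
    """Assemble the `docker run` argument list described in this guide."""
    cmd = ["docker", "run", "-it"]
    if name:
        # Optional instance name, as in the "Set instance name" example.
        cmd.append("--name={}".format(name))
    if host_dir and container_dir:
        # Mount the host directory inside the GAL file system.
        cmd += ["-v", "{}:{}".format(host_dir, container_dir)]
    # Map the chosen host port to port 80 inside the container.
    cmd += ["-p", "{}:80".format(host_port), "rjit17/gal:{}".format(version)]
    return cmd


if __name__ == "__main__":
    command = build_gal_run_command(host_dir="/home/arijit/test",
                                    container_dir="/usr/GAL_data")
    # Print the command line; pass the list to subprocess.run to execute it.
    print(shlex.join(command))
```

Run as a script, this prints the same command as the example above. Building the command as a list avoids quoting problems when a host path contains spaces.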

Running the programs
GAL is based on Python. Python 3.4 or above is required to use GAL.
The main program for GAL is main.py, located in the /usr/GAL
directory.
To run the GAL control script, use the following command:
python3 /usr/GAL/main.py --orgconfig=[config_file_path]

Setting up the configuration file
The user needs to provide a configuration file in INI format.
INI format:
[section]
name=value


Structure of the organism configuration file:
[OrganismDetails]
Organism:
version:
source_url:
[SequenceType]
SequenceType:
[AnnotationInfo]
Blastp:
signalp:
pfam:
tmhmm:
[filePath]
GenBank:
FASTA:
GFF:
Product:
LastZ:
SignalP:
pfam:
TMHMM:
Interproscan:
[other]
Program:
ReferenceGenome:

A sample configuration file is present at: /usr/GAL/config/organism_config_format.ini
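
A configuration file with this structure can be generated with Python's standard `configparser` module. The sketch below is a minimal illustration, not GAL's own code: the section and key names are copied from the structure above, but the helper names, the example values and the assumption that GAL reads files written this way are hypothetical. Which values each key accepts is not specified here, so unset keys are left blank.

```python
import configparser
import io

# Section and key names copied from the structure listed above.
GAL_CONFIG_LAYOUT = {
    "OrganismDetails": ["Organism", "version", "source_url"],
    "SequenceType": ["SequenceType"],
    "AnnotationInfo": ["Blastp", "signalp", "pfam", "tmhmm"],
    "filePath": ["GenBank", "FASTA", "GFF", "Product", "LastZ",
                 "SignalP", "pfam", "TMHMM", "Interproscan"],
    "other": ["Program", "ReferenceGenome"],
}


def new_gal_config(values=None):
    """Return a parser holding every key, blank unless a value is given.

    values maps (section, key) pairs to strings.
    """
    parser = configparser.ConfigParser(delimiters=(":", "="),
                                       interpolation=None)
    parser.optionxform = str  # keep key capitalization as written
    for section, keys in GAL_CONFIG_LAYOUT.items():
        parser[section] = {key: "" for key in keys}
    for (section, key), value in (values or {}).items():
        if key not in parser[section]:
            raise KeyError("unknown key {!r} in [{}]".format(key, section))
        parser[section][key] = value
    return parser


def render(parser):
    """Return the configuration as INI text."""
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    config = new_gal_config({
        ("OrganismDetails", "Organism"): "Escherichia coli",
        ("filePath", "GenBank"): "/usr/GAL_data/example.gbff",
    })
    print(render(config))
```

Rejecting unknown keys catches misspelled option names before the file is passed to `--orgconfig`.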

Data Format
We have defined four input data types:

Data type  Name                Input files
Type1      Genbank Annotation  Genbank Sequence File
Type2      No Annotation       Genome Fasta File
Type3      Minimal Annotation  Genome Fasta File, GFF File
Type4      Product Annotation  Genome Fasta File, GFF File, Product File
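
The relationship between the supplied input files and the four data types listed above can be written as a small lookup. This Python sketch is illustrative only: the file labels (`genbank`, `fasta`, `gff`, `product`) and the function name are assumptions for this example, and combinations not listed above are rejected rather than guessed.

```python
# Each listed combination of input files maps to one data type and name.
DATA_TYPES = {
    frozenset({"genbank"}): ("Type1", "Genbank Annotation"),
    frozenset({"fasta"}): ("Type2", "No Annotation"),
    frozenset({"fasta", "gff"}): ("Type3", "Minimal Annotation"),
    frozenset({"fasta", "gff", "product"}): ("Type4", "Product Annotation"),
}


def gal_data_type(files):
    """Return (type, name) for a collection of input file labels."""
    key = frozenset(files)
    try:
        return DATA_TYPES[key]
    except KeyError:
        raise ValueError(
            "unsupported combination of input files: {}".format(sorted(key))
        ) from None


if __name__ == "__main__":
    print(gal_data_type(["fasta", "gff"]))
```

For example, a genome FASTA file together with a GFF file resolves to Type3 (Minimal Annotation), while a product file without a GFF file raises an error.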


Sample organism data upload using command line mode:
Data   Commands to upload sample genomes
Type1  python3 /usr/GAL/main.py --orgconfig=/usr/GAL/SampleFiles/type1.Ini
Type2  python3 /usr/GAL/main.py --orgconfig=/usr/GAL/SampleFiles/type2.Ini
Type3  python3 /usr/GAL/main.py --orgconfig=/usr/GAL/SampleFiles/type3.Ini
Type4  python3 /usr/GAL/main.py --orgconfig=/usr/GAL/SampleFiles/type4.Ini
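
The four sample uploads can also be run one after another from a short script inside the GAL container. The sketch below is an assumption-laden illustration: the helper names and the `dry_run` switch are invented for this example, while the script path and sample configuration paths are the ones given above.

```python
import subprocess

GAL_MAIN = "/usr/GAL/main.py"
SAMPLE_DIR = "/usr/GAL/SampleFiles"


def sample_upload_command(data_type):
    """Return the upload command for sample data type 1, 2, 3 or 4."""
    if data_type not in (1, 2, 3, 4):
        raise ValueError("data_type must be 1, 2, 3 or 4")
    config = "{}/type{}.Ini".format(SAMPLE_DIR, data_type)
    return ["python3", GAL_MAIN, "--orgconfig={}".format(config)]


def upload_all_samples(dry_run=True):
    """Build the four sample commands; run them only if dry_run is False."""
    commands = [sample_upload_command(n) for n in range(1, 5)]
    if not dry_run:
        for command in commands:
            # Stop at the first failing upload.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    for command in upload_all_samples(dry_run=True):
        print(" ".join(command))
```

With `dry_run=True` the script only prints the commands, which is a safe way to check them before starting the uploads.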

List of Reference Genomes
AUGUSTUS Reference Genomes

Organism Name                 Organism code for configuration file

Animals
Aedes aegypti                 aedes
Amphimedon queenslandica      amphimedon
Acyrthosiphon pisum           pea_aphid
Brugia malayi                 brugia
Caenorhabditis elegans        caenorhabditis
Drosophila melanogaster       fly
Homo sapiens                  human
Nasonia vitripennis           nasonia
Tribolium castaneum           tribolium
Trichinella spiralis          trichinella

Alveolata
Tetrahymena thermophila       tetrahymena
Toxoplasma gondii             toxoplasma

Plants and Algae
Arabidopsis thaliana          arabidopsis
Galdieria sulphuraria         galdieria
Solanum lycopersicum          tomato
Zea mays                      maize

Fungi
Aspergillus fumigatus         aspergillus_fumigatus
Aspergillus nidulans          aspergillus_nidulans
Aspergillus oryzae            aspergillus_oryzae
Aspergillus terreus           aspergillus_terreus
Botrytis cinerea              botrytis_cinerea
Candida albicans              candida_albicans
Candida guilliermondii        candida_guilliermondii
Candida tropicalis            candida_tropicalis
Chaetomium globosum           chaetomium_globosum
Coccidioides immitis          coccidioides_immitis
Coprinus cinereus             coprinus
Cryptococcus neoformans       cryptococcus_neoformans_neoformans_B
Debaryomyces hansenii         debaryomyces_hansenii
Encephalitozoon cuniculi      encephalitozoon_cuniculi_GB
Eremothecium gossypii         eremothecium_gossypii
Fusarium graminearum          fusarium_graminearum
Histoplasma capsulatum        histoplasma_capsulatum
Kluyveromyces lactis          kluyveromyces_lactis
Laccaria bicolor              laccaria_bicolor
Lodderomyces elongisporus     lodderomyces_elongisporus
Magnaporthe grisea            magnaporthe_grisea
Neurospora crassa             neurospora_crassa
Phanerochaete chrysosporium   phanerochaete_chrysosporium
Pichia stipitis               pichia_stipitis
Rhizopus oryzae               rhizopus_oryzae
Saccharomyces cerevisiae      saccharomyces_cerevisiae_S288C
Schizosaccharomyces pombe     schizosaccharomyces_pombe
Ustilago maydis               ustilago_maydis
Yarrowia lipolytica           yarrowia_lipolytica

GeneMark Reference Genomes

Organism Name                                Organism code for configuration file
Vibrio fischeri ES114                        Aliivibrio_fischeri_hmm.mod
Azotobacter vinelandii DJ                    Azotobacter_vinelandii_hmm.mod
Bacillus subtilis subsp. subtilis str. 168   Bacillus_subtilis_hmm.mod
Escherichia coli str. K-12 substr. MG1655    Escherichia_coli_hmm.mod
Mycoplasma genitalium G37                    Mycoplasma_genitalium_hmm.mod
Pseudomonas fluorescens SBW25                Pseudomonas_fluorescens_hmm.mod
Synechocystis sp. PCC 6803                   Synechocystis_sp._PCC_6803_hmm.mod

END OF DOCUMENT

