User Guide V1.0
GAL Genome Annotator Light
Version 1.0
User Guide

Authors: Arijit Panda, Narendrakumar M. Chaudhari, Sucheta Tripathy*
Contact Email: arijpanda@gmail.com and tsucheta@gmail.com
Developed at: Computational Genomics Lab, Structural Biology and Bioinformatics Division, CSIR-Indian Institute of Chemical Biology, Kolkata, India.
*Principal Investigator

Table of Contents

Introduction
    Getting Started
    System Requirements
    Quick Start
    Additional useful Commands
GAL User Interface (GUI)
    GAL Homepage
    GAL Data Upload Options
    GAL Sample Data
    GAL Genome Browser
    Gene Sequence Page
    BLAST Page
Command Line Options
    How to run GAL in command line mode?
    Accessing host directory
    Running the programs
    Setting up the configuration file
    Data Format
    Sample organism data upload using command line mode
    List of Reference Genomes

GAL: An Integrated Virtual Machine for Genome Analysis and Visualization

Introduction

GAL is a software package for analyzing and visualizing a genome or a group of genomes. GAL is implemented inside Docker. Docker is becoming popular throughout the bioinformatics community because it simplifies dependency handling and makes efficient use of the underlying system and its resources. Docker deploys an application in a sandbox (called a container) that runs locally on the host operating system. Docker must be installed on the host system (Linux in this case) before proceeding with GAL.

Getting Started

GAL is installed and started through Docker. Docker is available in two editions: Community Edition (CE) and Enterprise Edition (EE). Docker CE and EE are available on multiple platforms, in the cloud and on-premises.

Docker website: https://www.docker.com/
Docker documentation for beginners: https://docker-curriculum.com/
Docker CE and EE are available at: https://docs.docker.com/engine/installation/#supported-platforms

System Requirements

GAL can be installed on the following operating systems:
o CentOS 7.1/7.2 and RHEL 7.0/7.1/7.2/7.3 (YUM-based systems)
o Ubuntu 16.04 LTS or higher

Quick Start

1. GAL can be downloaded and installed using the following Docker command:

    docker pull rjit17/gal:1.0

   On a 100 Mbps network, the entire package download takes approximately 8 minutes. For upcoming versions, '1.0' should be replaced with the respective version.

2. To run GAL, use the following command:

    docker run -it -p 8080:80 rjit17/gal:1.0

   This starts GAL on port 8080 of the local server (localhost). A different host port can be used to start another instance. (For other Docker options, refer to the Docker documentation.)

3. While the GAL instance is running inside the Docker container, the GAL User Interface (UI) can be accessed through a web browser at the following URL:

    http://localhost:port/

   In this case, it is http://localhost:8080/
   It can also be: http://:8080

4. GAL can now be used to upload your data through the browser.

Additional useful Commands

List docker images
To find the Docker images that have been pulled to the system, use the following command:

    docker images

[Screenshot: example output of docker images]

Set instance name
By default, Docker allocates a random name and ID to the running instance. The instance name can be set by adding the '--name' option to the command line, which helps to track the instance later.
Example:

    docker run --name=test -it -p 8080:80 rjit17/gal:1.0

Here 'test' is the name of the running instance.

Find docker instances
To find all the available Docker instances, use the following command:

    docker ps -a

[Screenshot: example output of docker ps -a]

Exit docker instance
To exit from a running Docker instance, use the `exit` command.
To leave the Docker command line without stopping the instance, press CTRL+p followed by CTRL+q.

Re-enter running instance
To re-enter a running instance, use the following command:

    docker exec -it [instance_name] bash

Example:

    docker exec -it test bash

Here 'test' is the name of the running instance.

Restart Docker instance
To start a stopped instance, use the following command:

    docker start -i [instance_name]

Example:

    docker start -i test

Here 'test' is the name of the instance.

Successful GAL Start
When the GAL Docker instance starts successfully, a startup message appears; [ OK ] indicates successful initiation.

[Screenshot: GAL startup messages]

GAL User Interface (GUI)

The GAL GUI is required for data visualization and consists of several web pages, described below.

GAL Homepage

The GAL GUI can be loaded in a web browser for genome upload and genome browsing, and for downstream analyses such as BLAST searches, annotation queries and sequence retrieval, along with analyses of all the annotated proteins using various EMBOSS tools.

The Homepage lists genomes only after they have been processed. Until then, no data is available in the list view or tree view. Processing a ~5 Mb E. coli genome with Genbank Annotation as input took approximately 28 minutes on a standard Ubuntu desktop with 4 CPUs and 4 GB of RAM. The same genome at the other annotation levels took: Product Annotation, 31 minutes; Minimal Annotation, 30 minutes; No Annotation, 175 minutes (using the GeneMark annotator and NCBI BLAST).

The Navigation panel on the left gives access to features such as:
o Genome Upload: Upload options at any stage of the annotation process.
o QUERY: Gene search using gene name, primary annotation, genomic locus or HMMPFAM/SignalP/TMHMM annotations.
o BLAST: Sequence search using NCBI BLAST for protein or gene sequences within the uploaded dataset.
o Help: Help and documentation.

GAL Data Upload Options

The user can provide data in four ways: Type1, Genbank Annotation; Type2, genome FASTA file only; Type3, genome FASTA and GFF files; Type4, genome FASTA, GFF and product files.

Genbank Annotation: Data input through an NCBI-annotated GenBank file (GBFF).
Product Annotation: Data input through a genome FASTA file, a GFF (General Feature Format) file and a product information file.
Minimal Annotation: Basic annotation information provided by the user as a genome FASTA (FNA) file and a GFF file.
No Annotation: Data input through a genome FASTA (FNA) file only. Genes are predicted against a related reference genome using AUGUSTUS for eukaryotic genomes or GeneMark for prokaryotic genomes.

GAL Sample Data

Clicking a genome name directs the browser to the Genome Summary Page for the respective organism, which provides organism details and links to the scaffold-wise genome browser.

From the genome browser, each coding and non-coding region can be visualized in detail, with exon-intron boundaries, sequence download links and analysis options.

GAL Genome Browser

The GAL Genome Browser visualizes coding and non-coding regions within a selected locus range of a selected genome.

[Screenshots: SINGLE GENOME BROWSER MODE; MULTI GENOME BROWSER MODE]

Additionally, GAL can automatically visualize the corresponding regions from multiple taxonomically related species (if present in the given dataset) based on LastZ alignments.
Each highlighted region links to the individual gene details page, which provides annotation details, gene analysis options and sequence download options.

Gene Details Page

Every annotated gene, transcript or protein can be analyzed separately on its Gene Details page.

[Screenshot: exon-intron boundaries for transcripts]

Annotation summary tables for the various methods are also displayed on the same page.

EMBOSS TOOLKIT

Protein analysis with various EMBOSS tools is available on each gene details page. The outputs can be viewed on the same page by clicking the name of the package. All outputs can be downloaded in image or text format, wherever suitable.

[Screenshot: EMBOSS tools incorporated into the GAL analysis, with example plotorf output for a transcript]

Each adjacent tab is named after a tool and generates that tool's standard output. The tools include banana, cpgplot, eprimer32, sixpack, showpep, tfscan etc.

Gene Sequence Page

The gene sequence page provides options for retrieving the nucleotide sequence of the genomic region as well as the protein sequence of the translated gene. Exons are highlighted in green for easy understanding and reporting.

BLAST Page

Once the genomes uploaded by the user have been processed and are available in the database, any nucleotide or protein sequence can be searched with BLAST against them. Any or all of the genomes can be selected using the checkboxes next to the organism names. The genomes are shown in a tree view for the BLAST options.

[Screenshot: BLAST page showing the options for sequence input, parameters and genome selection]
Command Line Options

How to run GAL in command line mode?

GAL can easily be run from a web browser. Optionally, users familiar with the Docker command line and the Ubuntu terminal can run GAL through the command line.

Accessing host directory

A host directory can be made available to GAL with the following command:

    docker run -it -v [host_directory_path]:[GAL_file_system_path] -p 8080:80 rjit17/gal:[GAL_version]

Example:

    docker run -it -v /home/arijit/test:/usr/GAL_data -p 8080:80 rjit17/gal:1.0

After running the above command, the host directory is available inside the GAL file system, so data in the host directory can be processed by GAL. The command also places you inside the GAL container, at a prompt like:

    root@container_id:/#

Running the programs

GAL is based on Python; Python 3.4 or above is required. The main program, main.py, is located in /usr/GAL. To run the GAL control script, use the following command:

    python3 /usr/GAL/main.py --orgconfig=[config_file_path]

Setting up the configuration file

The user needs to provide a configuration file in INI format.
INI format:

    [section]
    name=value

Structure of the organism configuration file:

    [OrganismDetails]
    Organism:
    version:
    source_url:

    [SequenceType]
    SequenceType:

    [AnnotationInfo]
    Blastp:
    signalp:
    pfam:
    tmhmm:

    [filePath]
    GenBank:
    FASTA:
    GFF:
    Product:
    LastZ:
    SignalP:
    pfam:
    TMHMM:
    Interproscan:

    [other]
    Program:
    ReferenceGenome:

A sample configuration file is present at: /usr/GAL/config/organism_config_format.ini

Data Format

We have defined the input data type in four ways:

Data type   Name                  Input files
Type1       Genbank Annotation    Genbank Sequence File
Type2       No Annotation         Genome Fasta File
Type3       Minimal Annotation    Genome Fasta File, GFF File
Type4       Product Annotation    Genome Fasta File, GFF File, Product File

Sample organism data upload using command line mode:

Data    Command to upload sample genome
Type1   python3 /usr/GAL/main.py --orgconfig=/usr/GAL/SampleFiles/type1.Ini
Type2   python3 /usr/GAL/main.py --orgconfig=/usr/GAL/SampleFiles/type2.Ini
Type3   python3 /usr/GAL/main.py --orgconfig=/usr/GAL/SampleFiles/type3.Ini
Type4   python3 /usr/GAL/main.py --orgconfig=/usr/GAL/SampleFiles/type4.Ini

List of Reference Genomes

AUGUSTUS Reference Genomes

Organism Name                 Organism code for configuration file

Animals
Aedes aegypti                 aedes
Amphimedon queenslandica      amphimedon
Acyrthosiphon pisum           pea_aphid
Brugia malayi                 brugia
Caenorhabditis elegans        caenorhabditis
Drosophila melanogaster       fly
Homo sapiens                  human
Nasonia vitripennis           nasonia
Tribolium castaneum           tribolium
Trichinella spiralis          trichinella

Alveolata
Tetrahymena thermophila       tetrahymena
Toxoplasma gondii             toxoplasma

Plants and Algae
Arabidopsis thaliana          arabidopsis
Galdieria sulphuraria         galdieria
Solanum lycopersicum          tomato
Zea mays                      maize

Fungi
Aspergillus fumigatus         aspergillus_fumigatus
Aspergillus nidulans          aspergillus_nidulans
Aspergillus oryzae            aspergillus_oryzae
Aspergillus terreus           aspergillus_terreus
Botrytis cinerea              botrytis_cinerea
Candida albicans              candida_albicans
Candida guilliermondii        candida_guilliermondii
Candida tropicalis            candida_tropicalis
Chaetomium globosum           chaetomium_globosum
Coccidioides immitis          coccidioides_immitis
Coprinus cinereus             coprinus
Cryptococcus neoformans       cryptococcus_neoformans_neoformans_B
Debaryomyces hansenii         debaryomyces_hansenii
Encephalitozoon cuniculi      encephalitozoon_cuniculi_GB
Eremothecium gossypii         eremothecium_gossypii
Fusarium graminearum          fusarium_graminearum
Histoplasma capsulatum        histoplasma_capsulatum
Kluyveromyces lactis          kluyveromyces_lactis
Laccaria bicolor              laccaria_bicolor
Lodderomyces elongisporus     lodderomyces_elongisporus
Magnaporthe grisea            magnaporthe_grisea
Neurospora crassa             neurospora_crassa
Phanerochaete chrysosporium   phanerochaete_chrysosporium
Pichia stipitis               pichia_stipitis
Rhizopus oryzae               rhizopus_oryzae
Saccharomyces cerevisiae      saccharomyces_cerevisiae_S288C
Schizosaccharomyces pombe     schizosaccharomyces_pombe
Ustilago maydis               ustilago_maydis
Yarrowia lipolytica           yarrowia_lipolytica

GeneMark Reference Genomes

Organism Name                                 Organism code for configuration file
Vibrio fischeri ES114                         Aliivibrio_fischeri_hmm.mod
Azotobacter vinelandii DJ                     Azotobacter_vinelandii_hmm.mod
Bacillus subtilis subsp. subtilis str. 168    Bacillus_subtilis_hmm.mod
Escherichia coli str. K-12 substr. MG1655     Escherichia_coli_hmm.mod
Mycoplasma genitalium G37                     Mycoplasma_genitalium_hmm.mod
Pseudomonas fluorescens SBW25                 Pseudomonas_fluorescens_hmm.mod
Synechocystis sp. PCC 6803                    Synechocystis_sp._PCC_6803_hmm.mod
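As a rough illustration of the configuration structure and the main.py command described in this guide, the following Python sketch fills in an organism configuration file and builds the matching command line without running it. The section and key names are taken from the structure listed in this guide. Everything else is an assumption made for illustration: the helper names (build_config, render_config, main_command), the example values and paths, and the idea that GAL accepts "key:value" lines with empty values for unused keys. Check the sample file at /usr/GAL/config/organism_config_format.ini for the values GAL actually expects.

```python
import configparser
import io
import shlex

# Section and key names are copied from the configuration structure in this guide.
CONFIG_LAYOUT = {
    "OrganismDetails": ["Organism", "version", "source_url"],
    "SequenceType": ["SequenceType"],
    "AnnotationInfo": ["Blastp", "signalp", "pfam", "tmhmm"],
    "filePath": ["GenBank", "FASTA", "GFF", "Product", "LastZ",
                 "SignalP", "pfam", "TMHMM", "Interproscan"],
    "other": ["Program", "ReferenceGenome"],
}


def build_config(values):
    """Return a parser holding every listed key; keys without a value stay empty."""
    parser = configparser.ConfigParser(delimiters=(":",), interpolation=None)
    parser.optionxform = str  # keep key case; configparser lowercases keys by default
    for section, keys in CONFIG_LAYOUT.items():
        supplied = values.get(section, {})
        parser[section] = {key: supplied.get(key, "") for key in keys}
    return parser


def render_config(values):
    """Render the configuration as INI text using 'key:value' lines."""
    buffer = io.StringIO()
    build_config(values).write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


def main_command(config_path):
    """Build (but do not run) the control-script command given in this guide."""
    return ["python3", "/usr/GAL/main.py", "--orgconfig={}".format(config_path)]


if __name__ == "__main__":
    # Hypothetical values for a Type3 (Minimal Annotation) upload: FASTA plus GFF.
    example_values = {
        "OrganismDetails": {"Organism": "Example organism"},
        "filePath": {
            "FASTA": "/usr/GAL_data/example.fna",  # hypothetical path
            "GFF": "/usr/GAL_data/example.gff",    # hypothetical path
        },
    }
    print(render_config(example_values))
    print(shlex.join(main_command("/usr/GAL_data/example.ini")))
```

Generating the file from a fixed layout keeps every listed key present even when it is unused, which mirrors the template shown in the guide. The example paths sit under /usr/GAL_data only because that is the container path used in the host directory example.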