ATTILA User Guide

University of Brasilia
2016

About ATTILA
ATTILA (AutomaTed Tool For Immunoglobulin Analysis) searches for candidate
immunoglobulin sequences in phage display libraries. Its main output is a list of heavy and
light chain sequences that were selected by the phage display experiment and that code
for antibody fragments likely to bind the target molecule. The ATTILA package contains
programs written in C, Perl and shell script that execute the eight steps of a completely
automated analysis (Figure 1). The third-party tools used by ATTILA are listed in section Requirements.

Figure 1: Analysis performed by ATTILA.
ATTILA analyses human sequences coding for the variable domains of the heavy (VH) and
light (VL) chains, produced by phage display technology. The input must therefore be VH
and VL libraries from both the initial and the final rounds. Because our approach relies on
distances between canonical amino acid residues of the variable domain, derived from human
sequences, the analysis may also be performed on mouse sequences, whose distances are
similar to those of human sequences.
The package has a very simple structure: two directories, called data and programs,
together with this user guide and an example configuration file. The data directory
contains human databases from NCBI, used by IgBlast to perform germline classification,
while the programs directory holds all scripts and programs of the method.

Requirements
In order to install and run ATTILA, you must have:
• Linux system
• FASTQC http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
• Prinseq-lite http://prinseq.sourceforge.net/
• FastqJoin https://code.google.com/archive/p/ea-utils/
• Perl https://www.perl.org/get.html
• IgBlast ftp://ftp.ncbi.nih.gov/blast/executables/igblast/release/
• R https://cran.r-project.org/
• ggplot2 (R package) Use R function install.packages("ggplot2")
• scales (R package) Use R function install.packages("scales")
• Internet connection
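As a quick first check, the sketch below reports which command-line tools from the list above are not found on your PATH. It is only an illustration and not the logic of ATTILA's own requirements check. The executable names used here (Rscript, fastqc, prinseq-lite.pl, fastq-join) are assumptions about how these tools are usually installed, and IgBlast is left out because ATTILA expects it as a subdirectory of the installation directory.

```shell
#!/bin/sh
# Print the name of every given command that is NOT found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Executable names are assumptions; adjust them to your installation.
missing=$(missing_tools perl Rscript fastqc prinseq-lite.pl fastq-join)
if [ -z "$missing" ]; then
    echo "All listed tools were found on PATH"
else
    echo "Not found on PATH:"
    echo "$missing"
fi
```

A tool reported as missing may still be installed in a location that is simply not on your PATH.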

Installation
After installing all requirements, perform the following steps:
1. Download ATTILA package at ...
2. Uncompress the tar.gz file using command line:
tar -vzxf attila-1.0.tar.gz
3. Go to the directory where you want to install ATTILA, using the cd command. Note that
the ATTILA and IgBlast packages must be subdirectories of the installation directory.
4. Type the following command line:
ln -s <path to check_requirements.sh> check_requirements.sh
Example: ln -s /home/Attila/programs/check_requirements.sh check_requirements.sh
Remember that check_requirements.sh is located in the directory called “programs” of the ATTILA package.
5. Run the following command line:
./check_requirements.sh
If check_requirements.sh prints “Type ./attilacli.sh to run ATTILA”, then ATTILA is
ready to run. If not, check_requirements.sh prints a list of the requirements that still need to be
installed.

Getting started
Starting ATTILA
ATTILA uses two configuration files to run the analysis, one for the VH library and one
for the VL library. You may create these files manually or let ATTILA create them
for you. In the first case, you can use our example files, called SingleEndReads_VH.cfg
and SingleEndReads_VL.cfg (for single-end reads) or PairedEndReads_VH.cfg and
PairedEndReads_VL.cfg (for paired-end reads), located in the parent directory of the ATTILA
package. Just copy the files and change the settings according to your data. Note that the
parameter called “Project Name” must be the same in both configuration files. The other
way to create the configuration files is to answer some questions asked by ATTILA, which
automatically generates the configuration files at the end of the process.
To start ATTILA, type the command line:
./attilacli.sh

Running ATTILA when you already have the configuration files
ATTILA will ask whether you already have the configuration files. If you do, type “y”. ATTILA
will then ask for the path of the VH and of the VL configuration file. Note that the analysis
is executed separately for VH and VL, so you must have two configuration files, one
for each library type.

Using ATTILA to create configuration files
If you prefer to let ATTILA create the configuration files, type “n” or press the ENTER key
when ATTILA asks if configuration files already exist. ATTILA will ask you some questions
in order to fill the parameters for the VH and VL libraries. This step may be time consuming
for first-time users but, at the end, ATTILA will execute the complete analysis of VH and
VL libraries for you. Here is the complete list of parameters asked by ATTILA:
• Project name: Name of the directory that will be created by ATTILA to save output files
• Directory to save the project: The directory where the project will be saved
• Reads are paired-end: If yes, type “y” or press ENTER key. If not, type “n”
If your reads are paired-end, ATTILA will ask the location of all eight input files, using
the following parameters:
• VH R0 reads r1 path: location of the fastq file containing reads r1 from initial VH library
• VH R0 reads r2 path: location of the fastq file containing reads r2 from initial VH library
• VH RN reads r1 path: location of the fastq file containing reads r1 from final VH library
• VH RN reads r2 path: location of the fastq file containing reads r2 from final VH library
• VL R0 reads r1 path: location of the fastq file containing reads r1 from initial VL library
• VL R0 reads r2 path: location of the fastq file containing reads r2 from initial VL library
• VL RN reads r1 path: location of the fastq file containing reads r1 from final VL library
• VL RN reads r2 path: location of the fastq file containing reads r2 from final VL library
The initial library is the library sequenced before the phage display experiment. The final
library is the library sequenced after all rounds of phage display.
If your reads are single-end, ATTILA will ask the location of four input files, using the
following parameters:
• VH R0 path: location of fastq file containing reads from initial VH library
• VH RN path: location of fastq file containing reads from final VH library
• VL R0 path: location of fastq file containing reads from initial VL library
• VL RN path: location of fastq file containing reads from final VL library
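Before answering these prompts, it can help to confirm that every fastq file you are about to point ATTILA to exists and to see how many reads it holds. The sketch below is not part of ATTILA. It assumes uncompressed fastq files with exactly four lines per record, and the function names count_fastq_reads and check_inputs are hypothetical.

```shell
# Count reads in an uncompressed fastq file with 4 lines per record.
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Print one line per file: its read count, or MISSING if it cannot be read.
# Returns a non-zero status if at least one file is missing.
check_inputs() {
    rc=0
    for f in "$@"; do
        if [ -r "$f" ]; then
            printf '%s\t%s reads\n' "$f" "$(count_fastq_reads "$f")"
        else
            printf '%s\tMISSING\n' "$f"
            rc=1
        fi
    done
    return $rc
}

# Example call (file names are placeholders for your own data):
#   check_inputs vh_r0.fastq vh_rn.fastq vl_r0.fastq vl_rn.fastq
```

For paired-end data, the r1 and r2 files of the same library would normally be expected to report the same number of reads.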
The remaining parameters will be asked for both types of reads:
• Minimum read length: the default value is 300 bp (approximate size of the variable domain
coding region); type “y” to change the default value and enter the new read length as
an integer; to keep the default, type “n” or press the ENTER key
• Minimum base quality: the default value is 20; type “y” to change the default value and
enter the new base quality as an integer; to keep the default, type
“n” or press the ENTER key
• Number of candidates to rank: number of candidate clones that ATTILA will try to
find in VH and VL libraries; the number must be an integer
ATTILA will print all the settings you have entered, so that you can check whether they
are correct. If they are, type “y” or press the ENTER key. ATTILA will then name the
configuration files using the project name followed by “VH” or “VL”, e.g. myproject_VH.cfg
and myproject_VL.cfg. If you need to correct anything, type “n” and the configuration
editing menu will open. You can learn how to use this menu in section Configuration
editing menu.

Running the analysis
After you have entered the configuration file paths or the parameters needed to create
them, ATTILA will start the analysis of your data. It prints messages reporting
what is currently being done, when the analysis of each library is completed, and the
execution time of the analysis. In case of a successful analysis, the following messages will
be printed:
Creating project directory
Running VH analysis ...
real 1m21.031s
user 1m24.390s
sys 0m3.320s
-------------------------------
VH Analysis Completed
-------------------------------
Running VL analysis ...
real 2m42.973s
user 2m42.060s
sys 0m6.410s
-------------------------------
VL Analysis Completed
-------------------------------
Analysis report is ready !

Visualizing results
In your project directory, ATTILA creates separate directories for the VH and VL libraries
(see footnote 1), where most of the generated files are located, as shown in Figure 2. A summary of the
analysis of both libraries can be found in the Report directory, which is also a subdirectory of
the project directory.

Figure 2: Project directory structure.
For each library type, ATTILA creates the same subdirectory structure (Figure 3), with
3 subdirectories: InitialRound, FinalRound and SelectedSequences. The files located in
the InitialRound directory are intermediate output files generated by the analysis of the
initial library, the library sequenced before phage display experiment. In the FinalRound
directory, the files are from the final library, the library sequenced after the experiment.
And the SelectedSequences directory has the files containing the results of the analysis, i.e.,
information about the candidate clones of the corresponding library type (VH or VL).

What files really matter?
Even though the analysis generates many files, you do not need to look at all of them.
Note that in the file names below, the “?” character stands for the number of candidate
clones you chose. Here are the files that really matter:
• Report.html The analysis report for VH and VL, showing germline classification, fold
change, statistics, regions of the variable domain of candidate clones and reads information.
This file is located in the Report directory. You will need an Internet connection to view
this report correctly, since its content is dynamic (see footnote 2).
• vhlist?numbered.fasta A fasta file, located in the SelectedSequences subdirectory of
VH, containing the amino acid sequences of the VH candidate clones.
Footnote 1: In the project directory there are also log files, with the standard output of the programs used in the
method. These files are just a way for us to keep track of possible errors, so you do not need to check
them.
Footnote 2: The analysis report is explained in detail in section Better explained analysis report.

Figure 3: Library directory structure. The “?” can be a “H” (heavy chain) or a “L” (light chain).
• vllist?numbered.fasta A fasta file, located in the SelectedSequences subdirectory of
VL, containing the amino acid sequences of the VL candidate clones.
• vhlist?numberednt.fasta A fasta file, located in SelectedSequences subdirectory
of VH, containing the nucleotide sequences from VH candidate clones.
• vllist?numberednt.fasta A fasta file, located in SelectedSequences subdirectory
of VL, containing the nucleotide sequences from VL candidate clones.
• vhSequenceCounting.csv A csv file, located in VH directory, containing the number
of reads of each step of the analysis.
• vlSequenceCounting.csv A csv file, located in VL directory, containing the number
of reads of each step of the analysis.
• *fastqc.zip Read quality reports generated by FASTQC, located in the InitialRound
and FinalRound directories of both library type directories, VH and VL. Note that
read quality is evaluated for the initial and final libraries both before and after the
filtering step, so there are two FASTQC reports for each library.
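To make the “?” convention concrete, the sketch below prints the paths where the main result files would be expected for a given project directory and number of candidates. It is an illustration only. The function name expected_outputs is hypothetical, and the directory names (Report, VH, VL, SelectedSequences) and the direct substitution of the number for “?” are taken from the descriptions above rather than checked against ATTILA's code.

```shell
# Print the expected locations of the main result files.
# Usage: expected_outputs PROJECT_DIR NUMBER_OF_CANDIDATES
expected_outputs() {
    project=$1
    n=$2
    printf '%s\n' "$project/Report/Report.html"
    for chain in VH VL; do
        # File names use the lower-case chain name (vh or vl) as prefix.
        prefix=$(printf '%s' "$chain" | tr 'A-Z' 'a-z')
        printf '%s\n' \
            "$project/$chain/SelectedSequences/${prefix}list${n}numbered.fasta" \
            "$project/$chain/SelectedSequences/${prefix}list${n}numberednt.fasta" \
            "$project/$chain/${prefix}SequenceCounting.csv"
    done
}

# Example call (project path is a placeholder):
#   expected_outputs /home/user/myproject 10
```

With 10 candidates, for instance, the VH amino acid file would be expected under the name vhlist10numbered.fasta.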

Better explained analysis report
The analysis report gathers different results in one file, allowing a friendly visualization of your
data. The report has 2 tabs, one for each library type (VH and VL). In each tab, there are
3 sections: Reads Information, Candidate Clones and Regions of Variable Domain
of Candidate Clones.
The section Reads Information has one table showing the loss of reads of the initial and
final libraries (Figure 4). The loss of reads is the percentage of reads removed when
reads are filtered by length and by PHRED base quality score. By default, all reads shorter
than 300 bp and/or with a base quality score below 20 are removed. If you have set ATTILA
to use different values of
minimum length and base quality, then the filtering step takes these values as thresholds
to remove reads.
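The filtering rule and the loss-of-reads percentage can be sketched as follows. This is not ATTILA's implementation, which relies on the third-party tools listed in Requirements. The sketch assumes uncompressed fastq files with four lines per record and Phred+33 quality encoding, and it applies the quality threshold to the mean score of each read, which is an assumption because this guide does not say whether the threshold is applied per base or per read. The function names are hypothetical.

```shell
# Keep reads with length >= MIN_LEN and mean Phred score >= MIN_QUAL.
# Usage: filter_fastq MIN_LEN MIN_QUAL < in.fastq > out.fastq
filter_fastq() {
    awk -v min_len="$1" -v min_qual="$2" '
        # Lookup table from printable character to ASCII code.
        BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }
        # Store the 4 lines of a record: 1 header, 2 sequence, 3 plus, 0 quality.
        { rec[NR % 4] = $0 }
        NR % 4 == 0 {
            seq = rec[2]
            qual = rec[0]
            n = length(qual)
            sum = 0
            for (i = 1; i <= n; i++) sum += ord[substr(qual, i, 1)] - 33
            if (length(seq) >= min_len && n > 0 && sum / n >= min_qual)
                printf "%s\n%s\n%s\n%s\n", rec[1], rec[2], rec[3], rec[0]
        }'
}

# Loss of reads in percent: 100 * (before - after) / before.
# Usage: loss_of_reads READS_BEFORE READS_AFTER
loss_of_reads() {
    awk -v before="$1" -v after="$2" 'BEGIN {
        pct = (before > 0) ? 100 * (before - after) / before : 0
        printf "%.1f\n", pct
    }'
}
```

For example, if 200 reads enter the filter and 50 remain, loss_of_reads 200 50 reports a loss of 75.0 percent.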
Besides the loss of reads table, the section has two plots (Figure 4). One plot shows the
proportion of reads with adequate and inadequate length, i.e., the proportion of reads
at least as long as the minimum read length (default is 300 bp) and the proportion of
reads shorter than the minimum read length. This plot helps explain a high loss of reads.
The filtering step removes reads with inadequate length and/or
inadequate base quality, so if you had a high loss of reads and the FASTQC report shows that
the reads have good sequencing quality, then your reads were removed because of inadequate
length. In this case, the plot shows a proportion of reads with inadequate length
larger than that of reads with adequate length. In summary, this plot gives you a hint of
what happened in the filtering step.
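The same proportions can be computed directly from a fastq file, which is useful if you want to check a library before running ATTILA. The sketch below is an illustration under the assumption of uncompressed fastq files with four lines per record. The function name length_proportions is hypothetical.

```shell
# Report the share of reads at or above, and below, a minimum length.
# Usage: length_proportions MIN_LEN < reads.fastq
length_proportions() {
    awk -v min_len="$1" '
        # Line 2 of every 4-line record holds the sequence.
        NR % 4 == 2 { if (length($0) >= min_len) ok++; else bad++ }
        END {
            total = ok + bad
            if (total == 0) { print "no reads"; exit }
            printf "adequate %.1f%% inadequate %.1f%%\n", 100 * ok / total, 100 * bad / total
        }'
}

# Example call (file name is a placeholder):
#   length_proportions 300 < vh_rn.fastq
```

If most reads fall in the inadequate group while FASTQC reports good base quality, read length is the likely cause of a high loss of reads.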
The other plot shows the number of reads by task, i.e., the number of reads in each step
of the analysis. The reads of the task called “None” are the raw data, i.e. the
reads before any processing. The number of reads is shown for both the initial and the final library.
The only task that does not have a quantity for both libraries is “Enrichment”, since this
step compares the initial and final libraries to find clones with increased frequency, and produces
as output a file with sequences from the final library that represent the enriched clones.

Figure 4: Section Reads Information of the analysis report.
The section Candidate Clones has one table showing information about the candidate
clones (Figure 5), such as the fold change (how much the frequency increased from the initial to
the final library), the p-value (if it is less than α, then the difference in the proportion of a given clone
is statistically significant), the confidence interval (the narrower the interval, the more precise
the clone proportions we calculated), the germline that the candidate clone belongs to
and its respective alignment identity value. If you hover the mouse over the rank number,
you will see the id of the candidate sequence. Note that the number
of candidate clones found may be less than the number of candidates you asked for, because
it is possible that not all of the most frequent clones have the canonical amino acid residues
of the immunoglobulin variable domain, and therefore they cannot be considered candidates
according to our biological criterion.
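As an illustration of the fold change, the sketch below divides the frequency of a clone in the final library by its frequency in the initial library. This is one common way to express such a ratio and is an assumption here, since this guide does not give ATTILA's exact formula or say how clones absent from the initial library are handled. The function name fold_change is hypothetical, and the sketch prints NA when the ratio is undefined.

```shell
# Fold change of a clone's frequency from the initial to the final library.
# Usage: fold_change COUNT_INITIAL TOTAL_INITIAL COUNT_FINAL TOTAL_FINAL
fold_change() {
    awk -v c0="$1" -v n0="$2" -v cn="$3" -v nn="$4" 'BEGIN {
        # The ratio is undefined if the clone is absent from the initial
        # library or if either library is empty.
        if (c0 <= 0 || n0 <= 0 || nn <= 0) { print "NA"; exit }
        printf "%.2f\n", (cn / nn) / (c0 / n0)
    }'
}
```

For example, a clone seen 5 times among 1000 initial reads and 50 times among 1000 final reads gives fold_change 5 1000 50 1000, which prints 10.00.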

Figure 5: Section Candidate Clones of the analysis report.
Finally, the section Regions of Variable Domain of Candidate Clones has one table
showing the amino acid residues from each region of the variable domain of the candidate clones
(Figure 6). The sequence ids are also shown if you hover over the rank number. The sequences
used by IgBlast to identify each region of the variable domain of candidate clones are germline
sequences, downloaded from the IgBlast webpage [1, 2].

Figure 6: Section Regions of Variable Domain of Candidate Clones of the analysis report.

Configuration editing menu
If you let ATTILA create the configuration files, after you have entered all parameters,
it will print the configuration and ask if it is correct. Type “y” or press ENTER key, if
the configuration is correct. Then ATTILA will start the analysis. If the configuration is
incorrect, type “n” to open the configuration editing menu. In this case, the following list of
parameters will be printed:
---------------------Configuration Editing Menu--------------------------
Project Name (1)
Directory to save project (2)
Reads are paired-end (4)
Path of fastq file of VH R0 paired-end reads r1 (5)
Path of fastq file of VH R0 paired-end reads r2 (6)
Path of fastq file of VH RN paired-end reads r1 (7)
Path of fastq file of VH RN paired-end reads r2 (8)
Path of fastq file of VL R0 paired-end reads r1 (9)
Path of fastq file of VL R0 paired-end reads r2 (10)
Path of fastq file of VL RN paired-end reads r1 (11)
Path of fastq file of VL RN paired-end reads r2 (12)
Path of fastq file of VH R0 single-end reads (13)
Path of fastq file of VH RN single-end reads (14)
Path of fastq file of VL R0 single-end reads (15)
Path of fastq file of VL RN single-end reads (16)
Minimum Read Length (18)
Mininum Base Quality (19)
Number of Candidates (20)
Save and exit (0)
------------------------------------------------------------------------
Enter corresponding integer to correct settings:
Enter the number corresponding to the parameter you want to change. ATTILA
will then ask you for the new value. The parameter corresponding to number “4” is
a bit different: if you type “4”, ATTILA will tell you to type “1” if your reads are paired-end
or “0” if your reads are single-end. When you are done editing the configuration, type “0”
to save and quit the menu. ATTILA will create the configuration files and start the analysis.

Advanced Topic
What if you want to run ATTILA from a different directory?
As explained in section Installation, you must install ATTILA in a directory where
the IgBlast and ATTILA packages are subdirectories. After installation, however, you
may run ATTILA from another directory by following these steps:
• Go to the new directory where you want to run ATTILA, using the cd command.
• Type the command line:
ln -s <path to attilacli.sh> attilacli.sh
Example: ln -s /home/Attila/programs/attilacli.sh attilacli.sh
• Copy the file “paths_attila.txt” to the directory where you want to run ATTILA, using
the command line:
cp <path to paths_attila.txt> .
Example: cp /home/paths_attila.txt .
Note that the dot (.) at the end of the command is necessary.
• Run ATTILA by typing:
./attilacli.sh

References
[1] NCBI. IgBLAST tool. http://www.ncbi.nlm.nih.gov/igblast/.
[2] Jian Ye, Ning Ma, Thomas L Madden, and James M Ostell. IgBLAST: an immunoglobulin
variable domain sequence analysis tool. Nucleic Acids Research, page gkt382, 2013.
