TEF2 – User manual

2018-06-27

D. Stratmann, JS. Pathmanathan, G. Postic, J. Rey, J. Chomilier

Sorbonne Université, UMR 7590 CNRS, MNHN, IRD, Institut de Minéralogie de Physique des Matériaux et de Cosmochimie (IMPMC), Paris, France

INSERM UMR-S 973, Université Paris Diderot, Sorbonne Paris Cité, RPBS, Paris, France

Contact: dirk.stratmann@sorbonne-universite.fr

Contents

1 INTRODUCTION
2 TEF 1.0 versus TEF 2.0
3 How to use TEF 2.0 ?
4 OPTIONS [name of the option (command line)]
  4.1 PDB files
    4.1.1 One single PDB file (-pdbFile filename)
    4.1.2 Directory with PDB files (-pdb directory)
  4.2 Output files
    4.2.1 Output directory (-out directory)
    4.2.2 Sub-directories (-subdir)
    4.2.3 Run pymol (-pymol)
    4.2.4 Debug information (-debug)
  4.3 TEF selection approach (-version 2)
  4.4 Common parameters for both selection approaches (TEF 1.0 and TEF 2.0)
    4.4.1 Maximum TEF ends distance (-d 10.0)
    4.4.2 Minimum TEF length (-min 10)
    4.4.3 Maximum TEF length (-max 100)
  4.5 TEF 2.0 parameters
    4.5.1 Scores weights
    4.5.2 Overlap (-overlap 2)
    4.5.3 MAX_GAP (-gap 100)
    4.5.4 Use NACCESS (-naccess)
    4.5.5 Residues accessibility (-a 25)
5 RESULTS
  5.1 TEF representation [XXXX_solutions.tef, XXXX = PDB id]
  5.2 List of all possible TEFs [XXXX_positions_chain_X.tef, XXXX = PDB id and X = chain name]
  5.3 Pymol script + PNG image [XXXX.pymol/png, XXXX = PDB id]
  5.4 Input parameters [parameters.tef]
  5.5 Statistics [all_xxxxxxx.tef]
    5.5.1 Solutions [all_results.tef]
    5.5.2 Sequence coverage [all_coverage.tef]
    5.5.3 Average TEF ends distances [all_ca_dist_mean.tef]
    5.5.4 TEF lengths [all_tef_length.tef]
    5.5.5 Number of TEFs [all_mean_tef.tef]

1 INTRODUCTION

A globular protein consists of a long chain of amino acids that occasionally folds back on itself, forming loop-like trajectories whose ends typically lie less than 10 Å apart (Cα-Cα distance, fig. 1). These fragments were initially called closed loops [1]. The histogram of the sequence separation between these contact residues has a maximum at around 20 - 30 amino acids [2]. Later on, it was shown that the ends of these closed loops include mainly hydrophobic residues [3]. A thorough analysis demonstrated that these hydrophobic amino acids are highly conserved among structures of the same family, even among distantly related members, and they were called topohydrophobic positions [4]. The concept of TEFs (Tightened End Fragments) emerged from the joint concepts of closed loops and topohydrophobic positions [5].

The decomposition of a protein structure into TEFs can be done in different ways, because overlapping TEFs make the set of possible TEFs redundant. This redundancy is considerable: the total length of all possible TEFs is about 50 times the sequence length. The first approach (distance approach, TEF 1.0 program) selects the TEFs with the tightest ends in terms of distance between Cα-atoms (fig. 2a). This approach is also used by the DHcL server from Berezovsky et al. [6]. Its disadvantage is the poor coverage of the protein by TEFs. In order to improve the splitting of a domain into its constituting TEFs, the second approach (score approach, TEF 2.0 program) selects a sequence decomposition by TEFs that minimizes the number of residues unassigned to any TEF and chooses TEFs with the tightest ends in terms of distance between Cα-atoms (fig. 2b).

Figure 2: TEFs obtained with (a) the distance approach (TEF 1.0 program) and (b) the score approach (TEF 2.0 program)

For this task, a graph algorithm tests all combinations of TEFs and yields the optimal decomposition according to a score. The best solution is the one with the lowest score. The score is composed of three individual scores: $\mathrm{Score}_{gap}$, which penalizes the residues unassigned to any TEF (gaps); $\mathrm{Score}_{C\alpha-C\alpha}$, which reflects the distances between the Cα-atoms at the TEF ends; and a third, optional $\mathrm{Score}_{frag}$, which measures the number of TEFs (fragmentation) in the solution.

- The gap score $\mathrm{Score}_{gap}$ is simply the sum of gaps, i.e. the number of residues unassigned to any TEF:

$$\mathrm{Score}_{gap} = \left(\sum \mathrm{gap}\right) \cdot w_{gap} \qquad (1)$$

where $w_{gap}$ is the weight (default: 1.0) for this score.

- The Cα-Cα distance score $\mathrm{Score}_{C\alpha-C\alpha}$ is the sum of differences to an average distance $d_{avg}$:

$$\mathrm{Score}_{C\alpha-C\alpha} = \sum_{i=1}^{N_{TEF}} \left(d_{C\alpha-C\alpha,i} - d_{avg}\right) \cdot w_{C\alpha-C\alpha} \qquad (2)$$

where $d_{C\alpha-C\alpha,i}$ is the distance between the Cα-atoms at the ends of TEF $i$, $N_{TEF}$ is the total number of TEFs in the solution and $w_{C\alpha-C\alpha}$ is the weight (default: 1.0) for this score. $d_{avg}$ is simply the middle of the interval $[d_{max}/2,\ d_{max}]$, i.e. $d_{avg} = \tfrac{3}{4}\, d_{max}$, with $d_{max}$ being the maximum allowed distance.

- The optional fragmentation score $\mathrm{Score}_{frag}$ can be included to obtain a higher or lower fragmentation, i.e. shorter or longer TEFs on average. Its formula is:

$$\mathrm{Score}_{frag} = -N_{TEF} \cdot 10 \cdot w_{frag} \qquad (3)$$

where $w_{frag}$ is the weight (default: 0.0) for this score. For $w_{frag} > 0$ a higher fragmentation is obtained, and for $w_{frag} < 0$ a lower fragmentation is obtained.

The final score for a solution is the sum of the three scores:

$$\mathrm{Score}_{solution} = \mathrm{Score}_{gap} + \mathrm{Score}_{C\alpha-C\alpha} + \mathrm{Score}_{frag} \qquad (4)$$
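
As an illustration, formulas (1) to (4) can be evaluated for one candidate solution with a few lines of Python. This is a minimal sketch and not the TEF 2.0 source code: the function and argument names are hypothetical, and d_avg is taken as 0.75 times d_max, which follows from "the middle of the interval" between d_max/2 and d_max.

```python
from typing import Sequence


def solution_score(
    n_unassigned: int,
    end_distances: Sequence[float],
    d_max: float = 10.0,
    w_gap: float = 1.0,
    w_dist: float = 1.0,
    w_frag: float = 0.0,
) -> float:
    """Score of one TEF decomposition, formulas (1) to (4); lower is better."""
    d_avg = 0.75 * d_max  # middle of the interval [d_max / 2, d_max]
    n_tef = len(end_distances)
    score_gap = n_unassigned * w_gap                             # formula (1)
    score_dist = sum(d - d_avg for d in end_distances) * w_dist  # formula (2)
    score_frag = -n_tef * 10 * w_frag                            # formula (3)
    return score_gap + score_dist + score_frag                   # formula (4)


# Hypothetical solution: two TEFs with end distances of 6.0 and 8.0 Å,
# and 5 residues left unassigned to any TEF.
default_score = solution_score(5, [6.0, 8.0])
distance_only = solution_score(5, [6.0, 8.0], w_gap=0.0)
print(default_score, distance_only)
```

With the default weights, the gap term dominates as soon as a few residues are left unassigned; setting w_gap to zero leaves only the distance term.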

2 TEF 1.0 versus TEF 2.0

We tested the two approaches, TEF 1.0 and TEF 2.0, on a database of 278 proteins with less than 25 % sequence identity, in order to compare their coverage of the proteins by TEFs. The distance approach (TEF 1.0) gives an average coverage of only (67 ± 8) %, while the score approach (TEF 2.0) attains (95 ± 3) % (fig. 3). At the same time, the average distance between the TEF ends increases only from (5.1 ± 0.3) Å (TEF 1.0) to (6.4 ± 0.6) Å (TEF 2.0). If required, the optimization based on the TEF ends distance performed by TEF 1.0 can also be carried out with TEF 2.0, by setting the weight $w_{gap}$ of the gap score to zero (command line: -gw 0).

Figure 3: Decomposition into TEFs of the N-terminal part of Enzyme I (PDB code: 3EZA) with (a) the distance approach (TEF 1.0) and (b) the score approach (TEF 2.0)

3 How to use TEF 2.0 ?

Step 1: Provide a PDB file or enter the PDB code of the protein.

Step 2: Choose the approach (TEF 1.0 or TEF 2.0) used to decompose the protein into TEFs (see section OPTIONS).

Step 3: Depending on your needs, change the default values of the options for the selected approach (see section OPTIONS) or keep them.

Step 4: Click on run to obtain the results.

4 OPTIONS [name of the option (command line)]

4.1 PDB files

Without any option, the TEF program processes all PDB files in the current directory.

4.1.1 One single PDB file (-pdbFile filename)

To process only one specific PDB file, use the -pdbFile option to specify the path and filename of the PDB file. This file can also be an assembly of several PDB files put into one single “multi-PDB” file.

4.1.2 Directory with PDB files (-pdb directory)

If the PDB files are not in the current directory, use the -pdb option to specify the path to the directory containing the PDB files to be processed.

4.2 Output files

4.2.1 Output directory (-out directory)

By default, the output files are stored in the current directory. Use the -out option to specify a different output directory.

4.2.2 Sub-directories (-subdir)

With this option, the output files are stored automatically in separate sub-directories, one per PDB file (default: no sub-directories).

4.2.3 Run pymol (-pymol)

With this option, TEF 2.0 calls PyMOL to generate a PNG file for each PDB file. The TEFs are indicated by colors (see section 5.3). As the ray-tracing step can take some time, the generation of the PNG file is deactivated by default.

4.2.4 Debug information (-debug)

With this option, additional files are written for debugging purposes.

4.3 TEF selection approach (-version 2)

The distance approach (-version 1) selects the TEFs with the tightest ends in terms of distance between Cα-atoms (fig. 2a).

The score approach (-version 2, default) selects a sequence decomposition by TEFs that minimizes the number of residues unassigned to any TEF and chooses TEFs with the tightest ends in terms of distance between Cα-atoms (fig. 2b).
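
The input, output and selection options of sections 4.1 to 4.3 can be combined on one command line. The sketch below assembles such a command as an argument list. The option names are those documented above, but the executable name `tef` and the file name `3EZA.pdb` are assumptions made for illustration, since this manual does not state how the program is invoked.

```python
from typing import List, Optional

TEF_EXECUTABLE = "tef"  # hypothetical executable name, not given in this manual


def build_tef_command(
    pdb_file: Optional[str] = None,
    pdb_dir: Optional[str] = None,
    out_dir: Optional[str] = None,
    subdir: bool = False,
    pymol: bool = False,
    debug: bool = False,
    version: int = 2,
) -> List[str]:
    """Assemble the argument list for the options of sections 4.1 to 4.3."""
    if version not in (1, 2):
        raise ValueError("version must be 1 (distance approach) or 2 (score approach)")
    command = [TEF_EXECUTABLE]
    if pdb_file is not None:
        command += ["-pdbFile", pdb_file]  # one single PDB file
    if pdb_dir is not None:
        command += ["-pdb", pdb_dir]       # directory with PDB files
    if out_dir is not None:
        command += ["-out", out_dir]       # output directory
    for flag, enabled in (("-subdir", subdir), ("-pymol", pymol), ("-debug", debug)):
        if enabled:
            command.append(flag)
    command += ["-version", str(version)]
    return command


command = build_tef_command(pdb_file="3EZA.pdb", out_dir="results", pymol=True)
print(" ".join(command))
```

The list form can be passed directly to `subprocess.run` once the real executable name is known.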

4.4 Common parameters for both selection approaches (TEF 1.0 and TEF 2.0)

4.4.1 Maximum TEF ends distance (-d 10.0)

Maximum distance between the Cα-atoms of the TEF ends.

Default: 10.0 Å

Allowed: 4.0 - 15.0 Å

4.4.2 Minimum TEF length (-min 10)

The minimum length of a TEF can be specified to avoid fragments that are too short.

Default: 10 AA

Allowed: 10 - 100 AA

4.4.3 Maximum TEF length (-max 100)

The maximum length of a TEF can be specified to avoid fragments that are too long.

Default: 100 AA

Allowed: 10 - 100 AA
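
Together, these three parameters define which fragments count as possible TEFs. The following sketch enumerates them from a list of Cα coordinates. It is an illustration under stated assumptions, not the program's implementation: residues are indexed from 0, the length counts both end residues, and the distance limit is treated as inclusive.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def candidate_tefs(
    ca_coords: Sequence[Point],
    d_max: float = 10.0,
    min_len: int = 10,
    max_len: int = 100,
) -> List[Tuple[int, int, int, float]]:
    """Return (first, last, length, distance) for every fragment whose ends are close."""
    n = len(ca_coords)
    tefs = []
    for first in range(n):
        # last residue index such that min_len <= length <= max_len
        for last in range(first + min_len - 1, min(n, first + max_len)):
            distance = math.dist(ca_coords[first], ca_coords[last])
            if distance <= d_max:
                tefs.append((first, last, last - first + 1, distance))
    return tefs


# Toy hairpin of 12 residues: six along one strand, six coming back 5 Å away.
hairpin = [(i * 3.8, 0.0, 0.0) for i in range(6)] + \
          [(i * 3.8, 5.0, 0.0) for i in range(5, -1, -1)]
all_candidates = candidate_tefs(hairpin)
tight_candidates = candidate_tefs(hairpin, d_max=6.0)
print(len(all_candidates), [(t[0], t[1]) for t in tight_candidates])
```

Even this small hairpin yields several overlapping candidates, which illustrates the redundancy that the selection approaches have to resolve.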

4.5 TEF 2.0 parameters

4.5.1 Scores weights

The weight of each score (formulas 1, 2 and 3) can be changed to modify the decomposition of the protein structure into TEFs.

• Degree of coverage, $w_{gap}$ (-gw 1.0)

Higher values favor solutions with a higher coverage by TEFs.

Default: 1.0

Allowed: 0.0 - 100.0

• Cα-atoms distance weight, $w_{C\alpha-C\alpha}$ (-dw 1.0)

Higher values favor TEFs with shorter distances between their ends.

Default: 1.0

Allowed: 0.0 - 100.0

• Degree of fragmentation, $w_{frag}$ (-tw 0.0)

Positive values favor a higher number of shorter TEFs; negative values favor a lower number of longer TEFs.

Default: 0.0

Allowed: -100.0 - 100.0
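
Before launching a run, the three weights can be checked against the allowed ranges listed above. The helper below is a hypothetical convenience function, not part of TEF 2.0; it only uses the option names and ranges documented in this section.

```python
from typing import Dict, List, Tuple

# Allowed ranges as documented for -gw, -dw and -tw.
WEIGHT_RANGES: Dict[str, Tuple[float, float]] = {
    "-gw": (0.0, 100.0),     # degree of coverage
    "-dw": (0.0, 100.0),     # Cα-atoms distance weight
    "-tw": (-100.0, 100.0),  # degree of fragmentation
}


def weight_arguments(gw: float = 1.0, dw: float = 1.0, tw: float = 0.0) -> List[str]:
    """Return the command-line arguments for the weights, rejecting out-of-range values."""
    arguments: List[str] = []
    for flag, value in (("-gw", gw), ("-dw", dw), ("-tw", tw)):
        low, high = WEIGHT_RANGES[flag]
        if not low <= value <= high:
            raise ValueError(f"{flag} must be between {low} and {high}, got {value}")
        arguments += [flag, str(value)]
    return arguments


# Distance-only optimization, comparable to TEF 1.0 (see section 2).
distance_only_args = weight_arguments(gw=0.0)
print(distance_only_args)
```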

4.5.2 Overlap (-overlap 2)

By default the maximum allowed overlap between two TEFs is 2 residues, which can be changed by

this option.
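
To make the overlap criterion concrete, the sketch below counts the residues shared by two TEFs and compares the count with the allowed maximum. It assumes that the overlap is the number of residues belonging to both fragments, with both end residues included; the manual does not spell out this definition.

```python
from typing import Tuple

Tef = Tuple[int, int]  # (first residue, last residue), both inclusive


def overlap(tef_a: Tef, tef_b: Tef) -> int:
    """Number of residues shared by two TEFs (0 if they are disjoint)."""
    shared_first = max(tef_a[0], tef_b[0])
    shared_last = min(tef_a[1], tef_b[1])
    return max(0, shared_last - shared_first + 1)


def compatible(tef_a: Tef, tef_b: Tef, max_overlap: int = 2) -> bool:
    """True if the two TEFs may appear together in one solution."""
    return overlap(tef_a, tef_b) <= max_overlap


print(overlap((1, 20), (19, 40)), compatible((1, 20), (19, 40)))
print(overlap((1, 20), (18, 40)), compatible((1, 20), (18, 40)))
```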

4.5.3 MAX_GAP (-gap 100)

The MAX_GAP value controls the search depth for the optimal selection of TEFs. MAX_GAP corresponds to the maximum length of the gap between two TEFs. The gaps are counted from the first possible TEF after the last residue of the current TEF, so that large parts of the sequence without any TEF can be skipped. A value of at least 10 is recommended for MAX_GAP, since a value that is too small may result in an incomplete search. Higher values result in a longer search time, but do not necessarily change the final result.

Default: 100 AA

Allowed: 0 - 300 AA

4.5.4 Use NACCESS (-naccess)

With this option, the NACCESS [7] program is used to restrict the TEF ends to the protein core. By default, this filter for possible TEFs is deactivated.

4.5.5 Residues accessibility (-a 25)

Maximum relative Accessible Surface Area (ASA) allowed for the TEF ends, as calculated by NACCESS [7]. A small value (< 50 %) constrains the TEF ends to the protein core. This option only takes effect if the option -naccess is also used.

Default: 25 %

Allowed: 1 - 200 % (in some cases NACCESS gives values > 100 %)
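
The effect of this filter can be sketched as follows. The example assumes that a relative accessibility value (in %) is already available for each residue, for instance read from NACCESS output, and that both ends of a TEF must stay at or below the threshold. Both points are assumptions made for illustration; the manual does not detail how the program applies the filter.

```python
from typing import Dict, List, Tuple

Tef = Tuple[int, int]  # (first residue, last residue)


def filter_by_accessibility(
    tefs: List[Tef],
    relative_asa: Dict[int, float],
    max_asa: float = 25.0,
) -> List[Tef]:
    """Keep only the TEFs whose two end residues are buried enough."""
    return [
        (first, last)
        for first, last in tefs
        if relative_asa[first] <= max_asa and relative_asa[last] <= max_asa
    ]


# Hypothetical relative accessibilities (in %) of the end residues.
asa = {1: 10.0, 20: 24.0, 5: 60.0, 30: 5.0}
buried = filter_by_accessibility([(1, 20), (5, 30)], asa)
print(buried)
```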

5 RESULTS

5.1 TEF representation [XXXX_solutions.tef, XXXX = PDB id]

Figure 4 shows the sequence of the submitted structure with the corresponding TEFs just below it. The TEFs are drawn on two lines to make overlapping TEFs easier to distinguish (maximum overlap 2 residues). The TEFs are then listed below (first residue, last residue and distance in Å).

Figure 4: Output of TEF program for TEF representation

5.2 List of all possible TEFs [XXXX_positions_chain_X.tef, XXXX = PDB id and X = chain name]

The list of all possible TEFs (fig. 5) has the following columns:

- column 1: first residue of the TEF

- column 2: last residue of the TEF

- column 3: size (in residues) of the TEF

- column 4: distance in Å between the first and last residues of the TEF

Figure 5: Output of TEF program for all possible TEFs list
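
The four columns described above can be read back with a short parser. This is a sketch that assumes whitespace-separated columns and integer residue numbers; the exact layout of the file (header lines, separators, insertion codes) is not specified here, so lines that do not match are simply skipped.

```python
from typing import List, NamedTuple


class TefRecord(NamedTuple):
    first: int        # column 1: first residue of the TEF
    last: int         # column 2: last residue of the TEF
    size: int         # column 3: size in residues
    distance: float   # column 4: end-to-end distance in Å


def parse_positions(text: str) -> List[TefRecord]:
    """Parse the content of a positions file into a list of records."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or incomplete line
        try:
            record = TefRecord(int(fields[0]), int(fields[1]),
                               int(fields[2]), float(fields[3]))
        except ValueError:
            continue  # header or other non-numeric line
        records.append(record)
    return records


# Hypothetical file content, for illustration only.
sample = "first last size dist\n1 12 12 5.00\n2 11 10 6.28\n\n"
records = parse_positions(sample)
print(records)
```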

5.3 Pymol script + PNG image [XXXX.pymol/png, XXXX = PDB id]

A PyMOL [8] script is generated, which allows the decomposition of the protein into TEFs to be visualized in 3D. To use it, download the cleaned PDB file and the script, then execute the script in PyMOL. At the end, the script writes a PNG image file (XXXX.png, XXXX = PDB id). With the -pymol option, the TEF program automatically generates this PNG image file along with the PyMOL script. Figure 6 shows the color code corresponding to the TEF ids.

Figure 6: TEF colors in pymol

5.4 Input parameters [parameters.tef]

The parameters.tef file contains the command line as well as a list of all parameters used in the run.

5.5 Statistics [all_xxxxxxx.tef]

The files beginning with “all_” summarize the results for all PDB files/chains that have been processed by the TEF program in the current run.

5.5.1 Solutions [all_results.tef]

Gives the TEF decomposition solutions of all PDB files/chains in a compact form.

5.5.2 Sequence coverage [all_coverage.tef]

Gives the list of sequence coverage values (in %) for all PDB files.

5.5.3 Average TEF ends distances [all_ca_dist_mean.tef]

Gives the list of average TEF ends distance values (in Å) for all PDB files.

5.5.4 TEF lengths [all_tef_length.tef]

Gives the list of TEF lengths (in number of residues) of all selected TEFs for all PDB files.

5.5.5 Number of TEFs [all_mean_tef.tef]

Gives the number of selected TEFs for each PDB file.
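
The quantities reported in these files can be recomputed from one decomposition, as sketched below. The definitions are assumptions chosen for illustration: coverage is the percentage of residues that belong to at least one TEF, and residues are numbered consecutively. The function and key names are hypothetical and do not reflect the file formats.

```python
from typing import Dict, List, Tuple

Tef = Tuple[int, int, float]  # (first residue, last residue, end distance in Å)


def summarize_solution(tefs: List[Tef], sequence_length: int) -> Dict[str, object]:
    """Coverage, mean end distance, TEF lengths and TEF count of one solution."""
    covered = set()
    for first, last, _ in tefs:
        covered.update(range(first, last + 1))  # overlapping residues count once
    lengths = [last - first + 1 for first, last, _ in tefs]
    mean_distance = sum(d for _, _, d in tefs) / len(tefs) if tefs else None
    return {
        "coverage_percent": 100.0 * len(covered) / sequence_length,
        "mean_end_distance": mean_distance,
        "tef_lengths": lengths,
        "n_tefs": len(tefs),
    }


# Hypothetical solution: two TEFs overlapping by 2 residues on a 50-residue chain.
summary = summarize_solution([(1, 20, 5.0), (19, 40, 7.0)], sequence_length=50)
print(summary)
```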

References

[1] Varda Ittah and Elisha Haas. Nonlocal interactions stabilize long range loops in the initial folding intermediates of reduced bovine pancreatic trypsin inhibitor. Biochemistry, 34(13):4493–4506, April 1995.

[2] I N Berezovsky, A Y Grosberg, and E N Trifonov. Closed loops of nearly standard size: common basic element of protein structure. FEBS Letters, 466(2-3):283–286, January 2000.

[3] I N Berezovsky, V M Kirzhner, A Kirzhner, and E N Trifonov. Protein folding: looping from hydrophobic nuclei. Proteins, 45(4):346–350, December 2001.

[4] Anne Poupon and Jean-Paul Mornon. Populations of hydrophobic amino acids within protein globular domains: Identification of conserved 'topohydrophobic' positions. Proteins: Structure, Function, and Bioinformatics, 33(3):329–342, November 1998.

[5] M Lamarine, J P Mornon, N Berezovsky, and J Chomilier. Distribution of tightened end fragments of globular proteins statistically matches that of topohydrophobic positions: towards an efficient punctuation of protein folding? Cellular and Molecular Life Sciences: CMLS, 58(3):492–498, March 2001.

[6] Grzegorz Koczyk and Igor N Berezovsky. Domain hierarchy and closed loops (DHcL): a server for exploring hierarchy of protein domain structure. Nucleic Acids Research, 36(Web Server issue):W239–245, July 2008.

[7] S J Hubbard and J M Thornton. "NACCESS", computer program. Department of Biochemistry and Molecular Biology, University College London, 1993.

[8] W L DeLano. The PyMOL Molecular Graphics System. DeLano Scientific, San Carlos, CA, USA, 2002. http://www.pymol.org.
