
Molecular Descriptors Guide

Description of the Molecular Descriptors Appearing in the PyDPI Software Package

Version 1.0

Table of Contents

1 Descriptors of drugs
1.1 Molecular constitutional descriptors
1.2 Topological descriptors
1.3 Molecular connectivity indices
1.4 Kappa shape descriptors
1.5 Burden descriptors
1.6 Basak descriptors
1.7 Electrotopological State Indices
1.8 Autocorrelation descriptors
1.8.1 Moreau-Broto autocorrelation descriptors
1.8.2 Moran autocorrelation descriptors
1.8.3 Geary autocorrelation descriptors
1.9 Charge descriptors
1.10 Molecular properties
1.11 MOE-type descriptors
1.12 Molecular fingerprint
1.12.1 Daylight-type fingerprint
1.12.2 MACCS keys and FP4 fingerprint
1.12.3 E-state fingerprint
1.12.4 Atom pairs and topological torsions fingerprints
1.12.5 Morgan fingerprint
References:
2 Descriptors of proteins and peptides
2.1 Amino acid composition
2.2 Dipeptide composition
2.3 Tripeptide composition
2.4 Autocorrelation descriptors
2.4.1 Normalized Moreau-Broto autocorrelation descriptors
2.4.2 Moran autocorrelation descriptors
2.4.3 Geary autocorrelation descriptors
2.5 Composition, transition and distribution
2.6 Conjoint Triad Descriptors
2.7 Quasi-sequence-order Descriptors
2.7.1 Sequence-order-coupling numbers
2.7.2 Quasi-sequence-order (QSO) descriptors
2.8 Pseudo-amino acid composition (PAAC)
2.9 Amphiphilic pseudo-amino acid composition (APAAC)
References:
3 Protein-protein interaction descriptors
4 Protein-ligand interaction descriptors
References:
Appendix:

1 Descriptors of drugs

A small or drug molecule can be represented by its chemical structure. In the PyDPI software, we calculate twelve types of molecular descriptors to represent drug molecules: constitutional descriptors, topological descriptors, connectivity indices, Burden descriptors, Basak's information indices, E-state indices, autocorrelation descriptors, charge descriptors, molecular properties, kappa shape indices, MOE-type descriptors, and molecular fingerprints. These descriptors capture and magnify distinct aspects of chemical structures.

1.1 Molecular constitutional descriptors

1. Molecular weight (Weight)
2. Count of hydrogen atoms (nhyd)
3. Count of halogen atoms (nhal)
4. Count of hetero atoms (nhet)
5. Count of heavy atoms (nhev)
6. Count of F atoms (ncof)
7. Count of Cl atoms (ncocl)
8. Count of Br atoms (ncobr)
9. Count of I atoms (ncoi)
10. Count of C atoms (ncarb)
11. Count of P atoms (nphos)
12. Count of S atoms (nsulph)
13. Count of O atoms (noxy)
14. Count of N atoms (nnitro)
15. Number of rings (nring)
16. Number of rotatable bonds (nrot)
17. Number of H-bond donors (ndonr)
18. Number of H-bond acceptors (naccr)
19. Number of single bonds (nsb)
20. Number of double bonds (ndb)
21. Number of triple bonds (ntb)
22. Number of aromatic bonds (naro)
23. Number of all atoms (nta)
24. Average molecular weight (AWeight)
25. Molecular path counts of length 1 (PC1)
26. Molecular path counts of length 2 (PC2)
27. Molecular path counts of length 3 (PC3)
28. Molecular path counts of length 4 (PC4)
29. Molecular path counts of length 5 (PC5)
30. Molecular path counts of length 6 (PC6)

Introduction:

(1) The molecular weight (MW) is the sum of the atomic weights of the individual atoms, defined as:

$$ MW = \sum_{i} MW_i $$

where $MW_i$ is the atomic weight of atom i, and the average molecular weight (AWeight) is given as follows:

$$ AWeight = MW / nAT $$

where nAT is the number of atoms.
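A minimal Python sketch of the two formulas above follows. It is not PyDPI's implementation: the molecule is assumed to be a flat list of element symbols with hydrogens written out, `ATOMIC_WEIGHT` is a small hypothetical table of rounded standard atomic weights, and nAT is assumed to count every atom, hydrogens included.

```python
# Hypothetical table of rounded standard atomic weights (extend as needed).
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
                 "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904, "I": 126.904}


def molecular_weight(atoms):
    """MW: sum of the atomic weights of all atoms in the list."""
    return sum(ATOMIC_WEIGHT[symbol] for symbol in atoms)


def average_molecular_weight(atoms):
    """AWeight = MW / nAT, with nAT assumed here to be the total number of atoms."""
    return molecular_weight(atoms) / len(atoms)


# Ethanol, C2H6O, with hydrogens written out explicitly (hypothetical input format).
ethanol = ["C", "C", "O"] + ["H"] * 6
print(round(molecular_weight(ethanol), 3))          # 46.069
print(round(average_molecular_weight(ethanol), 3))  # 5.119
```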

(2) The numbers of hydrogen (nhyd), carbon (ncarb), nitrogen (nnitro), oxygen (noxy), phosphorus (nphos), sulfur (nsulph), fluorine (ncof), chlorine (ncocl), bromine (ncobr), and iodine (ncoi) atoms are simply the total numbers of each of these types of atoms in the molecule. The number of halogen atoms (nhal) is simply the sum of the counts of the halogen atoms; the numbers of heavy atoms (nhev) and hetero atoms (nhet) are defined in a similar way.

(3) Descriptors 15 to 22 are simply the numbers of rings, rotatable bonds, H-bond donors, H-bond acceptors, and single, double, triple and aromatic bonds in the molecule.

(4) Descriptors 25 to 30 are the numbers of paths of length 1 to 6 in the topological molecular graph, where the length of a path is the number of bonds it traverses between its two end atoms.
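The path counts PC1 to PC6 can be sketched with a depth-first enumeration of simple paths. In this illustration the hydrogen-depleted molecular graph is assumed to be an adjacency list (atom index mapped to a list of neighbour indices); that representation and the function name `count_paths` are hypothetical, not part of PyDPI.

```python
def count_paths(adjacency, length):
    """Number of simple paths containing exactly `length` bonds (length >= 1).

    Every path is reached once from each of its two end atoms, so the raw
    count is halved at the end.
    """
    found = 0

    def extend(path):
        nonlocal found
        if len(path) - 1 == length:
            found += 1
            return
        for nbr in adjacency[path[-1]]:
            if nbr not in path:          # simple paths never revisit an atom
                extend(path + [nbr])

    for start in range(len(adjacency)):
        extend([start])
    return found // 2


# Hypothetical input format: hydrogen-depleted skeletons as adjacency lists.
butane = [[1], [0, 2], [1, 3], [2]]       # C-C-C-C
isobutane = [[1, 2, 3], [0], [0], [0]]    # central carbon 0 with three methyls
print([count_paths(butane, n) for n in range(1, 7)])     # [3, 2, 1, 0, 0, 0]
print([count_paths(isobutane, n) for n in range(1, 7)])  # [3, 3, 0, 0, 0, 0]
```

Because every path is enumerated explicitly, the cost grows with the number of paths; that keeps the sketch simple but is not tuned for large polycyclic molecules.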

1.2 Topological descriptors

1. Wiener index ()
2. Average Wiener index ()
3. Balaban's J index ()
4. Harary number (hara)
5. Schultz index (sch)
6. Graph distance index (Tigdi)
7. Platt number (Platt)
8. Xu index ()
9. Polarity number (Pol)
10. Pogliani index ()
11. Ipc index (Ipc)
12. BertzCT (BertzCT)
13. Gutman molecular topological index based on simple vertex degree (GMTI)
14. Zagreb index with order 1 (ZM1)
15. Zagreb index with order 2 (ZM2)
16. Modified Zagreb index with order 1 (MZM1)
17. Modified Zagreb index with order 2 (MZM2)
18. Quadratic index (Qindex)
19. Largest value in the distance matrix (diametert)
20. Radius based on topology (radiust)
21. Petitjean index based on topology (petitjeant)
22. The logarithm of the simple topological index by Narumi (Sito)
23. Harmonic topological index proposed by Narumi (Hato)
24. Geometric topological index by Narumi (Geto)
25. Arithmetic topological index by Narumi (Arto)

Introduction:

(1) Wiener index ()

$$ W = \frac{1}{2} \sum_{i=1}^{A} \sum_{j=1}^{A} d_{ij} $$

where $d_{ij}$ are the entries of the distance matrix D of the H-depleted molecular graph and A is the number of atoms.

(2) Average Wiener index ()

The average Wiener index is given by

$$ \bar{W} = \frac{2W}{A(A-1)} $$

where A is the total number of atoms in the molecule. W and $\bar{W}$ are described in more detail on page 497 of the Handbook of Molecular Descriptors.
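The Wiener and average Wiener indices only need the topological distance matrix. The sketch below computes it by breadth-first search; the adjacency-list input and the names `distance_matrix`, `wiener_index` and `average_wiener_index` are hypothetical illustrations, not PyDPI functions, and the graph is assumed to be connected.

```python
from collections import deque


def distance_matrix(adjacency):
    """Topological distances d_ij by breadth-first search (connected graph assumed)."""
    n = len(adjacency)
    dist = [[0] * n for _ in range(n)]
    for src in range(n):
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for v, d in seen.items():
            dist[src][v] = d
    return dist


def wiener_index(adjacency):
    """W = 1/2 * sum over all i, j of d_ij."""
    return sum(map(sum, distance_matrix(adjacency))) / 2


def average_wiener_index(adjacency):
    """Average Wiener index 2W / (A(A - 1))."""
    a = len(adjacency)
    return 2 * wiener_index(adjacency) / (a * (a - 1))


# Hypothetical input format: hydrogen-depleted skeletons as adjacency lists.
butane = [[1], [0, 2], [1, 3], [2]]       # C-C-C-C
isobutane = [[1, 2, 3], [0], [0], [0]]    # central carbon 0 with three methyls
print(wiener_index(butane), wiener_index(isobutane))  # 10.0 9.0
print(round(average_wiener_index(butane), 4))         # 1.6667
```

The lower W of isobutane reflects its more compact, branched skeleton.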

(3) Balaban's J index ()

$$ J = \frac{B}{C+1} \sum_{b} \left( \sigma_i \cdot \sigma_j \right)_b^{-1/2} $$

where $\sigma_i$ and $\sigma_j$ are the vertex distance degrees of adjacent atoms, the sum runs over all the molecular bonds b, B is the number of bonds in the molecular graph and C is the number of rings. J is described in more detail on page 21 of the Handbook of Molecular Descriptors.
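A minimal sketch of Balaban's J, under the same assumptions as a plain graph computation: the molecule is a hypothetical hydrogen-depleted adjacency list, the vertex distance degree $\sigma_i$ is the row sum of the distance matrix, and the number of rings C is taken as the cyclomatic number B - A + 1 of a connected graph. Function names are illustrative only.

```python
from collections import deque


def distance_matrix(adjacency):
    """Topological distances d_ij by breadth-first search (connected graph assumed)."""
    n = len(adjacency)
    dist = [[0] * n for _ in range(n)]
    for src in range(n):
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for v, d in seen.items():
            dist[src][v] = d
    return dist


def balaban_j(adjacency):
    """J = B / (C + 1) * sum over bonds of (sigma_i * sigma_j) ** -1/2."""
    sigma = [sum(row) for row in distance_matrix(adjacency)]  # vertex distance degrees
    bonds = [(i, j) for i, nbrs in enumerate(adjacency) for j in nbrs if i < j]
    n_bonds = len(bonds)
    n_rings = n_bonds - len(adjacency) + 1  # cyclomatic number C
    total = sum((sigma[i] * sigma[j]) ** -0.5 for i, j in bonds)
    return n_bonds / (n_rings + 1) * total


butane = [[1], [0, 2], [1, 3], [2]]            # hypothetical adjacency-list input
cyclopropane = [[1, 2], [0, 2], [0, 1]]
print(round(balaban_j(butane), 4))        # 1.9747
print(round(balaban_j(cyclopropane), 4))  # 2.25
```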

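(4) Harary number (hara)

$$ H = \frac{1}{2} \sum_{i=1}^{A} \sum_{j=1}^{A} \left[ \mathbf{D}^{-1} \right]_{ij} $$

The Harary index is a molecular topological index derived from the reciprocal distance matrix $\mathbf{D}^{-1}$, whose off-diagonal entries are $1/d_{ij}$ and whose diagonal entries are zero.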
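The Harary number simply replaces each topological distance by its reciprocal before summing. This sketch assumes a hypothetical adjacency-list input for a connected, hydrogen-depleted graph; the function names are illustrative, not PyDPI's.

```python
from collections import deque


def distance_matrix(adjacency):
    """Topological distances d_ij by breadth-first search (connected graph assumed)."""
    n = len(adjacency)
    dist = [[0] * n for _ in range(n)]
    for src in range(n):
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for v, d in seen.items():
            dist[src][v] = d
    return dist


def harary_number(adjacency):
    """H = 1/2 * sum of the off-diagonal entries 1/d_ij of the reciprocal distance matrix."""
    d = distance_matrix(adjacency)
    n = len(d)
    return sum(1.0 / d[i][j] for i in range(n) for j in range(n) if i != j) / 2


butane = [[1], [0, 2], [1, 3], [2]]  # hypothetical adjacency-list input, C-C-C-C
print(round(harary_number(butane), 4))  # 4.3333
```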

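(5) Schultz index (sch)

$$ MTI = \sum_{i=1}^{A} \left[ \mathbf{v} \left( \mathbf{A} + \mathbf{D} \right) \right]_i $$

It is a topological index derived from the adjacency matrix $\mathbf{A}$, the distance matrix $\mathbf{D}$ and the A-dimensional column vector $\mathbf{v}$ constituted by the vertex degrees of the atoms.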
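A short sketch of the Schultz index: form $\mathbf{A} + \mathbf{D}$, multiply by the vertex-degree vector, and sum the components. The adjacency-list input and function names are hypothetical, and the graph is assumed to be connected and hydrogen-depleted.

```python
from collections import deque


def distance_matrix(adjacency):
    """Topological distances d_ij by breadth-first search (connected graph assumed)."""
    n = len(adjacency)
    dist = [[0] * n for _ in range(n)]
    for src in range(n):
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for v, d in seen.items():
            dist[src][v] = d
    return dist


def schultz_mti(adjacency):
    """MTI = sum of the components of v (A + D)."""
    d = distance_matrix(adjacency)
    n = len(d)
    v = [len(nbrs) for nbrs in adjacency]  # vertex degrees
    # (A + D)_ij: 1 if atoms i and j are bonded, plus their topological distance
    m = [[(1 if j in adjacency[i] else 0) + d[i][j] for j in range(n)] for i in range(n)]
    row = [sum(v[i] * m[i][j] for i in range(n)) for j in range(n)]  # v (A + D)
    return sum(row)


butane = [[1], [0, 2], [1, 3], [2]]       # hypothetical adjacency-list inputs
isobutane = [[1, 2, 3], [0], [0], [0]]
print(schultz_mti(butane), schultz_mti(isobutane))  # 38 36
```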

(6) Graph distance index (Tigdi)

The graph distance index is defined as the sum of the squared counts of all graph distances:

$$ GDI = \sum_{k=1}^{D} \left( {}^{k}f \right)^2 $$

where D is the topological diameter and ${}^{k}f$ is the total number of distances in the graph equal to k.

(7) Platt number (Platt)

The Platt number, also known as the total edge adjacency index $A_E$, is the sum over all entries of the edge adjacency matrix $\mathbf{E}$:

$$ F = \sum_{i=1}^{B} \sum_{j=1}^{B} \left[ \mathbf{E} \right]_{ij} $$

where B is the number of edges in the molecular graph.

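(8) Xu index ()

It is a topological molecular descriptor based on the adjacency matrix and the distance matrix, defined as:

$$ Xu = \sqrt{A} \, \log \left( \frac{\sum_{i=1}^{A} \delta_i \sigma_i^2}{\sum_{i=1}^{A} \delta_i \sigma_i} \right) $$

where A is the number of atoms, $\delta_i$ is the vertex degree and $\sigma_i$ is the distance degree of atom i.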

(9) Polarity number (Pol)

The polarity number is usually assumed to account for the flexibility of acyclic structures; it is usually calculated from the distance matrix as the number of pairs of vertices at a topological distance equal to three. Other polarity numbers have also been defined based on different rules.
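Counting vertex pairs at topological distance three is a direct read-out of the distance matrix. This sketch assumes a hypothetical adjacency-list input for a connected, hydrogen-depleted graph and implements only the distance-three definition given above, not the other polarity variants.

```python
from collections import deque


def distance_matrix(adjacency):
    """Topological distances d_ij by breadth-first search (connected graph assumed)."""
    n = len(adjacency)
    dist = [[0] * n for _ in range(n)]
    for src in range(n):
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for v, d in seen.items():
            dist[src][v] = d
    return dist


def polarity_number(adjacency):
    """Number of unordered atom pairs whose topological distance is exactly 3."""
    d = distance_matrix(adjacency)
    n = len(d)
    return sum(1 for i in range(n) for j in range(i + 1, n) if d[i][j] == 3)


pentane = [[1], [0, 2], [1, 3], [2, 4], [3]]  # hypothetical adjacency-list inputs
isobutane = [[1, 2, 3], [0], [0], [0]]
print(polarity_number(pentane), polarity_number(isobutane))  # 2 0
```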

(10) Pogliani index ()

The Pogliani index is defined as

$$ \sum_{i=1}^{A} \frac{Z_i}{L_i} $$

where A is the number of atoms, $Z_i$ is the number of valence electrons and $L_i$ is the principal quantum number of atom i.

(11) Ipc index (Ipc)

The Ipc index is the information content of the coefficients of the characteristic polynomial of the hydrogen-depleted adjacency matrix, computed with information theory.

(12) BertzCT (BertzCT)

BertzCT is one of the most widely used complexity indices; it takes into account both the variety of bond connectivities and the diversity of atom types. It is defined as:

$$ I_{CPX} = I_{CPB} + I_{CPA} $$

where $I_{CPB}$ and $I_{CPA}$ are the information contents related to bond connectivity and atom-type diversity, respectively.

(13) Gutman molecular topological index based on simple vertex degree (GMTI)

$$ S_G = \sum_{i=1}^{A-1} \sum_{j=i+1}^{A} \delta_i \delta_j \, d_{ij} $$

where $\delta_i \delta_j d_{ij}$ is the topological distance between vertex i and vertex j weighted by the product of the endpoint vertex degrees.

(14) Zagreb index with order 1 (ZM1)

The first Zagreb index (weighted by vertex degrees) is given by

$$ M_1 = \sum_{i} \delta_i^2 $$

where i runs over the atoms of the molecule and $\delta_i$ is the vertex degree.

(15) Zagreb index with order 2 (ZM2)

$$ M_2 = \sum_{b} \left( \delta_i \delta_j \right)_b $$

where b runs over all the bonds in the molecule. The Zagreb indices are described on page 509 of the Handbook of Molecular Descriptors.
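Both Zagreb indices depend only on vertex degrees, so they can be read straight off an adjacency list. The input format and the function name `zagreb_indices` are hypothetical illustrations.

```python
def zagreb_indices(adjacency):
    """Return (M1, M2): sum of squared degrees, and sum over bonds of degree products."""
    deg = [len(nbrs) for nbrs in adjacency]
    m1 = sum(x * x for x in deg)
    m2 = sum(deg[i] * deg[j] for i, nbrs in enumerate(adjacency) for j in nbrs if i < j)
    return m1, m2


# Hypothetical input format: hydrogen-depleted skeletons as adjacency lists.
butane = [[1], [0, 2], [1, 3], [2]]
isobutane = [[1, 2, 3], [0], [0], [0]]
print(zagreb_indices(butane))     # (10, 8)
print(zagreb_indices(isobutane))  # (12, 9)
```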

(16) Modified Zagreb index with order 1 (MZM1)

(17) Modified Zagreb index with order 2 (MZM2)

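(18) Quadratic index (Qindex)

$$ Q = \sum_{g} \left( g^2 - 2g \right) {}^{g}F + 2 $$

The quadratic index is also called the normalized quadratic index, where g are the different vertex degree values and ${}^{g}F$ is the vertex degree count, i.e. the number of vertices with degree g.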

(19) Largest value in the distance matrix (diametert)

$$ D = \max_i (\eta_i), \qquad \eta_i = \max_j (d_{ij}) $$

where $\eta_i$, called the atom eccentricity, is the maximum distance from the i-th vertex to the other vertices.

(20) Radius based on topology (radiust)

$$ R = \min_i (\eta_i) $$

(21) Petitjean index based on topology (petitjeant)

$$ I_2 = \frac{D - R}{R} $$
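The diameter, radius and Petitjean index all derive from the atom eccentricities. The sketch below assumes a hypothetical adjacency-list input for a connected graph with at least two atoms (a single atom would give R = 0 and a division by zero); the names are illustrative.

```python
from collections import deque


def distance_matrix(adjacency):
    """Topological distances d_ij by breadth-first search (connected graph assumed)."""
    n = len(adjacency)
    dist = [[0] * n for _ in range(n)]
    for src in range(n):
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        for v, d in seen.items():
            dist[src][v] = d
    return dist


def diameter_radius_petitjean(adjacency):
    """Return (D, R, (D - R) / R) computed from the atom eccentricities."""
    ecc = [max(row) for row in distance_matrix(adjacency)]  # eta_i for every atom
    diameter, radius = max(ecc), min(ecc)
    return diameter, radius, (diameter - radius) / radius


butane = [[1], [0, 2], [1, 3], [2]]       # hypothetical adjacency-list inputs
isobutane = [[1, 2, 3], [0], [0], [0]]
print(diameter_radius_petitjean(butane))     # (3, 2, 0.5)
print(diameter_radius_petitjean(isobutane))  # (2, 1, 1.0)
```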

(22) The logarithm of the simple topological index by Narumi (Sito)

$$ S = \prod_{i=1}^{A} \delta_i $$

where A is the number of atoms. S is a molecular descriptor related to molecular branching, proposed as the product of the vertex degrees; Sito is its logarithm.

(23) Harmonic topological index proposed by Narumi (Hato)

$$ H_N = \frac{A}{\sum_{i=1}^{A} 1/\delta_i} $$

(24) Geometric topological index by Narumi (Geto)

$$ G_N = \left( \prod_{i=1}^{A} \delta_i \right)^{1/A} $$

(25) Arithmetic topological index by Narumi (Arto)

$$ A_N = \frac{1}{A} \sum_{i=1}^{A} \delta_i $$
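The four Narumi indices are simple means and products of the vertex degrees. In this sketch the input is a hypothetical adjacency list in which every atom has at least one neighbour, and the natural logarithm is assumed for Sito; check which logarithm base PyDPI actually uses before comparing numbers.

```python
import math


def narumi_indices(adjacency):
    """Return (Sito, Hato, Geto, Arto) from the simple vertex degrees."""
    deg = [len(nbrs) for nbrs in adjacency]  # all degrees must be > 0
    a = len(deg)
    s = math.prod(deg)                       # simple topological index S
    sito = math.log(s)                       # natural logarithm assumed here
    hato = a / sum(1.0 / x for x in deg)     # harmonic mean of degrees
    geto = s ** (1.0 / a)                    # geometric mean of degrees
    arto = sum(deg) / a                      # arithmetic mean of degrees
    return sito, hato, geto, arto


butane = [[1], [0, 2], [1, 3], [2]]  # hypothetical adjacency-list input, C-C-C-C
print([round(x, 4) for x in narumi_indices(butane)])  # [1.3863, 1.3333, 1.4142, 1.5]
```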

1.3 Molecular connectivity indices

1. Valence molecular connectivity Chi index for path order 0 ()
2. Valence molecular connectivity Chi index for path order 1 ()
3. Valence molecular connectivity Chi index for path order 2 ()
4. Valence molecular connectivity Chi index for path order 3 ()
5. Valence molecular connectivity Chi index for path order 4 ()
6. Valence molecular connectivity Chi index for path order 5 ()
7. Valence molecular connectivity Chi index for path order 6 ()
8. Valence molecular connectivity Chi index for path order 7 ()
9. Valence molecular connectivity Chi index for path order 8 ()
10. Valence molecular connectivity Chi index for path order 9 ()
11. Valence molecular connectivity Chi index for path order 10 ()
12. Valence molecular connectivity Chi index for three cluster ()
13. Valence molecular connectivity Chi index for four cluster ()
14. Valence molecular connectivity Chi index for path/cluster ()
15. Valence molecular connectivity Chi index for cycles of 3 ()
16. Valence molecular connectivity Chi index for cycles of 4 ()
17. Valence molecular connectivity Chi index for cycles of 5 ()
18. Valence molecular connectivity Chi index for cycles of 6 ()
19. Simple molecular connectivity Chi indices for path order 0 ()
20. Simple molecular connectivity Chi indices for path order 1 ()
21. Simple molecular connectivity Chi indices for path order 2 ()
22. Simple molecular connectivity Chi indices for path order 3 ()
23. Simple molecular connectivity Chi indices for path order 4 ()
24. Simple molecular connectivity Chi indices for path order 5 ()
25. Simple molecular connectivity Chi indices for path order 6 ()
26. Simple molecular connectivity Chi indices for path order 7 ()
27. Simple molecular connectivity Chi indices for path order 8 ()
28. Simple molecular connectivity Chi indices for path order 9 ()
29. Simple molecular connectivity Chi indices for path order 10 ()
30. Simple molecular connectivity Chi indices for three cluster ()
31. Simple molecular connectivity Chi indices for four cluster ()
32. Simple molecular connectivity Chi indices for path/cluster ()
33. Simple molecular connectivity Chi indices for cycles of 3 ()
34. Simple molecular connectivity Chi indices for cycles of 4 ()
35. Simple molecular connectivity Chi indices for cycles of 5 ()
36. Simple molecular connectivity Chi indices for cycles of 6 ()
37. Mean chi1 (Randić) connectivity index (mChi1)
38. The difference between chi3c and chi4pc (knotp)
39. The difference between chi0v and chi0 (dchi0)
40. The difference between chi1v and chi1 (dchi1)
41. The difference between chi2v and chi2 (dchi2)
42. The difference between chi3v and chi3 (dchi3)
43. The difference between chi4v and chi4 (dchi4)
44. The difference between chiv3c and chiv4pc (knotpv)

Introduction:

1. Simple molecular connectivity indices (No. 19 to 36)

The general formula for the molecular connectivity indices (${}^{m}\chi_t$) is as follows:

$$ {}^{m}\chi_t = \sum_{k=1}^{K} \prod_{a=1}^{n} \left( \delta_a \right)_k^{-1/2} $$

where k runs over all of the m-th order subgraphs constituted by n atoms; K is the total number of m-th order subgraphs present in the molecular graph, and in the case of path subgraphs it equals the m-th order path count ${}^{m}P$. The product is over the simple vertex degrees $\delta$ of all the vertices involved in each subgraph. The subscript t of the connectivity indices refers to the type of molecular subgraph: ch for chain or ring, pc for path-cluster, c for cluster, and p for path. For the first three path indices (${}^{0}\chi$, ${}^{1}\chi$, ${}^{2}\chi$), the calculation type p is often omitted from the variable name in the software.
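For path subgraphs, the general formula reduces to enumerating all simple paths of m bonds and summing the inverse square roots of the degree products. The sketch below covers path-type indices only (not cluster, path-cluster or chain terms); the adjacency-list input and the name `path_chi` are hypothetical, and every atom must have at least one neighbour (a zero degree would raise an error).

```python
def path_chi(adjacency, order, degree=None):
    """Path-type connectivity index of the given order (order 0 = single atoms).

    `degree` defaults to the simple vertex degrees; passing valence degrees
    instead gives the valence variant.
    """
    if degree is None:
        degree = [len(nbrs) for nbrs in adjacency]
    total = 0.0

    def extend(path):
        nonlocal total
        if len(path) == order + 1:
            # Each undirected path is reached from both ends; keep one direction.
            if order == 0 or path[0] < path[-1]:
                product = 1.0
                for atom in path:
                    product *= degree[atom]
                total += product ** -0.5
            return
        for nbr in adjacency[path[-1]]:
            if nbr not in path:
                extend(path + [nbr])

    for start in range(len(adjacency)):
        extend([start])
    return total


butane = [[1], [0, 2], [1, 3], [2]]  # hypothetical adjacency-list input, C-C-C-C
print([round(path_chi(butane, m), 4) for m in range(4)])  # [3.4142, 1.9142, 1.0, 0.5]
```

With m = 1 this is the Randić index: one term per bond.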

2. Valence molecular connectivity indices (No. 1 to 18)

The valence connectivity indices (${}^{m}\chi^{v}_{t}$) are calculated in the same fashion as the simple connectivity indices, except that the vertex degrees are replaced by the valence vertex degrees. The valence degree is given by:

$$ \delta^{v} = Z^{v} - h $$

where $Z^{v}$ is the number of valence electrons (the electrons in $\sigma$, $\pi$ and lone-pair orbitals) and h is the number of hydrogen atoms bonded to the atom.

The valence connectivity indices are described on page 86 of the Handbook of Molecular Descriptors, and the connectivity indices are described in detail in the literature.

3. The remaining connectivity indices are simple combinations of the above simple and valence connectivity indices.
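As a worked illustration of the valence degree and of the difference descriptors dchi0 and dchi1, the sketch below uses ethanol. It assumes $\delta^{v} = Z^{v} - h$ for second-row atoms only; the tiny `VALENCE_ELECTRONS` table, the input format and the function names are hypothetical.

```python
# Hypothetical valence-electron table, valid only for second-row atoms.
VALENCE_ELECTRONS = {"C": 4, "N": 5, "O": 6, "F": 7}


def chi0(deltas):
    """Zero-order index: sum of delta ** -1/2 over atoms."""
    return sum(x ** -0.5 for x in deltas)


def chi1(adjacency, deltas):
    """First-order index: sum over bonds of (delta_i * delta_j) ** -1/2."""
    return sum((deltas[i] * deltas[j]) ** -0.5
               for i, nbrs in enumerate(adjacency) for j in nbrs if i < j)


# Ethanol CH3-CH2-OH with hydrogens suppressed (hypothetical input format).
adjacency = [[1], [0, 2], [1]]
elements = ["C", "C", "O"]
n_h = [3, 2, 1]                                    # attached hydrogens per atom
delta = [len(nbrs) for nbrs in adjacency]          # simple degrees [1, 2, 1]
delta_v = [VALENCE_ELECTRONS[e] - h for e, h in zip(elements, n_h)]  # [1, 2, 5]

dchi0 = chi0(delta_v) - chi0(delta)
dchi1 = chi1(adjacency, delta_v) - chi1(adjacency, delta)
print(round(dchi0, 4), round(dchi1, 4))  # -0.5528 -0.3909
```

The negative differences come entirely from the hydroxyl oxygen, whose valence degree (5) is much larger than its simple degree (1).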

1.4 Kappa shape descriptors

1. Kappa alpha index for 1 bonded fragment ()
2. Kappa alpha index for 2 bonded fragment ()
3. Kappa alpha index for 3 bonded fragment ()
4. Kier molecular flexibility index (phi)
5. Molecular shape Kappa index for 1 bonded fragment ()
6. Molecular shape Kappa index for 2 bonded fragment ()
7. Molecular shape Kappa index for 3 bonded fragment ()

Introduction:

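(1) Kappa alpha index

The first-order kappa shape index (${}^{1}\kappa$) is given by

$$ {}^{1}\kappa = \frac{2 \, {}^{1}P_{max} \, {}^{1}P_{min}}{({}^{1}P)^2} = \frac{A(A-1)^2}{({}^{1}P)^2} $$

where ${}^{m}P$ is the number of paths of bond length m in the hydrogen-suppressed molecule and A is the number of non-hydrogen atoms in the molecule.

The second-order kappa shape index (${}^{2}\kappa$) is given by

$$ {}^{2}\kappa = \frac{2 \, {}^{2}P_{max} \, {}^{2}P_{min}}{({}^{2}P)^2} = \frac{(A-1)(A-2)^2}{({}^{2}P)^2} $$

The kappa shape indices are described on page 248 of the Handbook of Molecular Descriptors.

The first-order kappa alpha shape index (${}^{1}\kappa_{\alpha}$) is given by

$$ {}^{1}\kappa_{\alpha} = \frac{(A+\alpha)(A+\alpha-1)^2}{({}^{1}P+\alpha)^2} $$

where

$$ \alpha = \sum_{x} \left( \frac{r(x)}{r(C_{sp^3})} - 1 \right) $$

r(x) is the covalent radius of the atom x being evaluated and $r(C_{sp^3})$ is the covalent radius of an sp3 carbon atom (0.77 Å).

The second-order kappa alpha shape index (${}^{2}\kappa_{\alpha}$) is given by

$$ {}^{2}\kappa_{\alpha} = \frac{(A+\alpha-1)(A+\alpha-2)^2}{({}^{2}P+\alpha)^2} $$

The third-order kappa alpha shape index (${}^{3}\kappa_{\alpha}$) is given by

$$ {}^{3}\kappa_{\alpha} = \frac{(A+\alpha-1)(A+\alpha-3)^2}{({}^{3}P+\alpha)^2} \quad \text{if A is odd} $$

$$ {}^{3}\kappa_{\alpha} = \frac{(A+\alpha-3)(A+\alpha-2)^2}{({}^{3}P+\alpha)^2} \quad \text{if A is even} $$

The kappa shape indices are described on page 250 of the Handbook of Molecular Descriptors.

The kappa flexibility index (phi) is given by

$$ \Phi = \frac{{}^{1}\kappa_{\alpha} \, {}^{2}\kappa_{\alpha}}{A} $$

The kappa flexibility index is described on page 178 of the Handbook of Molecular Descriptors.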
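Given the atom count and the path counts, the kappa formulas above are closed-form. In this sketch the path counts ${}^{1}P$, ${}^{2}P$, ${}^{3}P$ are supplied by the caller, α defaults to 0 (which reduces the kappa alpha indices to the plain kappa indices), and the function names are hypothetical; covalent radii passed to `kier_alpha` must come from a table of your choice.

```python
def kier_alpha(radii, r_csp3=0.77):
    """alpha = sum over atoms of r(x) / r(Csp3) - 1 (radii in angstroms)."""
    return sum(r / r_csp3 - 1 for r in radii)


def kappa_indices(n_atoms, p1, p2, p3, alpha=0.0):
    """Return (1kappa_a, 2kappa_a, 3kappa_a, phi) for A atoms and path counts 1P, 2P, 3P."""
    a = n_atoms + alpha
    k1 = a * (a - 1) ** 2 / (p1 + alpha) ** 2
    k2 = (a - 1) * (a - 2) ** 2 / (p2 + alpha) ** 2
    if n_atoms % 2:  # A odd
        k3 = (a - 1) * (a - 3) ** 2 / (p3 + alpha) ** 2
    else:            # A even
        k3 = (a - 3) * (a - 2) ** 2 / (p3 + alpha) ** 2
    phi = k1 * k2 / n_atoms  # Kier flexibility index
    return k1, k2, k3, phi


# n-butane: A = 4, 1P = 3, 2P = 2, 3P = 1, and all sp3 carbons give alpha = 0.
alpha = kier_alpha([0.77] * 4)
print(kappa_indices(4, 3, 2, 1, alpha))                    # (4.0, 3.0, 4.0, 3.0)
print([round(x, 4) for x in kappa_indices(6, 5, 4, 3)])    # n-hexane: [6.0, 5.0, 5.3333, 5.0]
```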

1.5 Burden descriptors

1. Highest eigenvalue n.1 of Burden matrix/weighted by atomic masses (bcutm1)
2. Highest eigenvalue n.2 of Burden matrix/weighted by atomic masses (bcutm2)
3. Highest eigenvalue n.3 of Burden matrix/weighted by atomic masses (bcutm3)
4. Highest eigenvalue n.4 of Burden matrix/weighted by atomic masses (bcutm4)
5. Highest eigenvalue n.5 of Burden matrix/weighted by atomic masses (bcutm5)
6. Highest eigenvalue n.6 of Burden matrix/weighted by atomic masses (bcutm6)
7. Highest eigenvalue n.7 of Burden matrix/weighted by atomic masses (bcutm7)
8. Highest eigenvalue n.8 of Burden matrix/weighted by atomic masses (bcutm8)
9. Lowest eigenvalue n.1 of Burden matrix/weighted by atomic masses (bcutm1)
10. Lowest eigenvalue n.2 of Burden matrix/weighted by atomic masses (bcutm2)
11. Lowest eigenvalue n.3 of Burden matrix/weighted by atomic masses (bcutm3)
12. Lowest eigenvalue n.4 of Burden matrix/weighted by atomic masses (bcutm4)
13. Lowest eigenvalue n.5 of Burden matrix/weighted by atomic masses (bcutm5)
14. Lowest eigenvalue n.6 of Burden matrix/weighted by atomic masses (bcutm6)
15. Lowest eigenvalue n.7 of Burden matrix/weighted by atomic masses (bcutm7)
16. Lowest eigenvalue n.8 of Burden matrix/weighted by atomic masses (bcutm8)
17. Highest eigenvalue n.1 of Burden matrix/weighted by atomic van der Waals volumes (bcutv1)
18. Highest eigenvalue n.2 of Burden matrix/weighted by atomic van der Waals volumes (bcutv2)
19. Highest eigenvalue n.3 of Burden matrix/weighted by atomic van der Waals volumes (bcutv3)
20. Highest eigenvalue n.4 of Burden matrix/weighted by atomic van der Waals volumes (bcutv4)
21. Highest eigenvalue n.5 of Burden matrix/weighted by atomic van der Waals volumes (bcutv5)
22. Highest eigenvalue n.6 of Burden matrix/weighted by atomic van der Waals volumes (bcutv6)
23. Highest eigenvalue n.7 of Burden matrix/weighted by atomic van der Waals volumes (bcutv7)
24. Highest eigenvalue n.8 of Burden matrix/weighted by atomic van der Waals volumes (bcutv8)
25. Lowest eigenvalue n.1 of Burden matrix/weighted by atomic van der Waals volumes (bcutv1)
26. Lowest eigenvalue n.2 of Burden matrix/weighted by atomic van der Waals volumes (bcutv2)
27. Lowest eigenvalue n.3 of Burden matrix/weighted by atomic van der Waals volumes (bcutv3)
28. Lowest eigenvalue n.4 of Burden matrix/weighted by atomic van der Waals volumes (bcutv4)
29. Lowest eigenvalue n.5 of Burden matrix/weighted by atomic van der Waals volumes (bcutv5)
30. Lowest eigenvalue n.6 of Burden matrix/weighted by atomic van der Waals volumes (bcutv6)
31. Lowest eigenvalue n.7 of Burden matrix/weighted by atomic van der Waals volumes (bcutv7)
32. Lowest eigenvalue n.8 of Burden matrix/weighted by atomic van der Waals volumes (bcutv8)
33. Highest eigenvalue n.1 of Burden matrix/weighted by atomic Sanderson electronegativities (bcute1)
34. Highest eigenvalue n.2 of Burden matrix/weighted by atomic Sanderson electronegativities (bcute2)
35. Highest eigenvalue n.3 of Burden matrix/weighted by atomic Sanderson electronegativities (bcute3)
36. Highest eigenvalue n.4 of Burden matrix/weighted by atomic Sanderson electronegativities (bcute4)
37. Highest eigenvalue n.5 of Burden matrix/weighted by atomic Sanderson electronegativities (bcute5)
38. Highest eigenvalue n.6 of Burden matrix/weighted by atomic Sanderson electronegativities (bcute6)
39. Highest eigenvalue n.7 of Burden matrix/weighted by atomic Sanderson electronegativities (bcute7)
40. Highest eigenvalue n.8 of Burden matrix/weighted by atomic Sanderson electronegativities (bcute8)
41. Lowest eigenvalue n.1 of Burden matrix/weighted by atomic Sanderson electronegativities (bcute1)
42. Lowest eigenvalue n.2 of Burden matrix/weighted by atomic Sanderson electronegativities (bcute2)
43. Lowest eigenvalue n.3 of Burden matrix/weighted by atomic Sanderson electronegativities (bcute3)
44. Lowest eigenvalue n.4 of Burden matrix/weighted by atomic Sanderson electronegativities (bcute4)
45. Lowest eigenvalue n.5 of Burden matrix/weighted by atomic Sanderson electronegativities (bcute5)
46. Lowest eigenvalue n.6 of Burden matrix/weighted by atomic Sanderson electronegativities (bcute6)
47. Lowest eigenvalue n.7 of Burden matrix/weighted by atomic Sanderson electronegativities (bcute7)
48. Lowest eigenvalue n.8 of Burden matrix/weighted by atomic Sanderson electronegativities (bcute8)
49. Highest eigenvalue n.1 of Burden matrix/weighted by atomic polarizabilities (bcutp1)
50. Highest eigenvalue n.2 of Burden matrix/weighted by atomic polarizabilities (bcutp2)
51. Highest eigenvalue n.3 of Burden matrix/weighted by atomic polarizabilities (bcutp3)
52. Highest eigenvalue n.4 of Burden matrix/weighted by atomic polarizabilities (bcutp4)
53. Highest eigenvalue n.5 of Burden matrix/weighted by atomic polarizabilities (bcutp5)
54. Highest eigenvalue n.6 of Burden matrix/weighted by atomic polarizabilities (bcutp6)
55. Highest eigenvalue n.7 of Burden matrix/weighted by atomic polarizabilities (bcutp7)
56. Highest eigenvalue n.8 of Burden matrix/weighted by atomic polarizabilities (bcutp8)
57. Lowest eigenvalue n.1 of Burden matrix/weighted by atomic polarizabilities (bcutp1)
58. Lowest eigenvalue n.2 of Burden matrix/weighted by atomic polarizabilities (bcutp2)
59. Lowest eigenvalue n.3 of Burden matrix/weighted by atomic polarizabilities (bcutp3)
60. Lowest eigenvalue n.4 of Burden matrix/weighted by atomic polarizabilities (bcutp4)
61. Lowest eigenvalue n.5 of Burden matrix/weighted by atomic polarizabilities (bcutp5)
62. Lowest eigenvalue n.6 of Burden matrix/weighted by atomic polarizabilities (bcutp6)
63. Lowest eigenvalue n.7 of Burden matrix/weighted by atomic polarizabilities (bcutp7)
64. Lowest eigenvalue n.8 of Burden matrix/weighted by atomic polarizabilities (bcutp8)

Introduction:

The Burden eigenvalue descriptors are determined by solving the following general eigenvalue equation:

$$ \mathbf{B} \mathbf{V} = \mathbf{V} \mathbf{\Lambda} $$

where $\mathbf{B}$ is a real connectivity matrix defined below, $\mathbf{V}$ is a matrix of eigenvectors, and $\mathbf{\Lambda}$ is a diagonal matrix of eigenvalues. The rules defining $\mathbf{B}$ are as follows:

a. Hydrogen atoms are included.

b. The diagonal elements of $\mathbf{B}$, $B_{ii}$, are given by the carbon-normalized atomic mass, van der Waals volume, Sanderson electronegativity, or polarizability of atom i.

c. The element of $\mathbf{B}$ connecting atoms i and j, $B_{ij}$, is equal to the square root of the bond order between atoms i and j.

d. All other elements of $\mathbf{B}$ (corresponding to nonbonded atom pairs) are set to 0.001.

The carbon normalized weights are as follows:

The lowest eigenvalues are the absolute values of the negative eigenvalues. The highest eigenvalues are the eight largest positive eigenvalues. The Burden eigenvalue descriptors are described in the Handbook of Molecular Descriptors (Todeschini and Consonni 2000).
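A minimal NumPy sketch of rules a to d follows, assuming NumPy is installed. The function names are hypothetical, the per-atom weights must already be carbon-normalized by the caller, and the selection step simply takes the k largest eigenvalues and the absolute values of the k smallest; it does not filter by sign or pad molecules with too few atoms, which PyDPI may handle differently.

```python
import numpy as np


def burden_matrix(weights, bonds):
    """Build B from per-atom weights and (i, j, bond_order) tuples, hydrogens included."""
    n = len(weights)
    b = np.full((n, n), 0.001)       # rule d: non-bonded atom pairs
    np.fill_diagonal(b, weights)     # rule b: carbon-normalized atomic property
    for i, j, order in bonds:        # rule c: square root of the bond order
        b[i, j] = b[j, i] = np.sqrt(order)
    return b


def burden_eigenvalues(weights, bonds, k=8):
    """Return the k highest eigenvalues and the absolute values of the k lowest."""
    eig = np.linalg.eigvalsh(burden_matrix(weights, bonds))  # symmetric B, ascending order
    highest = eig[::-1][:k]
    lowest = np.abs(eig[:k])
    return highest, lowest


# Formaldehyde H2C=O with explicit hydrogens: atom 0 = C, 1 = O, 2 and 3 = H.
# Hypothetical weights: atomic masses divided by the carbon mass (12.011).
masses = [12.011, 15.999, 1.008, 1.008]
weights = [m / 12.011 for m in masses]
bonds = [(0, 1, 2), (0, 2, 1), (0, 3, 1)]
high, low = burden_eigenvalues(weights, bonds, k=2)
print(high, low)
```

Using `eigvalsh` rather than a general eigen-solver exploits the symmetry of $\mathbf{B}$ and guarantees real, sorted eigenvalues.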

1.6 Basak descriptors

(1) The information content with order 0 proposed by Basak (IC0)

(2) The information content with order 1 proposed by Basak (IC1)

(3) The information content with order 2 proposed by Basak (IC2)

(4) The information content with order 3 proposed by Basak (IC3)

(5) The information content with order 4 proposed by Basak (IC4)

(6) The information content with order 5 proposed by Basak (IC5)

(7) The information content with order 6 proposed by Basak (IC6)

(8) The structural information content with order 0 proposed by Basak (SIC0)

(9) The structural information content with order 1 proposed by Basak (SIC1)

(10) The structural information content with order 2 proposed by Basak (SIC2)

(11) The structural information content with order 3 proposed by Basak (SIC3)

(12) The structural information content with order 4 proposed by Basak (SIC4)

(13) The structural information content with order 5 proposed by Basak (SIC5)

(14) The structural information content with order 6 proposed by Basak (SIC6)

(15) The complementary information content with order 0 proposed by Basak (CIC0)

(16) The complementary information content with order 1 proposed by Basak (CIC1)

(17) The complementary information content with order 2 proposed by Basak (CIC2)

(18) The complementary information content with order 3 proposed by Basak (CIC3)

(19) The complementary information content with order 4 proposed by Basak (CIC4)

(20) The complementary information content with order 5 proposed by Basak (CIC5)

(21) The complementary information content with order 6 proposed by Basak (CIC6)

1.7 Electrotopological State Indices

1. Sum of E-State of atom type: sLi ()
2. Sum of E-State of atom type: ssBe ()
3. Sum of E-State of atom type: ssssBe ()
4. Sum of E-State of atom type: ssBH ()
5. Sum of E-State of atom type: sssB ()
6. Sum of E-State of atom type: ssssB ()
7. Sum of E-State of atom type: sCH3 ()
8. Sum of E-State of atom type: dCH2 ()
9. Sum of E-State of atom type: ssCH2 ()
10. Sum of E-State of atom type: tCH (S10)
11. Sum of E-State of atom type: dsCH (S11)
12. Sum of E-State of atom type: aaCH (S12)
13. Sum of E-State of atom type: sssCH (S13)
14. Sum of E-State of atom type: ddC (S14)
15. Sum of E-State of atom type: tsC (S15)
16. Sum of E-State of atom type: dssC (S16)
17. Sum of E-State of atom type: aasC (S17)
18. Sum of E-State of atom type: aaaC (S18)
19. Sum of E-State of atom type: ssssC (S19)
20. Sum of E-State of atom type: sNH3 (S20)
21. Sum of E-State of atom type: sNH2 (S21)
22. Sum of E-State of atom type: ssNH2 (S22)
23. Sum of E-State of atom type: dNH (S23)
24. Sum of E-State of atom type: ssNH (S24)
25. Sum of E-State of atom type: aaNH (S25)
26. Sum of E-State of atom type: tN (S26)
27. Sum of E-State of atom type: sssNH (S27)
28. Sum of E-State of atom type: dsN (S28)
29. Sum of E-State of atom type: aaN (S29)
30. Sum of E-State of atom type: sssN (S30)
31. Sum of E-State of atom type: ddsN (S31)
32. Sum of E-State of atom type: aasN (S32)
33. Sum of E-State of atom type: ssssN (S33)
34. Sum of E-State of atom type: sOH (S34)
35. Sum of E-State of atom type: dO (S35)
36. Sum of E-State of atom type: ssO (S36)
37. Sum of E-State of atom type: aaO (S37)
38. Sum of E-State of atom type: sF (S38)
39. Sum of E-State of atom type: sSiH3 (S39)
40. Sum of E-State of atom type: ssSiH2 (S40)
41. Sum of E-State of atom type: sssSiH (S41)
42. Sum of E-State of atom type: ssssSi (S42)
43. Sum of E-State of atom type: sPH2 (S43)
44. Sum of E-State of atom type: ssPH (S44)
45. Sum of E-State of atom type: sssP (S45)
46. Sum of E-State of atom type: dsssP (S46)
47. Sum of E-State of atom type: sssssP (S47)
48. Sum of E-State of atom type: sSH (S48)
49. Sum of E-State of atom type: dS (S49)
50. Sum of E-State of atom type: ssS (S50)
51. Sum of E-State of atom type: aaS (S51)
52. Sum of E-State of atom type: dssS (S52)
53. Sum of E-State of atom type: ddssS (S53)
54. Sum of E-State of atom type: sCl (S54)
55. Sum of E-State of atom type: sGeH3 (S55)
56. Sum of E-State of atom type: ssGeH2 (S56)
57. Sum of E-State of atom type: sssGeH (S57)
58. Sum of E-State of atom type: ssssGe (S58)
59. Sum of E-State of atom type: sAsH2 (S59)
60. Sum of E-State of atom type: ssAsH (S60)
61. Sum of E-State of atom type: sssAs (S61)
62. Sum of E-State of atom type: sssdAs (S62)
63. Sum of E-State of atom type: sssssAs (S63)
64. Sum of E-State of atom type: sSeH (S64)
65. Sum of E-State of atom type: dSe (S65)
66. Sum of E-State of atom type: ssSe (S66)
67. Sum of E-State of atom type: aaSe (S67)
68. Sum of E-State of atom type: dssSe (S68)
69. Sum of E-State of atom type: ddssSe (S69)
70. Sum of E-State of atom type: sBr (S70)
71. Sum of E-State of atom type: sSnH3 (S71)
72. Sum of E-State of atom type: ssSnH2 (S72)
73. Sum of E-State of atom type: sssSnH (S73)
74. Sum of E-State of atom type: ssssSn (S74)
75. Sum of E-State of atom type: sI (S75)
76. Sum of E-State of atom type: sPbH3 (S76)
77. Sum of E-State of atom type: ssPbH2 (S77)
78. Sum of E-State of atom type: sssPbH (S78)
79. Sum of E-State of atom type: ssssPb (S79)
80-158. Maximum of E-State value of specified atom type (Smax1~Smax79)
159-237. Minimum of E-State value of specified atom type (Smin1~Smin79)

Introduction:

The E-State value $S_k$ of a given non-hydrogen atom $k$ in a molecule is given by its intrinsic state ($I_k$) plus the sum of the perturbations on that atom from all the other atoms $i$ in the molecule:

$$S_k = I_k + \sum_{i \ne k} \Delta I_{ki}$$

where the intrinsic state ($I_k$) is given by

$$I_k = \frac{(2/N)^2\,\delta_k^{v} + 1}{\delta_k}$$

where $N$ is the principal quantum number (equal to the element's period, or row, in the periodic table), $\delta_k^{v}$ is the valence delta of atom $k$ (its number of valence electrons minus the number of attached hydrogens), and $\delta_k$ is the number of non-hydrogen atoms bonded to atom $k$.

The perturbation of atom $k$ due to atom $i$ is given by

$$\Delta I_{ki} = \frac{I_k - I_i}{r_{ki}^{2}}, \qquad r_{ki} = d_{ki} + 1$$

where $d_{ki}$ is the number of bonds that separate atom $k$ from atom $i$.

The atom-type indices (SX) are obtained by summing the E-State values of all the atoms of a given type X that are present in the molecule:

$$SX = \sum_{k \in X} S_k$$

In addition, the symbols s, d, t and a in the descriptor names indicate a single bond, double bond, triple bond and aromatic bond, respectively.
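The E-State equations above can be checked with a short plain-Python sketch. It assumes a hydrogen-depleted graph in which every heavy atom is given as a hypothetical `(valence_electrons, hydrogens, N)` tuple and bonds as index pairs. It is not PyDPI's implementation: it does not assign the atom types itself (the labels in the example are supplied by hand), and isolated atoms (δ = 0) are not handled.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple

# Hypothetical heavy-atom record: (valence electrons Z^v, attached hydrogens h, principal quantum number N)
Atom = Tuple[int, int, int]
Bond = Tuple[int, int]


def bond_distances(n_atoms: int, bonds: Sequence[Bond]) -> List[List[int]]:
    """All-pairs topological distances d_ki (number of bonds) by breadth-first search; -1 if disconnected."""
    adjacency: List[List[int]] = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    matrix = []
    for start in range(n_atoms):
        row = [-1] * n_atoms
        row[start] = 0
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbour in adjacency[current]:
                if row[neighbour] < 0:
                    row[neighbour] = row[current] + 1
                    queue.append(neighbour)
        matrix.append(row)
    return matrix


def intrinsic_states(atoms: Sequence[Atom], bonds: Sequence[Bond]) -> List[float]:
    """I_k = ((2/N)^2 * delta_v + 1) / delta, with delta = number of heavy-atom neighbours."""
    degree = [0] * len(atoms)
    for a, b in bonds:
        degree[a] += 1
        degree[b] += 1
    states = []
    for (valence_electrons, hydrogens, n), delta in zip(atoms, degree):
        delta_v = valence_electrons - hydrogens  # valence delta
        states.append(((2.0 / n) ** 2 * delta_v + 1.0) / delta)
    return states


def estate_values(atoms: Sequence[Atom], bonds: Sequence[Bond]) -> List[float]:
    """S_k = I_k + sum over connected atoms i != k of (I_k - I_i) / (d_ki + 1)^2."""
    states = intrinsic_states(atoms, bonds)
    dist = bond_distances(len(atoms), bonds)
    return [
        i_k + sum((i_k - i_i) / (dist[k][i] + 1) ** 2
                  for i, i_i in enumerate(states) if dist[k][i] > 0)
        for k, i_k in enumerate(states)
    ]


def atom_type_sums(values: Sequence[float], types: Sequence[str]) -> Dict[str, float]:
    """SX: sum of the E-State values of all atoms sharing the same atom-type label."""
    sums: Dict[str, float] = {}
    for value, atom_type in zip(values, types):
        sums[atom_type] = sums.get(atom_type, 0.0) + value
    return sums


# Ethanol, heavy atoms only: CH3 - CH2 - OH (atom types assigned by hand)
ethanol = [(4, 3, 2), (4, 2, 2), (6, 1, 2)]
values = estate_values(ethanol, [(0, 1), (1, 2)])
print([round(v, 4) for v in values])  # [1.6806, 0.25, 7.5694]
print(atom_type_sums(values, ["sCH3", "ssCH2", "sOH"]))
```

Because ΔI_ki and ΔI_ik cancel in pairs, the E-State values of a molecule always sum to the sum of its intrinsic states, which is a convenient sanity check.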

1.8 Autocorrelation descriptors

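The Broto-Moreau autocorrelation descriptors (ATSdw) are given by

$$ATS_d(w) = \sum_{i=1}^{A}\sum_{j=1}^{A}\delta_{ij}\,w_i\,w_j$$

where $A$ is the number of atoms in the molecular graph, $d$ is the considered topological distance (i.e. the lag in the autocorrelation terms), $\delta_{ij}$ is the Kronecker delta function ($\delta_{ij} = 1$ if $d_{ij} = d$, zero otherwise, with $d_{ij}$ the topological distance between atoms $i$ and $j$), and $w_i$ and $w_j$ are the weights (normalized atomic properties) for atoms $i$ and $j$, respectively. The normalized atomic mass, van der Waals volume, electronegativity, or polarizability can be used for the weights. To match Dragon, the Broto-Moreau autocorrelation descriptors are calculated in the software as follows:

The Moran autocorrelation descriptors (MATSdw) are given by

$$MATS_d(w) = \frac{\frac{1}{\Delta_d}\sum_{i=1}^{A}\sum_{j=1}^{A}\delta_{ij}\,(w_i - \bar{w})(w_j - \bar{w})}{\frac{1}{A}\sum_{i=1}^{A}(w_i - \bar{w})^{2}}$$

where $\bar{w}$ is the average value of the property for the molecule and $\Delta_d = \sum_{i}\sum_{j}\delta_{ij}$ is the number of vertex pairs at distance equal to $d$.

The Geary autocorrelation descriptors (GATSdw) are given by

$$GATS_d(w) = \frac{\frac{1}{2\Delta_d}\sum_{i=1}^{A}\sum_{j=1}^{A}\delta_{ij}\,(w_i - w_j)^{2}}{\frac{1}{A-1}\sum_{i=1}^{A}(w_i - \bar{w})^{2}}$$

The 2D autocorrelation descriptors are described on pages 17-19 of the Handbook of Molecular Descriptors.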
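The three formulas can be sketched as follows. This is a minimal illustration, assuming the atomic weights $w_i$ are already normalized and supplied by the caller (the values in the example are arbitrary), and returning the raw ATS double sum without the Dragon-matching transformation used by the software. Returning 0 when no atom pair lies at the requested lag, or when all weights are equal, is likewise an assumption of this sketch.

```python
from collections import deque
from typing import List, Sequence, Tuple


def bond_distances(n_atoms: int, bonds: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """All-pairs topological distances by breadth-first search; -1 if disconnected."""
    adjacency: List[List[int]] = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    matrix = []
    for start in range(n_atoms):
        row = [-1] * n_atoms
        row[start] = 0
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbour in adjacency[current]:
                if row[neighbour] < 0:
                    row[neighbour] = row[current] + 1
                    queue.append(neighbour)
        matrix.append(row)
    return matrix


def _pairs_at_lag(dist: List[List[int]], lag: int) -> List[Tuple[int, int]]:
    """Ordered pairs (i, j) with d_ij == lag, i.e. the non-zero Kronecker deltas; len() gives Delta_d."""
    n = len(dist)
    return [(i, j) for i in range(n) for j in range(n) if dist[i][j] == lag]


def moreau_broto(w: Sequence[float], dist: List[List[int]], lag: int) -> float:
    """Raw ATS_d: sum of w_i * w_j over ordered pairs at distance `lag`."""
    return sum(w[i] * w[j] for i, j in _pairs_at_lag(dist, lag))


def moran(w: Sequence[float], dist: List[List[int]], lag: int) -> float:
    pairs = _pairs_at_lag(dist, lag)
    mean = sum(w) / len(w)
    variance = sum((x - mean) ** 2 for x in w) / len(w)  # (1/A) * sum (w_i - mean)^2
    if not pairs or variance == 0:
        return 0.0  # assumption: undefined cases reported as 0
    covariance = sum((w[i] - mean) * (w[j] - mean) for i, j in pairs) / len(pairs)
    return covariance / variance


def geary(w: Sequence[float], dist: List[List[int]], lag: int) -> float:
    pairs = _pairs_at_lag(dist, lag)
    if not pairs or len(w) < 2:
        return 0.0  # assumption: undefined cases reported as 0
    mean = sum(w) / len(w)
    variance = sum((x - mean) ** 2 for x in w) / (len(w) - 1)  # 1/(A-1) normalization
    if variance == 0:
        return 0.0
    local = sum((w[i] - w[j]) ** 2 for i, j in pairs) / (2 * len(pairs))
    return local / variance


# Arbitrary weights on a three-atom chain 0-1-2
dist = bond_distances(3, [(0, 1), (1, 2)])
weights = [1.0, 2.0, 3.0]
for lag in (1, 2):
    print(lag, moreau_broto(weights, dist, lag),
          round(moran(weights, dist, lag), 3), round(geary(weights, dist, lag), 3))
```

Counting ordered pairs in both the sums and $\Delta_d$ keeps the Moran and Geary normalizations consistent with the formulas above.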

1.8.1 Moreau-Broto autocorrelation descriptors

1. Broto-Moreau autocorrelation of a topological structure-lag1/weighted by atomic masses (ATSm1)
2. Broto-Moreau autocorrelation of a topological structure-lag2/weighted by atomic masses (ATSm2)
3. Broto-Moreau autocorrelation of a topological structure-lag3/weighted by atomic masses (ATSm3)
4. Broto-Moreau autocorrelation of a topological structure-lag4/weighted by atomic masses (ATSm4)
5. Broto-Moreau autocorrelation of a topological structure-lag5/weighted by atomic masses (ATSm5)
6. Broto-Moreau autocorrelation of a topological structure-lag6/weighted by atomic masses (ATSm6)
7. Broto-Moreau autocorrelation of a topological structure-lag7/weighted by atomic masses (ATSm7)
8. Broto-Moreau autocorrelation of a topological structure-lag8/weighted by atomic masses (ATSm8)
9. Broto-Moreau autocorrelation of a topological structure-lag1/weighted by atomic van der Waals volumes (ATSv1)
10. Broto-Moreau autocorrelation of a topological structure-lag2/weighted by atomic van der Waals volumes (ATSv2)
11. Broto-Moreau autocorrelation of a topological structure-lag3/weighted by atomic van der Waals volumes (ATSv3)
12. Broto-Moreau autocorrelation of a topological structure-lag4/weighted by atomic van der Waals volumes (ATSv4)
13. Broto-Moreau autocorrelation of a topological structure-lag5/weighted by atomic van der Waals volumes (ATSv5)
14. Broto-Moreau autocorrelation of a topological structure-lag6/weighted by atomic van der Waals volumes (ATSv6)
15. Broto-Moreau autocorrelation of a topological structure-lag7/weighted by atomic van der Waals volumes (ATSv7)
16. Broto-Moreau autocorrelation of a topological structure-lag8/weighted by atomic van der Waals volumes (ATSv8)
17. Broto-Moreau autocorrelation of a topological structure-lag1/weighted by atomic Sanderson electronegativities (ATSe1)
18. Broto-Moreau autocorrelation of a topological structure-lag2/weighted by atomic Sanderson electronegativities (ATSe2)
19. Broto-Moreau autocorrelation of a topological structure-lag3/weighted by atomic Sanderson electronegativities (ATSe3)
20. Broto-Moreau autocorrelation of a topological structure-lag4/weighted by atomic Sanderson electronegativities (ATSe4)
21. Broto-Moreau autocorrelation of a topological structure-lag5/weighted by atomic Sanderson electronegativities (ATSe5)
22. Broto-Moreau autocorrelation of a topological structure-lag6/weighted by atomic Sanderson electronegativities (ATSe6)
23. Broto-Moreau autocorrelation of a topological structure-lag7/weighted by atomic Sanderson electronegativities (ATSe7)
24. Broto-Moreau autocorrelation of a topological structure-lag8/weighted by atomic Sanderson electronegativities (ATSe8)
25. Broto-Moreau autocorrelation of a topological structure-lag1/weighted by atomic polarizabilities (ATSp1)
26. Broto-Moreau autocorrelation of a topological structure-lag2/weighted by atomic polarizabilities (ATSp2)
27. Broto-Moreau autocorrelation of a topological structure-lag3/weighted by atomic polarizabilities (ATSp3)
28. Broto-Moreau autocorrelation of a topological structure-lag4/weighted by atomic polarizabilities (ATSp4)
29. Broto-Moreau autocorrelation of a topological structure-lag5/weighted by atomic polarizabilities (ATSp5)
30. Broto-Moreau autocorrelation of a topological structure-lag6/weighted by atomic polarizabilities (ATSp6)
31. Broto-Moreau autocorrelation of a topological structure-lag7/weighted by atomic polarizabilities (ATSp7)
32. Broto-Moreau autocorrelation of a topological structure-lag8/weighted by atomic polarizabilities (ATSp8)

1.8.2 Moran autocorrelation descriptors

33. Moran autocorrelation-lag1/weighted by atomic masses (MATSm1)
34. Moran autocorrelation-lag2/weighted by atomic masses (MATSm2)
35. Moran autocorrelation-lag3/weighted by atomic masses (MATSm3)
36. Moran autocorrelation-lag4/weighted by atomic masses (MATSm4)
37. Moran autocorrelation-lag5/weighted by atomic masses (MATSm5)
38. Moran autocorrelation-lag6/weighted by atomic masses (MATSm6)
39. Moran autocorrelation-lag7/weighted by atomic masses (MATSm7)
40. Moran autocorrelation-lag8/weighted by atomic masses (MATSm8)
41. Moran autocorrelation-lag1/weighted by atomic van der Waals volumes (MATSv1)
42. Moran autocorrelation-lag2/weighted by atomic van der Waals volumes (MATSv2)
43. Moran autocorrelation-lag3/weighted by atomic van der Waals volumes (MATSv3)
44. Moran autocorrelation-lag4/weighted by atomic van der Waals volumes (MATSv4)
45. Moran autocorrelation-lag5/weighted by atomic van der Waals volumes (MATSv5)
46. Moran autocorrelation-lag6/weighted by atomic van der Waals volumes (MATSv6)
47. Moran autocorrelation-lag7/weighted by atomic van der Waals volumes (MATSv7)
48. Moran autocorrelation-lag8/weighted by atomic van der Waals volumes (MATSv8)
49. Moran autocorrelation-lag1/weighted by atomic Sanderson electronegativities (MATSe1)
50. Moran autocorrelation-lag2/weighted by atomic Sanderson electronegativities (MATSe2)
51. Moran autocorrelation-lag3/weighted by atomic Sanderson electronegativities (MATSe3)
52. Moran autocorrelation-lag4/weighted by atomic Sanderson electronegativities (MATSe4)
53. Moran autocorrelation-lag5/weighted by atomic Sanderson electronegativities (MATSe5)
54. Moran autocorrelation-lag6/weighted by atomic Sanderson electronegativities (MATSe6)
55. Moran autocorrelation-lag7/weighted by atomic Sanderson electronegativities (MATSe7)
56. Moran autocorrelation-lag8/weighted by atomic Sanderson electronegativities (MATSe8)
57. Moran autocorrelation-lag1/weighted by atomic polarizabilities (MATSp1)
58. Moran autocorrelation-lag2/weighted by atomic polarizabilities (MATSp2)
59. Moran autocorrelation-lag3/weighted by atomic polarizabilities (MATSp3)
60. Moran autocorrelation-lag4/weighted by atomic polarizabilities (MATSp4)
61. Moran autocorrelation-lag5/weighted by atomic polarizabilities (MATSp5)
62. Moran autocorrelation-lag6/weighted by atomic polarizabilities (MATSp6)
63. Moran autocorrelation-lag7/weighted by atomic polarizabilities (MATSp7)
64. Moran autocorrelation-lag8/weighted by atomic polarizabilities (MATSp8)

1.8.3 Geary autocorrelation descriptors

65. Geary autocorrelation-lag1/weighted by atomic masses (GATSm1)
66. Geary autocorrelation-lag2/weighted by atomic masses (GATSm2)
67. Geary autocorrelation-lag3/weighted by atomic masses (GATSm3)
68. Geary autocorrelation-lag4/weighted by atomic masses (GATSm4)
69. Geary autocorrelation-lag5/weighted by atomic masses (GATSm5)
70. Geary autocorrelation-lag6/weighted by atomic masses (GATSm6)
71. Geary autocorrelation-lag7/weighted by atomic masses (GATSm7)
72. Geary autocorrelation-lag8/weighted by atomic masses (GATSm8)
73. Geary autocorrelation-lag1/weighted by atomic van der Waals volumes (GATSv1)
74. Geary autocorrelation-lag2/weighted by atomic van der Waals volumes (GATSv2)
75. Geary autocorrelation-lag3/weighted by atomic van der Waals volumes (GATSv3)
76. Geary autocorrelation-lag4/weighted by atomic van der Waals volumes (GATSv4)
77. Geary autocorrelation-lag5/weighted by atomic van der Waals volumes (GATSv5)
78. Geary autocorrelation-lag6/weighted by atomic van der Waals volumes (GATSv6)
79. Geary autocorrelation-lag7/weighted by atomic van der Waals volumes (GATSv7)
80. Geary autocorrelation-lag8/weighted by atomic van der Waals volumes (GATSv8)
81. Geary autocorrelation-lag1/weighted by atomic Sanderson electronegativities (GATSe1)
82. Geary autocorrelation-lag2/weighted by atomic Sanderson electronegativities (GATSe2)
83. Geary autocorrelation-lag3/weighted by atomic Sanderson electronegativities (GATSe3)
84. Geary autocorrelation-lag4/weighted by atomic Sanderson electronegativities (GATSe4)
85. Geary autocorrelation-lag5/weighted by atomic Sanderson electronegativities (GATSe5)
86. Geary autocorrelation-lag6/weighted by atomic Sanderson electronegativities (GATSe6)
87. Geary autocorrelation-lag7/weighted by atomic Sanderson electronegativities (GATSe7)
88. Geary autocorrelation-lag8/weighted by atomic Sanderson electronegativities (GATSe8)
89. Geary autocorrelation-lag1/weighted by atomic polarizabilities (GATSp1)
90. Geary autocorrelation-lag2/weighted by atomic polarizabilities (GATSp2)
91. Geary autocorrelation-lag3/weighted by atomic polarizabilities (GATSp3)
92. Geary autocorrelation-lag4/weighted by atomic polarizabilities (GATSp4)
93. Geary autocorrelation-lag5/weighted by atomic polarizabilities (GATSp5)
94. Geary autocorrelation-lag6/weighted by atomic polarizabilities (GATSp6)
95. Geary autocorrelation-lag7/weighted by atomic polarizabilities (GATSp7)
96. Geary autocorrelation-lag8/weighted by atomic polarizabilities (GATSp8)

1.9 Charge descriptors

1. Most positive charge on H atoms (QHmax)
2. Most positive charge on C atoms (QCmax)
3. Most positive charge on N atoms (QNmax)
4. Most positive charge on O atoms (QOmax)
5. Most negative charge on H atoms (QHmin)
6. Most negative charge on C atoms (QCmin)
7. Most negative charge on N atoms (QNmin)
8. Most negative charge on O atoms (QOmin)
9. Most positive charge in a molecule (Qmax)
10. Most negative charge in a molecule (Qmin)
11. Sum of squares of charges on H atoms (QHSS)
12. Sum of squares of charges on C atoms (QCSS)
13. Sum of squares of charges on N atoms (QNSS)
14. Sum of squares of charges on O atoms (QOSS)
15. Sum of squares of charges on all atoms (QaSS)
16. Mean of positive charges (Mpc)
17. Total of positive charges (Tpc)
18. Mean of negative charges (Mnc)
19. Total of negative charges (Tnc)
20. Mean of absolute charges (Mac)
21. Total of absolute charges (Tac)
22. Relative positive charge (Rpc)
23. Relative negative charge (Rnc)
24. Submolecular polarity parameter (SPP)
25. Local dipole index (LDI)

Introduction:

These are electronic descriptors defined in terms of atomic charges and used to describe electronic aspects of the whole molecule and of particular regions, such as atoms, bonds and molecular fragments. Charge descriptors are calculated by computational chemistry methods and can therefore be considered quantum-chemical descriptors.

Electrical charges in the molecule are the driving force of electrostatic interactions, and it is well known that the local electron density or charge plays a fundamental role in many chemical reactions and physicochemical properties.

Some of the most widely used charge descriptors are defined as follows:

(1) Most positive charge in a molecule (Qmax)

The maximum positive charge of the atoms in a molecule:

$$Q_{\max} = \max_a\left(q_a^{+}\right)$$

where $q_a^{+}$ are the net positive atomic charges.

(2) Most negative charge in a molecule (Qmin)

The largest negative charge (in magnitude) of the atoms in a molecule:

$$Q_{\min} = -\max_a\left(\lvert q_a^{-}\rvert\right)$$

where $q_a^{-}$ are the net negative atomic charges.

(3) Total of positive charges (Tpc)

The sum of all the positive charges of the atoms in a molecule:

$$T_{pc} = \sum_a q_a^{+}$$

where $q_a^{+}$ are the net positive atomic charges.

(4) Total of negative charges (Tnc)

The sum of all the negative charges of the atoms in a molecule:

$$T_{nc} = \sum_a q_a^{-}$$

where $q_a^{-}$ are the net negative atomic charges.
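The four definitions above reduce to simple filters over the atomic charges. In this sketch the partial charges are assumed to have been computed elsewhere (the charge model is not specified here, and the values in the example are made up), and reporting 0 when a molecule has no charge of the required sign is an assumption.

```python
from typing import Dict, Sequence


def charge_descriptors(charges: Sequence[float]) -> Dict[str, float]:
    """Qmax, Qmin, Tpc and Tnc from a list of net atomic partial charges."""
    positive = [q for q in charges if q > 0]
    negative = [q for q in charges if q < 0]
    return {
        # Most positive charge: maximum over the positive charges q_a^+
        "Qmax": max(positive) if positive else 0.0,
        # Most negative charge: the negative charge of largest magnitude
        "Qmin": -max(abs(q) for q in negative) if negative else 0.0,
        # Totals of the positive and of the negative charges
        "Tpc": sum(positive, 0.0),
        "Tnc": sum(negative, 0.0),
    }


# Made-up partial charges for a five-atom molecule
descriptors = charge_descriptors([0.3, -0.5, 0.1, -0.2, 0.3])
print(descriptors)
```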

1.10 Molecular properties

1. Molar refractivity (MREF)
2. LogP value based on the Crippen method (logP)
3. Square of LogP value based on the Crippen method (logP2)
4. Topological polar surface area (TPSA)
5. Unsaturation index (UI)
6. Hydrophilic index (Hy)

Introduction:

(1) Molar refractivity (MREF)

A molecular descriptor of a liquid that contains information about both molecular volume and polarizability, usually defined by the Lorenz-Lorentz equation:

$$MREF = \frac{n^{2} - 1}{n^{2} + 2}\cdot\frac{MW}{\rho}$$

where MW is the molecular weight, $\rho$ is the liquid density, and $n$ is the refractive index of the liquid.

(2) LogP value based on the Crippen method (logP)

The Ghose-Crippen contribution method is based on hydrophobic atomic constants measuring the lipophilic contributions of atoms in the molecule, each described by its neighbouring atoms:

$$\log P = \sum_k a_k N_k$$

where $N_k$ is the occurrence of the $k$th atom type and $a_k$ is its contribution.

(3) Topological polar surface area (TPSA)

It is the sum of the solvent-accessible surface areas of atoms whose partial charges have an absolute value greater than or equal to 0.2:

$$TPSA = \sum_{\lvert q_a\rvert \ge 0.2} SA_a$$

(4) Unsaturation index (UI)

The unsaturation index (UI) is defined as

$$UI = \log_2\left(1 + nDB + nTB + nAB\right)$$

where nDB is the number of double bonds, nTB the number of triple bonds and nAB the number of aromatic bonds. The unsaturation index is described in the user manual for Dragon.

(5) Hydrophilic index (Hy)

The hydrophilic index is given by

$$Hy = \frac{(1 + N_{Hy})\log_2(1 + N_{Hy}) + N_C\left(\frac{1}{A}\log_2\frac{1}{A}\right) + \sqrt{\frac{N_{Hy}}{A^{2}}}}{\log_2(1 + A)}$$

where $N_{Hy}$ is the number of hydrophilic groups (i.e. the total number of hydrogens attached to oxygen, sulfur and nitrogen atoms), $N_C$ is the number of carbon atoms, and $A$ is the number of non-hydrogen atoms. The hydrophilic index is described in more detail on page 225 of the Handbook of Molecular Descriptors (Todeschini and Consonni 2000).
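The closed-form properties (1), (4) and (5) can be evaluated directly from their inputs. This sketch takes the refractive index, molecular weight, density and the bond and atom counts as given; counting them from a structure is outside its scope, and the example counts were worked out by hand.

```python
import math


def molar_refractivity(n: float, molecular_weight: float, density: float) -> float:
    """Lorenz-Lorentz: (n^2 - 1) / (n^2 + 2) * MW / rho."""
    return (n ** 2 - 1.0) / (n ** 2 + 2.0) * molecular_weight / density


def unsaturation_index(n_double: int, n_triple: int, n_aromatic: int) -> float:
    """UI = log2(1 + nDB + nTB + nAB)."""
    return math.log2(1 + n_double + n_triple + n_aromatic)


def hydrophilic_index(n_hy: int, n_c: int, n_heavy: int) -> float:
    """Hy from N_Hy (H on O, S, N), N_C (carbons) and A (non-hydrogen atoms)."""
    numerator = (
        (1 + n_hy) * math.log2(1 + n_hy)
        + n_c * (1.0 / n_heavy) * math.log2(1.0 / n_heavy)
        + math.sqrt(n_hy / n_heavy ** 2)
    )
    return numerator / math.log2(1 + n_heavy)


# Styrene: one C=C double bond plus six aromatic ring bonds
print(unsaturation_index(1, 0, 6))  # 3.0
# Ethanol: one hydrogen on oxygen, two carbons, three heavy atoms
print(round(hydrophilic_index(1, 2, 3), 4))
```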

1.11 MOE-type descriptors

1. Topological polar surface area based on fragments (TPSA)
2. Labute's approximate surface area (LabuteASA)
3. MOE-type descriptors using SLogP contributions and surface area contributions (SLOGPVSA1)
4. MOE-type descriptors using SLogP contributions and surface area contributions (SLOGPVSA2)
5. MOE-type descriptors using SLogP contributions and surface area contributions (SLOGPVSA3)
6. MOE-type descriptors using SLogP contributions and surface area contributions (SLOGPVSA4)
7. MOE-type descriptors using SLogP contributions and surface area contributions (SLOGPVSA5)
8. MOE-type descriptors using SLogP contributions and surface area contributions (SLOGPVSA6)
9. MOE-type descriptors using SLogP contributions and surface area contributions (SLOGPVSA7)
10. MOE-type descriptors using SLogP contributions and surface area contributions (SLOGPVSA8)
11. MOE-type descriptors using SLogP contributions and surface area contributions (SLOGPVSA9)
12. MOE-type descriptors using SLogP contributions and surface area contributions (SLOGPVSA10)
13. MOE-type descriptors using SLogP contributions and surface area contributions (SLOGPVSA11)
14. MOE-type descriptors using SLogP contributions and surface area contributions (SLOGPVSA12)
15. MOE-type descriptors using MR contributions and surface area contributions (SMRVSA1)
16. MOE-type descriptors using MR contributions and surface area contributions (SMRVSA2)
17. MOE-type descriptors using MR contributions and surface area contributions (SMRVSA3)
18. MOE-type descriptors using MR contributions and surface area contributions (SMRVSA4)
19. MOE-type descriptors using MR contributions and surface area contributions (SMRVSA5)
20. MOE-type descriptors using MR contributions and surface area contributions (SMRVSA6)
21. MOE-type descriptors using MR contributions and surface area contributions (SMRVSA7)
22. MOE-type descriptors using MR contributions and surface area contributions (SMRVSA8)
23. MOE-type descriptors using MR contributions and surface area contributions (SMRVSA9)
24. MOE-type descriptors using MR contributions and surface area contributions (SMRVSA10)
25. MOE-type descriptors using partial charges and surface area contributions (PEOEVSA1)
26. MOE-type descriptors using partial charges and surface area contributions (PEOEVSA2)
27. MOE-type descriptors using partial charges and surface area contributions (PEOEVSA3)
28. MOE-type descriptors using partial charges and surface area contributions (PEOEVSA4)
29. MOE-type descriptors using partial charges and surface area contributions (PEOEVSA5)
30. MOE-type descriptors using partial charges and surface area contributions (PEOEVSA6)
31. MOE-type descriptors using partial charges and surface area contributions (PEOEVSA7)
32. MOE-type descriptors using partial charges and surface area contributions (PEOEVSA8)
33. MOE-type descriptors using partial charges and surface area contributions (PEOEVSA9)
34. MOE-type descriptors using partial charges and surface area contributions (PEOEVSA10)
35. MOE-type descriptors using partial charges and surface area contributions (PEOEVSA11)
36. MOE-type descriptors using partial charges and surface area contributions (PEOEVSA12)
37. MOE-type descriptors using partial charges and surface area contributions (PEOEVSA13)
38. MOE-type descriptors using partial charges and surface area contributions (PEOEVSA14)
39. MOE-type descriptors using E-State indices and surface area contributions (EstateVSA1)
40. MOE-type descriptors using E-State indices and surface area contributions (EstateVSA2)
41. MOE-type descriptors using E-State indices and surface area contributions (EstateVSA3)
42. MOE-type descriptors using E-State indices and surface area contributions (EstateVSA4)
43. MOE-type descriptors using E-State indices and surface area contributions (EstateVSA5)
44. MOE-type descriptors using E-State indices and surface area contributions (EstateVSA6)
45. MOE-type descriptors using E-State indices and surface area contributions (EstateVSA7)
46. MOE-type descriptors using E-State indices and surface area contributions (EstateVSA8)
47. MOE-type descriptors using E-State indices and surface area contributions (EstateVSA9)
48. MOE-type descriptors using E-State indices and surface area contributions (EstateVSA10)
49. MOE-type descriptors using E-State indices and surface area contributions (EstateVSA11)
50. MOE-type descriptors using surface area contributions and E-State indices (VSAEstate1)
51. MOE-type descriptors using surface area contributions and E-State indices (VSAEstate2)
52. MOE-type descriptors using surface area contributions and E-State indices (VSAEstate3)
53. MOE-type descriptors using surface area contributions and E-State indices (VSAEstate4)
54. MOE-type descriptors using surface area contributions and E-State indices (VSAEstate5)
55. MOE-type descriptors using surface area contributions and E-State indices (VSAEstate6)
56. MOE-type descriptors using surface area contributions and E-State indices (VSAEstate7)
57. MOE-type descriptors using surface area contributions and E-State indices (VSAEstate8)
58. MOE-type descriptors using surface area contributions and E-State indices (VSAEstate9)
59. MOE-type descriptors using surface area contributions and E-State indices (VSAEstate10)
60. MOE-type descriptors using surface area contributions and E-State indices (VSAEstate11)

1.12 Molecular fingerprint

Molecular fingerprints are string representations of chemical structures designed to enhance the efficiency of chemical database searching and analysis. They can encode the 2D and/or 3D features of molecules as an array of binary values or counts. Molecular fingerprints therefore consist of bins, each bin being a substructure descriptor associated with a specific molecular feature.

Molecular fingerprints directly encode molecular structure as a series of binary bits that represent the presence or absence of particular substructures in the molecule. Although this representation divides the whole molecule into a large number of fragments, it can still retain the overall complexity of drug molecules. Additionally, it does not require a reasonable three-dimensional conformation of the drug molecules and therefore avoids the error accumulation that comes from describing 3D molecular structures. Thus, by means of such descriptors, each molecule can be described by a set of structural-key fingerprints, represented as a Boolean array.

A list of SMARTS substructure patterns is first defined as a predefined dictionary. There is a one-to-one correspondence between each SMARTS pattern and a bit in the fingerprint. For each SMARTS pattern, if the corresponding substructure is present in the given molecule, the corresponding bit in the fingerprint is set to 1; otherwise, it is set to 0 (see Figure 1). Note that different molecular fingerprint systems abstract and magnify different aspects of molecular topology.

Figure 1. Representation of a molecular substructure fingerprint with a dictionary of given substructure patterns. The molecule is represented as a series of binary bits that indicate the presence or absence of particular substructures in the molecule. This figure is from Ref. 2 in sections 3 and 4.
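The dictionary-to-bit procedure described above can be sketched without a cheminformatics toolkit. Instead of SMARTS strings, this minimal example assumes each pattern is a tiny labelled graph (element symbols with implicit hydrogens, plus bonds with integer orders) and tests for it with a simple backtracking subgraph match. The four-pattern dictionary is hypothetical and is not the FP4 or MACCS dictionary.

```python
from typing import Dict, List, Tuple

# A tiny labelled graph: element symbols (implicit hydrogens) and bonds as (atom_a, atom_b, bond_order).
Graph = Tuple[List[str], List[Tuple[int, int, int]]]


def _adjacency(graph: Graph) -> Dict[int, Dict[int, int]]:
    atoms, bonds = graph
    adjacency: Dict[int, Dict[int, int]] = {i: {} for i in range(len(atoms))}
    for a, b, order in bonds:
        adjacency[a][b] = order
        adjacency[b][a] = order
    return adjacency


def has_substructure(mol: Graph, pattern: Graph) -> bool:
    """Backtracking search for an embedding of `pattern` into `mol` (distinct atoms, same elements and bond orders)."""
    mol_atoms, pattern_atoms = mol[0], pattern[0]
    mol_adj, pattern_adj = _adjacency(mol), _adjacency(pattern)
    mapping: Dict[int, int] = {}  # pattern atom index -> molecule atom index

    def extend(k: int) -> bool:
        if k == len(pattern_atoms):
            return True
        used = set(mapping.values())
        for candidate in range(len(mol_atoms)):
            if candidate in used or mol_atoms[candidate] != pattern_atoms[k]:
                continue
            # Every pattern bond to an already-mapped atom must exist in the molecule with the same order.
            if all(mol_adj[candidate].get(mapping[q]) == order
                   for q, order in pattern_adj[k].items() if q in mapping):
                mapping[k] = candidate
                if extend(k + 1):
                    return True
                del mapping[k]
        return False

    return extend(0)


# Hypothetical four-key dictionary: one bit per pattern, in this order.
PATTERNS: Dict[str, Graph] = {
    "carbonyl C=O": (["C", "O"], [(0, 1, 2)]),
    "C-N single bond": (["C", "N"], [(0, 1, 1)]),
    "C-O single bond": (["C", "O"], [(0, 1, 1)]),
    "C#C triple bond": (["C", "C"], [(0, 1, 3)]),
}


def substructure_fingerprint(mol: Graph, dictionary: Dict[str, Graph] = PATTERNS) -> List[int]:
    """Bit i is 1 if the i-th dictionary pattern occurs in the molecule, else 0."""
    return [1 if has_substructure(mol, pattern) else 0 for pattern in dictionary.values()]


# Acetic acid, heavy atoms only: CH3-C(=O)-OH
acetic_acid: Graph = (["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
print(substructure_fingerprint(acetic_acid))  # [1, 0, 1, 0]
```

Real SMARTS matching also handles aromaticity, charges, ring membership and atom lists, which this sketch omits.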

1.12.1 Daylight-type fingerprint

The Daylight fingerprints (DFP) are hashed fingerprints encoding each atom type, all augmented atoms and all paths of length 2-7 atoms, giving a total string of 1024 bits [Daylight-James, Weininger et al., 1997].
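A hashed path fingerprint in this spirit can be sketched as follows. This is an illustration only: it labels atoms by element symbol, ignores bond orders and augmented atoms, enumerates simple paths of 1 to 7 atoms (single-atom paths standing in for atom types), and uses CRC-32 as a stand-in hash; Daylight's actual hashing scheme is not reproduced here.

```python
import zlib
from typing import List, Sequence, Tuple


def path_fingerprint(atoms: Sequence[str], bonds: Sequence[Tuple[int, int]],
                     max_atoms: int = 7, n_bits: int = 1024) -> List[int]:
    """Hash every simple path of 1..max_atoms atoms into a fixed-length bit vector."""
    adjacency: List[List[int]] = [[] for _ in atoms]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    bits = [0] * n_bits

    def walk(path: List[int]) -> None:
        labels = [atoms[i] for i in path]
        # A path and its reverse describe the same fragment, so keep the smaller string.
        key = min("-".join(labels), "-".join(reversed(labels)))
        bits[zlib.crc32(key.encode()) % n_bits] = 1
        if len(path) < max_atoms:
            for neighbour in adjacency[path[-1]]:
                if neighbour not in path:  # simple paths only: no atom is revisited
                    walk(path + [neighbour])

    for start in range(len(atoms)):
        walk([start])
    return bits


# Ethanol heavy atoms C-C-O give the fragments C, O, C-C, C-O and C-C-O
ethanol_bits = path_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
print(sum(ethanol_bits))
```

Because different paths can hash to the same bit, a set bit only indicates that at least one fragment mapping to it is present.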

1.12.2 MACCS keys and FP4 fingerprint

The FP4 and MACCS fingerprints are each built from their own substructure dictionary. The FP4 dictionary contains 307 common substructure patterns; it was originally written in an attempt to represent the classification of organic compounds from the viewpoint of an organic chemist. The MACCS fingerprint uses a dictionary of MDL keys containing a set of 166 common substructure features, referred to as the MDL public MACCS keys. The definitions of both the FP4 and MACCS fingerprints are available from OpenBabel (version 2.3.0, http://openbabel.org/, accessed October 2010). All calculations for these substructure fingerprints are performed in the PyDPI package, developed by our group.

1.12.3 E-state fingerprint

Electrotopological State (E-state) fingerprints represent the presence/absence in a molecule of the 79 E-state substructures defined by Kier and Hall. The definition of the 79 atom types can be found in section 1.7.
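Once each heavy atom has been assigned one of the 79 Kier-Hall types, the E-state fingerprint is a presence/absence check in the fixed order of the S1-S79 descriptors. A minimal sketch, assuming the type assignment itself is done elsewhere (the function name is illustrative, not a PyDPI call):

```python
from typing import Iterable, List

# The 79 Kier-Hall E-state atom types, in the order of the S1-S79 descriptors.
ESTATE_TYPES = (
    "sLi ssBe ssssBe ssBH sssB ssssB sCH3 dCH2 ssCH2 tCH dsCH aaCH sssCH ddC tsC "
    "dssC aasC aaaC ssssC sNH3 sNH2 ssNH2 dNH ssNH aaNH tN sssNH dsN aaN sssN ddsN "
    "aasN ssssN sOH dO ssO aaO sF sSiH3 ssSiH2 sssSiH ssssSi sPH2 ssPH sssP dsssP "
    "sssssP sSH dS ssS aaS dssS ddssS sCl sGeH3 ssGeH2 sssGeH ssssGe sAsH2 ssAsH "
    "sssAs sssdAs sssssAs sSeH dSe ssSe aaSe dssSe ddssSe sBr sSnH3 ssSnH2 sssSnH "
    "ssssSn sI sPbH3 ssPbH2 sssPbH ssssPb"
).split()


def estate_fingerprint(atom_types: Iterable[str]) -> List[int]:
    """79-bit presence/absence vector over the E-state atom types."""
    present = set(atom_types)
    unknown = present.difference(ESTATE_TYPES)
    if unknown:
        raise ValueError(f"unknown E-state atom types: {sorted(unknown)}")
    return [1 if atom_type in present else 0 for atom_type in ESTATE_TYPES]


# Ethanol: one CH3, one CH2 and one OH group
fp = estate_fingerprint(["sCH3", "ssCH2", "sOH"])
print(len(fp), sum(fp))  # 79 3
```

Unlike the SX sums, the fingerprint records only whether a type occurs, not how many atoms have it or what their E-State values are.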

1.12.4 Atom pairs and topological torsions fingerprints

Atom pairs fingerprint:

Atom pairs are substructure descriptors defined in terms of any pair of atoms and the bond types connecting them. An atom pair is composed of two non-hydrogen atoms and an interatomic separation:

$$AP = \{[i\text{th atom description}]\,[\text{separation}]\,[j\text{th atom description}]\}$$

The two atoms need not be directly connected, and the separation can be the topological distance between them [Carhart, Smith et al., 1985]; these descriptors are usually called topological atom pairs because they are based on the topological representation of the molecules. The atom type is defined by the element itself, the number of heavy-atom connections and the number of π electron pairs on each atom. Unlike topological torsions, atom pairs are sensitive to long-range correlations between the atoms in molecules and therefore to small changes in one part of even large molecules. Atom pair descriptors are usually Boolean variables encoding the presence or absence of a particular atom pair in each molecule.
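A minimal sketch of topological atom pairs, assuming each atom type is the tuple (element, number of heavy-atom neighbours, π-electron count) with the π-electron count supplied by the caller, and the separation taken as the topological distance found by breadth-first search. It is an illustration, not PyDPI's encoding.

```python
from collections import Counter, deque
from typing import List, Sequence, Tuple

# Assumed atom type: (element, number of heavy-atom neighbours, pi-electron count supplied by the caller)
AtomType = Tuple[str, int, int]


def atom_pairs(elements: Sequence[str], pi_electrons: Sequence[int],
               bonds: Sequence[Tuple[int, int]]) -> Counter:
    """Count topological atom pairs (type_a, distance, type_b) over all pairs of heavy atoms."""
    n = len(elements)
    adjacency: List[List[int]] = [[] for _ in range(n)]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    types: List[AtomType] = [(elements[i], len(adjacency[i]), pi_electrons[i]) for i in range(n)]

    pairs: Counter = Counter()
    for start in range(n):
        # Breadth-first search gives the topological distance from `start` to every atom.
        dist = [-1] * n
        dist[start] = 0
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbour in adjacency[current]:
                if dist[neighbour] < 0:
                    dist[neighbour] = dist[current] + 1
                    queue.append(neighbour)
        for other in range(start + 1, n):  # each unordered pair once
            if dist[other] > 0:
                # Sort the two types so that the pair does not depend on atom numbering.
                first, second = sorted((types[start], types[other]))
                pairs[(first, dist[other], second)] += 1
    return pairs


# Ethanol heavy atoms C-C-O, no pi electrons
ap = atom_pairs(["C", "C", "O"], [0, 0, 0], [(0, 1), (1, 2)])
boolean_fingerprint = set(ap)  # presence/absence, the usual Boolean encoding
print(sorted(boolean_fingerprint))
```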

Topological torsion fingerprint:

The topological torsion descriptor (TT) is related to the 4-atom linear subfragment descriptor of Klopman because it is defined as a Boolean variable for the presence/absence of a linear sequence of four consecutively bonded non-hydrogen atoms k–i–j–l, each described by its atom type (TYPE), the number of π electrons (NPI) on the atom, and the number of non-hydrogen atoms (NBR) bonded to it [Nilakantan, Bauman et al., 1987]. Usually NBR does not include the k–i–j–l atoms that make up the torsion itself; therefore, it is reduced by 1 for the terminal atoms k and l and by 2 for the two central atoms i and j. The torsion around the i–j bond, defined by the four indices k–i–j–l, is represented by the following TT descriptor:

The TT descriptor is a topological analogue of the 3D torsion angle, defined by four consecutively bonded atoms. The topological torsion is a short-range descriptor; that is, it is sensitive only to local changes in the molecule and is independent of the total number of atoms in the molecule.

The use of atom-centered fragments and related descriptors greatly increases the specific chemical information concerning different functional groups, but cannot discriminate between different arrangements of functional groups within a molecule.
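Topological torsions can be enumerated by taking every bond as the central i-j bond and every choice of outer neighbours k and l. This sketch assumes the atom description (TYPE, NPI, NBR) with TYPE as the element symbol and NPI supplied by the caller, applies the NBR reductions described above, and stores each torsion in a direction-independent form; it does not reproduce any particular software's encoding.

```python
from typing import List, Sequence, Set, Tuple

# Assumed per-atom description: (TYPE, NPI, NBR)
TorsionAtom = Tuple[str, int, int]


def topological_torsions(elements: Sequence[str], pi_electrons: Sequence[int],
                         bonds: Sequence[Tuple[int, int]]) -> Set[Tuple[TorsionAtom, ...]]:
    """Set of k-i-j-l torsion descriptors (presence/absence)."""
    adjacency: List[List[int]] = [[] for _ in elements]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)

    def describe(atom: int, torsion_neighbours: int) -> TorsionAtom:
        # NBR excludes the neighbours that belong to the torsion itself (1 for k and l, 2 for i and j).
        return (elements[atom], pi_electrons[atom], len(adjacency[atom]) - torsion_neighbours)

    torsions: Set[Tuple[TorsionAtom, ...]] = set()
    for i, j in bonds:  # central bond i-j
        for k in adjacency[i]:
            if k == j:
                continue
            for l in adjacency[j]:
                if l in (i, k):  # skip back-tracking and three-membered rings
                    continue
                seq = (describe(k, 1), describe(i, 2), describe(j, 2), describe(l, 1))
                # k-i-j-l and l-j-i-k are the same torsion; keep one canonical direction.
                torsions.add(min(seq, seq[::-1]))
    return torsions


# n-Butane C-C-C-C has exactly one heavy-atom torsion
print(topological_torsions(["C"] * 4, [0] * 4, [(0, 1), (1, 2), (2, 3)]))
```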

1.12.5 Morgan fingerprint

This family of fingerprints, better known as circular fingerprints, is built by applying the Morgan algorithm to a set of user-supplied atom invariants. When generating Morgan fingerprints, the radius of the fingerprint must be provided. For detailed information about Morgan fingerprints, please refer to Ref. [19]. Note that the default atom invariants use connectivity information similar to that used for the well-known ECFP family of fingerprints. When comparing the ECFP/FCFP fingerprints with the Morgan fingerprints generated by PyDPI, remember that the 4 in ECFP4 corresponds to the diameter of the atom environments considered, while the Morgan fingerprints take a radius parameter. Morgan fingerprints with radius=2 are therefore roughly equivalent to ECFP4 and FCFP4.
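The iterative update behind circular fingerprints can be sketched as follows. This minimal version assumes (element, heavy-atom degree) as the initial atom invariant, ignores bond orders, skips the removal of duplicate environments, and uses CRC-32 as a stand-in hash, so its identifiers will not match those of RDKit or PyDPI. Each iteration grows every atom's environment by one bond, so `radius=2` covers environments of diameter up to 4, the ECFP4 analogue discussed above.

```python
import zlib
from typing import List, Sequence, Set, Tuple


def _stable_hash(obj: object) -> int:
    # Python's built-in hash() of strings is salted per process, so use CRC-32 of the repr instead.
    return zlib.crc32(repr(obj).encode())


def morgan_identifiers(atoms: Sequence[str], bonds: Sequence[Tuple[int, int]],
                       radius: int = 2) -> Set[int]:
    """Collect atom-environment identifiers for iterations 0..radius."""
    adjacency: List[List[int]] = [[] for _ in atoms]
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    # Iteration 0: a simple invariant (element, number of heavy-atom neighbours) per atom.
    ids = [_stable_hash((atoms[i], len(adjacency[i]))) for i in range(len(atoms))]
    seen = set(ids)
    for _ in range(radius):
        # Combine each atom's identifier with its neighbours' identifiers
        # (sorted, so the result does not depend on atom numbering).
        ids = [_stable_hash((ids[i], tuple(sorted(ids[n] for n in adjacency[i]))))
               for i in range(len(atoms))]
        seen.update(ids)
    return seen


def fold(identifiers: Set[int], n_bits: int = 2048) -> List[int]:
    """Fold identifiers into a fixed-length bit vector."""
    bits = [0] * n_bits
    for identifier in identifiers:
        bits[identifier % n_bits] = 1
    return bits


ethanol_ids = morgan_identifiers(["C", "C", "O"], [(0, 1), (1, 2)], radius=2)
print(sum(fold(ethanol_ids)))
```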

References:

[1] Aguiar, P.F.d., Bourguignon, B., Khots, M.S., Massart, D.L., and Phan-Than-Luu, R. D-optimal designs. Chemometrics and Intelligent Laboratory Systems. 1995, 30, 199-210.

[2] Daylight Chemical Information Systems, Inc. Simplified Molecular Input Line Entry System. 2006, http://www.daylight.com/smiles/index.html.

[3] Elsevier MDL. MDL QSAR Version 2.2. 2006, http://www.mdl.com/products/predictive/qsar/index.jsp.

[4] Ghose, A.K., Viswanadhan, V.N., and Wendoloski, J.J. Prediction of Hydrophobic (Lipophilic) Properties of Small Organic Molecules Using Fragmental Methods: An Analysis of ALOGP and CLOGP Methods. J. Phys. Chem. A 1998, 102, 3762-3772.

[5] Gramatica, P., Corradi, M., and Consonni, V. Modelling and Prediction of Soil Sorption Coefficients of Non-ionic Organic Pesticides by Molecular Descriptors. Chemosphere 2000, 41, 763-777.

[6] Hall, L.H., and Kier, L.B. The Molecular Connectivity Chi Indices and Kappa Shape Indices in Structure-Property Relations. In Reviews of Computational Chemistry, edited by D. Boyd and K. Lipkowitz. New York: VCH Publishers, Inc., 1991, 367-422.

[7] Hall, L.H., and Kier, L.B. Molecular Connectivity Chi Indices for Database Analysis and Structure-Property Modeling. In Methods for QSAR Modelling, edited by J. Devillers. 1999.

[8] Kier, L.B. Inclusion of symmetry as a shape attribute in Kappa index analysis. Quant. Struct.-Act. Relat. 1987, 6, 8-12.

[9] Kier, L.B., and Hall, L.H. Molecular Connectivity in Chemistry and Drug Research. 1976, New York: Academic Press Inc.

[10] Kier, L.B., and Hall, L.H. Molecular Connectivity in Structure-Activity Analysis. 1986, New York: John Wiley and Sons.

[11] Kier, L.B., and Hall, L.H. Molecular Structure Description: The Electrotopological State. 1999, New York: Academic Press.

[12] Martin, T.M., Harten, P., Venkatapathy, R., Das, S., and Young, D.M. A Hierarchical Clustering Methodology for the Estimation of Toxicity. Toxicology Mechanisms and Methods 2008, 18, 251-266.

[13] JAMA: A Java Matrix Package. 2005, http://math.nist.gov/javanumerics/jama/.

[14] Talete. Dragon Version 5.4. 2006, http://www.talete.mi.it/dragon_net.htm.

Todeschini, R., and Consonni, V. Handbook of Molecular Descriptors. 2000, Weinheim, Germany: Wiley-VCH.

[15] Viswanadhan, V.N., Ghose, A.K., Revankar, G.R., and Robins, R.K. Atomic Physicochemical Parameters for Three Dimensional Structure Directed Quantitative Structure-Activity Relationships. 4. Additional Parameters for Hydrophobic and Dispersive Interactions and Their Application for an Automated Superposition of Certain Naturally Occurring Nucleoside Antibiotics. J. Chem. Inf. Comput. Sci. 1989, 29, 163-172.

[16] Wang, R., Gao, Y., and Lai, L. Calculating partition coefficient by atom-additive method. Perspectives in Drug Discovery and Design 2000, 19, 47-66.

[17] R.E. Carhart, D.H. Smith, R. Venkataraghavan. Atom Pairs as Molecular Features in Structure-Activity Studies: Definition and Applications. J. Chem. Inf. Comput. Sci. 1985, 25, 64-73.

[18] R. Nilakantan, N. Bauman, J.S. Dixon, R. Venkataraghavan. Topological Torsions: A New Molecular Descriptor for SAR Applications. Comparison with Other Descriptors. J. Chem. Inf. Comput. Sci. 1987, 27, 82-85.

[19] David Rogers, Mathew Hahn. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010, 50, 742-754.

[20] Paul Labute. A widely applicable set of descriptors. Journal of Molecular Graphics and Modeling.

2000, 18, 464-477.

[21] C.A. James, D. Weininger, J. Delany. Daylight Theory Manual. 1997, http://www.daylight.com/dayhtml/doc/theory/theory.toc.html.

[22] Burden, F.R. A chemically intuitive molecular index based on the eigenvalues of a modified adjacency matrix. Quant. Struct.-Act. Relat. 1997, 16, 309-314.

[23] Basak, S.C. Information theoretic indices of neighborhood complexity and their applications. In Topological Indices and Related Descriptors in QSAR and QSPR (eds. J. Devillers and A.T. Balaban), Gordon and Breach Science Publishers, Amsterdam, The Netherlands, 1999, pp. 563-593.

2 Descriptors of proteins and peptides

A protein or peptide sequence with $N$ amino acid residues is expressed as $R_1 R_2 \ldots R_N$, where $R_i$ represents the residue at the $i$-th position in the sequence. The labels $i$ and $j$ index amino acid positions in a sequence, and $r$, $s$ and $t$ index amino acid types. The computed features are divided into 4 groups according to their known applications described in the literature.

A protein sequence can also be divided into equal segments, and each method described below for the global sequence can be applied to each segment.

2.1 Amino acid composition

The amino acid composition is the fraction of each amino acid type within a protein. The fractions of all 20 natural amino acids are calculated as:

$$f(r) = \frac{N_r}{N}, \qquad r = 1, 2, 3, \ldots, 20$$

where $N_r$ is the number of amino acids of type $r$ and $N$ is the length of the sequence.
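The formula above can be illustrated with a minimal Python sketch built on `collections.Counter`. It is not PyDPI's own API. The helper name `amino_acid_composition` is hypothetical, the sketch assumes one-letter residue codes, and it returns fractions rather than percentages.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (ordering is an arbitrary choice).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def amino_acid_composition(seq: str) -> dict[str, float]:
    """Hypothetical helper: f(r) = N_r / N for each of the 20 amino acid types r."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    n = len(seq)
    return {aa: counts[aa] / n for aa in AMINO_ACIDS}

print(amino_acid_composition("AAGV")["A"])  # 0.5
```

Letters outside the 20 standard codes still count toward $N$ but get no feature of their own. A real implementation may handle them differently.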

2.2 Dipeptide composition

The dipeptide composition gives 400 features, defined as:

$$f(r, s) = \frac{N_{rs}}{N - 1}, \qquad r, s = 1, 2, 3, \ldots, 20$$

where $N_{rs}$ is the number of dipeptides represented by amino acid types $r$ and $s$.

2.3 Tripeptide composition

The tripeptide composition gives 8000 features, defined as:

$$f(r, s, t) = \frac{N_{rst}}{N - 2}, \qquad r, s, t = 1, 2, 3, \ldots, 20$$

where $N_{rst}$ is the number of tripeptides represented by amino acid types $r$, $s$ and $t$.
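The dipeptide and tripeptide compositions differ only in window length, so one hypothetical helper, `kmer_composition`, can sketch both. It assumes overlapping windows and the 20 standard one-letter codes. Windows that contain any other letter are counted in the denominator but are not assigned to a feature; this is an assumption of the sketch.

```python
from itertools import product

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def kmer_composition(seq: str, k: int) -> dict[str, float]:
    """Hypothetical helper: fraction of each of the 20**k k-mers over the N - k + 1 windows.

    k = 2 gives f(r, s) = N_rs / (N - 1); k = 3 gives f(r, s, t) = N_rst / (N - 2).
    """
    seq = seq.upper()
    windows = len(seq) - k + 1
    if windows < 1:
        raise ValueError("sequence shorter than k")
    counts = {"".join(p): 0 for p in product(AMINO_ACIDS, repeat=k)}
    for i in range(windows):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing non-standard residues
            counts[kmer] += 1
    return {kmer: c / windows for kmer, c in counts.items()}

print(kmer_composition("AAGV", 3)["AAG"])  # 0.5
```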

2.4 Autocorrelation descriptors

Autocorrelation descriptors are defined by the distribution of amino acid properties along the sequence. The properties used here are various amino acid indices from the AAindex database (http://www.genome.ad.jp/dbget/aaindex.html). Three types of autocorrelation descriptors are used, as described below.

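All the amino acid indices are centralized and standardized before the calculation, i.e.

$$P_r = \frac{P_r^0 - \bar{P}^0}{\sigma}, \qquad r = 1, 2, \ldots, 20$$

where $P_r^0$ is the original property value of amino acid type $r$ and $\bar{P}^0$ is the average of the property over the 20 amino acids:

$$\bar{P}^0 = \frac{1}{20} \sum_{r=1}^{20} P_r^0 \quad \text{and} \quad \sigma = \sqrt{\frac{1}{20} \sum_{r=1}^{20} \left(P_r^0 - \bar{P}^0\right)^2}$$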
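The centring and scaling step can be sketched as follows. `statistics.pstdev` uses the population (1/20) form of $\sigma$, which matches the formula above. The Kyte-Doolittle hydropathy scale is only an example input and is not necessarily one of PyDPI's default indices. The helper name `standardize_index` is hypothetical.

```python
from statistics import mean, pstdev

def standardize_index(raw: dict[str, float]) -> dict[str, float]:
    """Hypothetical helper: P_r = (P_r^0 - mean) / sigma for every amino acid type r."""
    values = list(raw.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation: 1/20 inside the root
    if sigma == 0:
        raise ValueError("index has zero variance")
    return {aa: (v - mu) / sigma for aa, v in raw.items()}

# Example input only: Kyte-Doolittle hydropathy values.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

std = standardize_index(KYTE_DOOLITTLE)
print(std["I"], std["R"])
```

After the conversion the 20 values have mean 0 and population standard deviation 1, whatever the units of the original index.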

2.4.1 Normalized Moreau-Broto autocorrelation descriptors

Applied to protein sequences, the Moreau-Broto autocorrelation descriptors may be defined as:

$$AC(d) = \sum_{i=1}^{N-d} P_i P_{i+d}, \qquad d = 1, 2, 3, \ldots, \text{nlag}$$

where $d$ is called the lag of the autocorrelation, and $P_i$ and $P_{i+d}$ are the properties of the amino acids at positions $i$ and $i+d$, respectively. nlag is the maximum value of the lag.

The normalized Moreau-Broto autocorrelation descriptors are defined as:

$$ATS(d) = \frac{AC(d)}{N - d}, \qquad d = 1, 2, 3, \ldots, \text{nlag}$$
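A direct transcription of $AC(d)$ and $ATS(d)$ into Python might look like the sketch below. It assumes `prop` already maps residues to standardized values, as described at the start of Section 2.4. The function name `moreau_broto` and the two-residue `toy` mapping are hypothetical and only illustrate the arithmetic.

```python
def moreau_broto(seq: str, prop: dict[str, float], nlag: int) -> list[float]:
    """Hypothetical helper: normalized Moreau-Broto autocorrelation ATS(d), d = 1..nlag."""
    p = [prop[aa] for aa in seq]
    n = len(p)
    if not 1 <= nlag < n:
        raise ValueError("nlag must satisfy 1 <= nlag < N")
    ats = []
    for d in range(1, nlag + 1):
        ac = sum(p[i] * p[i + d] for i in range(n - d))  # AC(d)
        ats.append(ac / (n - d))                          # ATS(d) = AC(d) / (N - d)
    return ats

# Hypothetical standardized values for two residues, for illustration only.
toy = {"A": 1.0, "G": -1.0}
print(moreau_broto("AGAG", toy, 2))  # [-1.0, 1.0]
```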

Figure 2 An illustrated example in the AAindex database

2.4.2 Moran autocorrelation descriptors

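Applied to protein sequences, the Moran autocorrelation descriptors may be defined as:

$$I(d) = \frac{\frac{1}{N-d} \sum_{i=1}^{N-d} \left(P_i - \bar{P}\right)\left(P_{i+d} - \bar{P}\right)}{\frac{1}{N} \sum_{i=1}^{N} \left(P_i - \bar{P}\right)^2}, \qquad d = 1, 2, 3, \ldots, 30$$

where $d$, nlag, $P_i$ and $P_{i+d}$ have the same meaning as in Section 2.4.1, and $\bar{P}$ is the average of the considered property $P$ along the sequence, i.e.,

$$\bar{P} = \frac{1}{N} \sum_{i=1}^{N} P_i$$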
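The Moran coefficient can be sketched in the same style. This hypothetical `moran` helper assumes standardized property values and raises an error when the property is constant along the sequence, because the denominator is then zero. The `toy` mapping is for illustration only.

```python
def moran(seq: str, prop: dict[str, float], nlag: int) -> list[float]:
    """Hypothetical helper: Moran autocorrelation I(d) for d = 1..nlag."""
    p = [prop[aa] for aa in seq]
    n = len(p)
    if not 1 <= nlag < n:
        raise ValueError("nlag must satisfy 1 <= nlag < N")
    p_bar = sum(p) / n  # average of the property along the sequence
    denom = sum((x - p_bar) ** 2 for x in p) / n
    if denom == 0:
        raise ValueError("property is constant along the sequence")
    out = []
    for d in range(1, nlag + 1):
        num = sum((p[i] - p_bar) * (p[i + d] - p_bar) for i in range(n - d)) / (n - d)
        out.append(num / denom)
    return out

# Hypothetical standardized values, for illustration only.
toy = {"A": 1.0, "G": -1.0}
print(moran("AGAG", toy, 2))  # [-1.0, 1.0]
```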

2.4.3 Geary autocorrelation descriptors

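Applied to protein sequences, the Geary autocorrelation descriptors may be defined as:

$$C(d) = \frac{\frac{1}{2(N-d)} \sum_{i=1}^{N-d} \left(P_i - P_{i+d}\right)^2}{\frac{1}{N-1} \sum_{i=1}^{N} \left(P_i - \bar{P}\right)^2}, \qquad d = 1, 2, 3, \ldots, 30$$

where $d$, nlag, $P_i$, $P_{i+d}$ and $\bar{P}$ have the same meaning as above.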
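The Geary coefficient differs from Moran's in using squared differences between positions $i$ and $i+d$, with the $2(N-d)$ and $N-1$ normalizers. A minimal sketch follows, with the same assumptions as the previous sections: the helper name `geary` and the `toy` mapping are hypothetical.

```python
def geary(seq: str, prop: dict[str, float], nlag: int) -> list[float]:
    """Hypothetical helper: Geary autocorrelation C(d) for d = 1..nlag."""
    p = [prop[aa] for aa in seq]
    n = len(p)
    if not 1 <= nlag < n:
        raise ValueError("nlag must satisfy 1 <= nlag < N")
    p_bar = sum(p) / n
    denom = sum((x - p_bar) ** 2 for x in p) / (n - 1)
    if denom == 0:
        raise ValueError("property is constant along the sequence")
    out = []
    for d in range(1, nlag + 1):
        num = sum((p[i] - p[i + d]) ** 2 for i in range(n - d)) / (2 * (n - d))
        out.append(num / denom)
    return out

# Hypothetical standardized values, for illustration only.
toy = {"A": 1.0, "G": -1.0}
print(geary("AGAG", toy, 2))
```

For this alternating toy sequence, lag 1 gives a value above 1 (neighbours dissimilar) and lag 2 gives 0 (identical residues).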

The amino acid indices used in these autocorrelation descriptors can be specified in the file “input-param.dat”, choosing from those in “input-aaindexdb.dat”.

For each amino acid index, there are 3 × nlag autocorrelation descriptors.

2.5 Composition, transition and distribution

These descriptors were developed by Dubchak et al.

Figure 3 The sequence of a hypothetical protein indicating the construction of the composition, transition and distribution descriptors of a protein. The sequence index indicates the position of an amino acid in the sequence. The index for each type of amino acid in the sequence (‘1’, ‘2’ or ‘3’) indicates the position of the first, second, third, ... amino acid of that type. The 1/2 transition indicates the positions of ‘12’ or ‘21’ pairs in the sequence (1/3 and 2/3 are defined in the same way). This figure is from Ref. 2 in section 3 and 4.

Step 1. Sequence encoding

For each attribute, the amino acids are divided into three classes, and each amino acid is encoded by the index (1, 2 or 3) of the class it belongs to. As in the references, the attributes used here are hydrophobicity, normalized van der Waals volume, polarity, polarizability, charge, secondary structure and solvent accessibility. The corresponding divisions are given in Table 1.

Table 1 Amino acid attributes and the division of the amino acids into three groups for each attribute

Attribute | Group 1 | Group 2 | Group 3
Hydrophobicity | Polar: R,K,E,D,Q,N | Neutral: G,A,S,T,P,H,Y | Hydrophobic: C,L,V,I,M,F,W
Normalized van der Waals volume | 0-2.78: G,A,S,T,P,D | 2.95-4.0: N,V,E,Q,I,L | 4.03-8.08: M,H,K,F,R,Y,W
Polarity | 4.9-6.2: L,I,F,W,C,M,V,Y | 8.0-9.2: P,A,T,G,S | 10.4-13.0: H,Q,R,K,N,E,D
Polarizability | 0-0.108: G,A,S,D,T | 0.128-0.186: C,P,N,V,E,Q,I,L | 0.219-0.409: K,M,H,F,R,Y,W
Charge | Positive: K,R | Neutral: A,N,C,Q,G,H,I,L,M,F,P,S,T,W,Y,V | Negative: D,E
Secondary structure | Helix: E,A,L,M,Q,K,R,H | Strand: V,I,Y,C,W,F,T | Coil: G,N,P,S,D
Solvent accessibility | Buried: A,L,F,C,G,I,V,W | Exposed: R,K,Q,E,N,D | Intermediate: M,S,P,T,H,Y

For example, under the hydrophobicity division the sequence “MTEITAAMVKELRESTGAGA” is encoded as “32132223311311222222”.
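The encoding step amounts to a lookup from residue to class label. The sketch below covers only the hydrophobicity row of Table 1; the other rows would plug in the same way. The names `HYDROPHOBICITY` and `encode` are hypothetical.

```python
# Hydrophobicity division from Table 1: class label -> member residues.
HYDROPHOBICITY = {"1": "RKEDQN", "2": "GASTPHY", "3": "CLVIMFW"}

def encode(seq: str, division: dict[str, str]) -> str:
    """Hypothetical helper: replace every residue by the label (1, 2 or 3) of its class."""
    lookup = {aa: label for label, members in division.items() for aa in members}
    return "".join(lookup[aa] for aa in seq.upper())

print(encode("MTEITAAMVKELRESTGAGA", HYDROPHOBICITY))  # 32132223311311222222
```

The output reproduces the encoded string in the example above.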

Step 2: Composition, Transition and Distribution descriptors

Three descriptors, “Composition (C)”, “Transition (T)” and “Distribution (D)”, are calculated for a given attribute as follows:

Composition: the global percentage of each encoded class in the sequence. In the example above, which uses the hydrophobicity division, classes “1”, “2” and “3” occur 5, 10 and 5 times, respectively. With a sequence length of 20, their compositions are 5/20 = 25%, 10/20 = 50% and 5/20 = 25%. Composition can be defined as:

$$C_r = \frac{n_r}{N}, \qquad r = 1, 2, 3$$

where $n_r$ is the number of occurrences of $r$ in the encoded sequence and $N$ is the length of the sequence.

Transition: a transition from class 1 to 2 is the percent frequency with which 1 is followed by 2 or 2 is followed by 1 in the encoded sequence. The transition descriptor can be calculated as:

$$T_{rs} = \frac{n_{rs} + n_{sr}}{N - 1}, \qquad rs \in \{12, 13, 23\}$$

where $n_{rs}$ and $n_{sr}$ are the numbers of dipeptides encoded as “rs” and “sr”, respectively, in the sequence and $N$ is the length of the sequence.

Distribution: the “distribution” descriptor describes how each encoded class is distributed along the sequence. Each encoded class of an attribute has five “distribution” descriptors: the positions, as percentages of the whole sequence length, of the first residue of that class and of the residues marking 25%, 50%, 75% and 100% of its occurrences.

For example, 10 residues are encoded as “2” in the example above. Their positions in the encoded sequence are:
the 1st residue “2”: position 2;
the 2nd residue “2” (25% × 10 = 2.5, rounded down to 2): position 5;
the 5th residue “2” (50% × 10 = 5): position 15;
the 7th residue “2” (75% × 10 = 7.5, rounded down to 7): position 17;
the 10th residue “2” (100% × 10 = 10): position 20.

The distribution descriptors for “2” are therefore 10.0 (2/20 × 100), 25.0 (5/20 × 100), 75.0 (15/20 × 100), 85.0 (17/20 × 100) and 100.0 (20/20 × 100), respectively.
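The three descriptor types can be sketched directly on an encoded string. Following the worked example, composition and distribution are reported as percentages, and the 25/50/75% residues are the floor(k × q)-th occurrences. Two choices are assumptions rather than anything taken from PyDPI: the `max(1, ...)` guard for classes with fewer than four residues, and the zeros returned for an absent class. The function names are hypothetical.

```python
import math

CLASSES = "123"

def composition(enc: str) -> dict[str, float]:
    """Hypothetical helper: C_r as a percentage of positions encoded as class r."""
    n = len(enc)
    return {r: 100.0 * enc.count(r) / n for r in CLASSES}

def transition(enc: str) -> dict[str, float]:
    """Hypothetical helper: T_rs as a percentage of the N - 1 adjacent pairs reading 'rs' or 'sr'."""
    if len(enc) < 2:
        raise ValueError("need at least two residues")
    pairs = [enc[i:i + 2] for i in range(len(enc) - 1)]
    return {
        r + s: 100.0 * (pairs.count(r + s) + pairs.count(s + r)) / len(pairs)
        for r, s in ("12", "13", "23")
    }

def distribution(enc: str) -> dict[str, list[float]]:
    """Hypothetical helper: positions (% of N) of the first, 25%, 50%, 75% and last residue of each class."""
    n = len(enc)
    result = {}
    for r in CLASSES:
        pos = [i + 1 for i, c in enumerate(enc) if c == r]  # 1-based positions
        k = len(pos)
        if k == 0:
            result[r] = [0.0] * 5  # assumption: absent class gives zeros
            continue
        # floor(k * q)-th occurrence, as in the worked example (25% of 10 -> 2nd);
        # max(1, ...) is an assumed guard for classes with fewer than four residues.
        picks = [pos[0]]
        picks += [pos[max(1, math.floor(k * q)) - 1] for q in (0.25, 0.5, 0.75)]
        picks.append(pos[-1])
        result[r] = [100.0 * p / n for p in picks]
    return result

enc = "32132223311311222222"  # hydrophobicity encoding from the example above
print(composition(enc))        # {'1': 25.0, '2': 50.0, '3': 25.0}
print(distribution(enc)["2"])  # [10.0, 25.0, 75.0, 85.0, 100.0]
```

With seven attributes this gives 7 × 3 composition, 7 × 3 transition and 7 × 15 distribution values, matching the counts in Table S1.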

2.6 Conjoint triad descriptors

Conjoint triad descriptors were proposed by J.W. Shen et al. These features capture the properties of protein pairs based on a classification of the amino acids. In this approach, each protein sequence is represented by a vector space consisting of features of amino acids. To reduce the dimensions of the vector space, the 20 amino acids were clustered into several classes according to the dipoles and volumes of their side chains. The conjoint triad features are calculated as follows:

Step 1: Classification of amino acids

Electrostatic and hydrophobic interactions dominate protein-protein interactions. These two kinds of interactions may be reflected by the dipoles and the volumes of the amino acid side chains, respectively. The dipoles were therefore calculated with the density-functional theory method B3LYP/6-31G, and the volumes with a molecular modeling approach. Based on the dipoles and volumes of the side chains, the 20 amino acids could be clustered into seven classes (see Table 2). Amino acids within the same class likely involve synonymous mutations because of their similar characteristics.

Table 2 Classification of amino acids based on dipoles and volumes of the side chains

Dipole scale (Debye): -, Dipole < 1.0; +, 1.0 < Dipole < 2.0; ++, 2.0 < Dipole < 3.0; +++, Dipole > 3.0; +'+'+', Dipole > 3.0 with opposite orientation.

Volume scale (Å³): -, Volume < 50; +, Volume > 50.

Cys is separated from class 3 because of its ability to form disulfide bonds. This table is from Ref. 13.

Step 2: Conjoint triad calculation

The conjoint triad descriptors consider the properties of one amino acid and its vicinal amino acids, and regard any three continuous amino acids as a unit. Triads are therefore distinguished only by the classes of their amino acids: triads composed of amino acids from the same classes, such as ART and VKS, are treated identically.

To represent a protein conveniently, we first use a binary space ($V$, $F$) to represent its sequence. Here, $V$ is the vector space of the sequence features, and each feature $v_i$ represents one triad type. $F$ is the frequency vector corresponding to $V$, and the value of the $i$-th dimension of $F$ ($f_i$) is the frequency with which type $v_i$ appears in the protein sequence. With the amino acids catalogued into seven classes, the size of $V$ is $7 \times 7 \times 7$; thus $i = 1, 2, \ldots, 343$. ($V$, $F$) is illustrated in detail in Figure 3.

Clearly, each $f_i$ correlates with the length (number of amino acids) of the protein. In general, a long protein has large values of $f_i$, which complicates the comparison between two heterogeneous proteins. We therefore defined a new parameter, $d_i$, by normalizing $f_i$ with the following equation:

$$d_i = \frac{f_i - \min\{f_1, f_2, \ldots, f_{343}\}}{\max\{f_1, f_2, \ldots, f_{343}\}}$$

For each protein, $d_i$ ranges from 0 to 1, which enables the comparison between proteins. Accordingly, we obtain another vector space (designated $D$) consisting of the $d_i$ to represent the protein.

Figure 3 Schematic diagram for constructing the vector space ($V$, $F$) of a protein sequence. $V$ is the vector space of the sequence features; each feature ($v_i$) represents a triad composed of three consecutive amino acids. $F$ is the frequency vector corresponding to $V$, and the value of the $i$-th dimension of $F$ ($f_i$) is the frequency with which triad $v_i$ appears in the protein sequence. This figure is from Ref. 13.
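The two steps above can be sketched in Python. Table 2's class membership did not survive in this text. The grouping below (A/G/V, I/L/F/P, Y/M/T/S, H/N/Q/W, R/K, D/E, C) is the seven-class grouping commonly quoted for Shen et al. (Ref. 13); treat it as an assumption and check it against Table 2. It is at least consistent with the ART/VKS example. The function name `conjoint_triad` is hypothetical, and all-zero output for sequences shorter than three residues is also an assumption.

```python
from itertools import product

# Assumed seven-class grouping (commonly quoted for Ref. 13); verify against Table 2.
CLASS_MEMBERS = {1: "AGV", 2: "ILFP", 3: "YMTS", 4: "HNQW", 5: "RK", 6: "DE", 7: "C"}
AA_CLASS = {aa: c for c, members in CLASS_MEMBERS.items() for aa in members}

def conjoint_triad(seq: str) -> dict[tuple[int, int, int], float]:
    """Hypothetical helper: d_i = (f_i - min f) / max f for all 7**3 = 343 class triads."""
    classes = [AA_CLASS[aa] for aa in seq.upper()]
    f = {t: 0 for t in product(range(1, 8), repeat=3)}
    for i in range(len(classes) - 2):  # every window of three consecutive residues
        f[tuple(classes[i:i + 3])] += 1
    f_min, f_max = min(f.values()), max(f.values())
    if f_max == 0:  # fewer than three residues: no triads at all
        return {t: 0.0 for t in f}
    return {t: (v - f_min) / f_max for t, v in f.items()}

d = conjoint_triad("ARTVKS")
print(d[(1, 5, 3)], d[(5, 3, 1)])  # 1.0 0.5
```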

2.7 Quasi-sequence-order descriptors

The quasi-sequence-order descriptors were proposed by K.C. Chou et al. They are derived from a distance matrix between the 20 amino acids.

2.7.1 Sequence-order-coupling numbers

The $d$-th rank sequence-order-coupling number is defined as:

$$\tau_d = \sum_{i=1}^{N-d} \left(d_{i,i+d}\right)^2, \qquad d = 1, 2, 3, \ldots, \text{maxlag}$$

where $d_{i,i+d}$ is the distance between the two amino acids at positions $i$ and $i+d$.

Note: maxlag is the maximum lag, and the length of the protein must not be less than maxlag.
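A minimal sketch of $\tau_d$ follows, with the distance matrix passed in as a nested dictionary. Neither the Schneider-Wrede nor the Grantham matrix is reproduced here. The two-residue `toy` matrix and the function name are hypothetical.

```python
def sequence_order_coupling(seq: str, dist: dict[str, dict[str, float]],
                            maxlag: int) -> list[float]:
    """Hypothetical helper: tau_d = sum_{i=1}^{N-d} (d_{i,i+d})**2 for d = 1..maxlag."""
    n = len(seq)
    if n < maxlag:
        raise ValueError("sequence must not be shorter than maxlag")
    return [
        sum(dist[seq[i]][seq[i + d]] ** 2 for i in range(n - d))
        for d in range(1, maxlag + 1)
    ]

# Hypothetical two-residue distance matrix, for illustration only.
toy = {"A": {"A": 0.0, "G": 1.0}, "G": {"A": 1.0, "G": 0.0}}
print(sequence_order_coupling("AGAG", toy, 2))  # [3.0, 0.0]
```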

2.7.2 Quasi-sequence-order (QSO) descriptors

For each amino acid type, a quasi-sequence-order descriptor can be defined as:

$$X_r = \frac{f_r}{\sum_{k=1}^{20} f_k + w \sum_{d=1}^{\text{maxlag}} \tau_d}, \qquad r = 1, 2, 3, \ldots, 20$$

where $f_r$ is the normalized occurrence of amino acid type $r$ and $w$ is a weighting factor ($w = 0.1$). These are the first 20 quasi-sequence-order descriptors. The other maxlag quasi-sequence-order descriptors (30 by default) are defined as:

$$X_d = \frac{w\,\tau_{d-20}}{\sum_{k=1}^{20} f_k + w \sum_{m=1}^{\text{maxlag}} \tau_m}, \qquad d = 21, 22, 23, \ldots, 20 + \text{maxlag}$$

In addition to the Schneider-Wrede physicochemical distance matrix used by Chou et al., a second chemical distance matrix, by Grantham, is also used here.
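Once the coupling numbers are known, the QSO vector is a single normalization. The sketch below assumes `freqs` holds the normalized occurrences $f_r$ (here fractions summing to 1) and `tau` holds $\tau_1, \ldots, \tau_{maxlag}$. It uses $w = 0.1$ as stated above. The function name and the toy inputs are hypothetical.

```python
def quasi_sequence_order(freqs: dict[str, float], tau: list[float],
                         w: float = 0.1) -> list[float]:
    """Hypothetical helper: X_1..X_20 from the frequencies, then X_21..X_{20+maxlag} from tau."""
    denom = sum(freqs.values()) + w * sum(tau)
    return [f / denom for f in freqs.values()] + [w * t / denom for t in tau]

# Toy inputs for illustration: two residue types and maxlag = 2.
print(quasi_sequence_order({"A": 0.5, "G": 0.5}, [3.0, 0.0]))
```

Because every term shares one denominator, the descriptors always sum to 1, which is a quick sanity check on any implementation.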

Figure 4 A schematic drawing to show (a) the 1st-rank, (b) the 2nd-rank and (c) the 3rd-rank sequence-order-coupling mode along a protein sequence. Panel (a) reflects the coupling mode between all the most contiguous residues, panel (b) that between all the 2nd most contiguous residues, and panel (c) that between all the 3rd most contiguous residues. This figure is from Ref. 4.

2.8 Pseudo-amino acid composition (PAAC)

This group of descriptors was proposed by Kuo-Chen Chou. PAAC descriptors (http://www.csbio.sjtu.edu.cn/bioinf/PseAAC/type1.htm) are also called the type 1 pseudo-amino acid composition. Let $H_1^o(i)$, $H_2^o(i)$ and $M^o(i)$ ($i = 1, 2, 3, \ldots, 20$) be the original hydrophobicity values, the original hydrophilicity values and the original side chain masses of the 20 natural amino acids, respectively. They are converted to the following quantities by a standard conversion:

$$H_1(i) = \frac{H_1^o(i) - \frac{1}{20}\sum_{k=1}^{20} H_1^o(k)}{\sqrt{\frac{1}{20}\sum_{j=1}^{20}\left[H_1^o(j) - \frac{1}{20}\sum_{k=1}^{20} H_1^o(k)\right]^2}}$$

$H_2^o(i)$ and $M^o(i)$ are normalized to $H_2(i)$ and $M(i)$ in the same way.

Figure 5 A schematic drawing to show (a) the first-tier, (b) the second-tier and (c) the third-tier sequence order correlation mode along a protein sequence. Panel (a) reflects the correlation mode between all the most contiguous residues, panel (b) that between all the second-most contiguous residues, and panel (c) that between all the third-most contiguous residues. This figure is from Ref. 8.

Then, a correlation function can be defined as:

$$\Theta(R_i, R_j) = \frac{1}{3}\left\{\left[H_1(R_j) - H_1(R_i)\right]^2 + \left[H_2(R_j) - H_2(R_i)\right]^2 + \left[M(R_j) - M(R_i)\right]^2\right\}$$

This correlation function is an average over three amino acid properties: hydrophobicity, hydrophilicity and side chain mass. The definition can therefore be extended to a single amino acid property or to a set of $n$ amino acid properties.

For one amino acid property, the correlation can be defined as:

$$\Theta(R_i, R_j) = \left[H_1(R_j) - H_1(R_i)\right]^2$$

where $H_1(R_i)$ is the amino acid property of amino acid $R_i$ after standardization.

For a set of $n$ amino acid properties, it can be defined as:

$$\Theta(R_i, R_j) = \frac{1}{n}\sum_{k=1}^{n}\left[H_k(R_j) - H_k(R_i)\right]^2$$

where $H_k(R_i)$ is the $k$-th property in the amino acid property set for amino acid $R_i$.

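A set of descriptors called sequence-order-correlated factors is defined as:

$$\theta_1 = \frac{1}{N-1}\sum_{i=1}^{N-1}\Theta(R_i, R_{i+1})$$

$$\theta_2 = \frac{1}{N-2}\sum_{i=1}^{N-2}\Theta(R_i, R_{i+2})$$

$$\theta_3 = \frac{1}{N-3}\sum_{i=1}^{N-3}\Theta(R_i, R_{i+3})$$

$$\ldots$$

$$\theta_\lambda = \frac{1}{N-\lambda}\sum_{i=1}^{N-\lambda}\Theta(R_i, R_{i+\lambda})$$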

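Here $\lambda$ ($< N$) is a parameter to be chosen. Let $f_i$ be the normalized occurrence frequency of the 20 amino acids in the protein sequence. A set of $20 + \lambda$ descriptors, called the pseudo-amino acid composition of the protein sequence, can then be defined as:

$$X_c = \frac{f_c}{\sum_{i=1}^{20} f_i + w\sum_{j=1}^{\lambda}\theta_j}, \qquad 1 \le c \le 20$$

$$X_c = \frac{w\,\theta_{c-20}}{\sum_{i=1}^{20} f_i + w\sum_{j=1}^{\lambda}\theta_j}, \qquad 21 \le c \le 20 + \lambda$$

where $w$ is the weighting factor for the sequence-order effect. Following Chou KC, PyDPI sets $w = 0.05$.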

Note: the original hydrophobicity values for amino acids in PyDPI differ from those used by Chou KC; in this updated version, the default amino acid property values are those of Chou KC. However, Chou KC does not define “normalized occurrence frequency”. Here it is defined as the occurrence frequency of each amino acid in the sequence normalized to 100%, so the values calculated here differ from those reported by Chou KC.
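The full PAAC pipeline can be put together as a sketch: standard conversion, correlation function, the $\theta_j$ factors and the final normalization. Two points are assumptions. First, the frequencies are fractions summing to 1; as the note above explains, the choice of normalization changes the resulting values. Second, the single-property toy input restricted to two residues is hypothetical. All function names are hypothetical too, and none of this is PyDPI's own API.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def standardize(raw: dict[str, float]) -> dict[str, float]:
    """Standard conversion: subtract the mean, divide by the population std."""
    values = list(raw.values())
    mu, sigma = mean(values), pstdev(values)
    return {aa: (v - mu) / sigma for aa, v in raw.items()}

def correlation(a: str, b: str, props: list[dict[str, float]]) -> float:
    """Theta(R_i, R_j): mean squared difference over the standardized properties."""
    return sum((h[b] - h[a]) ** 2 for h in props) / len(props)

def pseudo_aac(seq: str, raw_props: list[dict[str, float]],
               lamda: int = 30, w: float = 0.05) -> list[float]:
    """Hypothetical helper: the 20 + lamda type 1 pseudo-amino acid composition values."""
    n = len(seq)
    if not 0 < lamda < n:
        raise ValueError("lamda must satisfy 0 < lamda < N")
    props = [standardize(p) for p in raw_props]
    # Sequence-order-correlated factors theta_1 .. theta_lamda.
    factors = [
        sum(correlation(seq[i], seq[i + j], props) for i in range(n - j)) / (n - j)
        for j in range(1, lamda + 1)
    ]
    # Occurrence frequencies as fractions summing to 1 (an assumption, see the note).
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]
    denom = sum(freqs) + w * sum(factors)
    return [f / denom for f in freqs] + [w * t / denom for t in factors]

# Hypothetical single-property toy example restricted to two residues.
values = pseudo_aac("AGAG", [{"A": 1.0, "G": 3.0}], lamda=1)
print(len(values))  # 21
```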

2.9 Amphiphilic pseudo-amino acid composition (APAAC)

APAAC descriptors (http://www.csbio.sjtu.edu.cn/bioinf/PseAAC/type2.htm) are also called the type 2 pseudo-amino acid composition. Their definitions are similar to those of the PAAC descriptors above. Using the standardized $H_1(i)$ and $H_2(i)$ defined in Section 2.8, the hydrophobicity and hydrophilicity correlation functions are defined, respectively, as:

$$H^1_{i,j} = H_1(i)\,H_1(j), \qquad H^2_{i,j} = H_2(i)\,H_2(j)$$

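From these quantities, sequence-order factors can be defined as:

$$\tau_1 = \frac{1}{N-1}\sum_{i=1}^{N-1} H^1_{i,i+1}$$

$$\tau_2 = \frac{1}{N-1}\sum_{i=1}^{N-1} H^2_{i,i+1}$$

$$\tau_3 = \frac{1}{N-2}\sum_{i=1}^{N-2} H^1_{i,i+2}$$

$$\tau_4 = \frac{1}{N-2}\sum_{i=1}^{N-2} H^2_{i,i+2}$$

$$\ldots$$

$$\tau_{2\lambda-1} = \frac{1}{N-\lambda}\sum_{i=1}^{N-\lambda} H^1_{i,i+\lambda}$$

$$\tau_{2\lambda} = \frac{1}{N-\lambda}\sum_{i=1}^{N-\lambda} H^2_{i,i+\lambda}$$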

Figure 6 A schematic diagram to show (a1/a2) the first-rank, (b1/b2) the second-rank and (c1/c2) the third-rank sequence-order-coupling mode along a protein sequence through a hydrophobicity/hydrophilicity correlation function, where $H^1_{i,j}$ and $H^2_{i,j}$ are the hydrophobicity and hydrophilicity correlation functions defined above. Panel (a1/a2) reflects the coupling mode between all the most contiguous residues, panel (b1/b2) that between all the second-most contiguous residues, and panel (c1/c2) that between all the third-most contiguous residues. This figure is from Ref. 12.

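Then a set of descriptors called the “Amphiphilic pseudo amino acid composition” (APAAC) is defined as:

$$P_c = \frac{f_c}{\sum_{i=1}^{20} f_i + w\sum_{j=1}^{2\lambda}\tau_j}, \qquad 1 \le c \le 20$$

$$P_c = \frac{w\,\tau_{c-20}}{\sum_{i=1}^{20} f_i + w\sum_{j=1}^{2\lambda}\tau_j}, \qquad 21 \le c \le 20 + 2\lambda$$

where $w$ is the weighting factor. As in the work of Chou KC, PyDPI takes $w = 0.5$.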
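The APAAC vector can be sketched like the PAAC one. Here the $\tau$ factors alternate between the hydrophobicity and hydrophilicity correlations at each lag, in the order $\tau_1, \tau_2, \ldots, \tau_{2\lambda}$ given above. The sketch assumes `h1` and `h2` already hold standardized values and that frequencies are fractions summing to 1. The function name and the toy `h1`/`h2` values are hypothetical; they are not real standardized scales.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

def amphiphilic_pseudo_aac(seq: str, h1: dict[str, float], h2: dict[str, float],
                           lamda: int = 30, w: float = 0.5) -> list[float]:
    """Hypothetical helper: the 20 + 2*lamda type 2 (amphiphilic) PseAAC values.

    h1 and h2 map residues to standardized hydrophobicity and hydrophilicity.
    """
    n = len(seq)
    if not 0 < lamda < n:
        raise ValueError("lamda must satisfy 0 < lamda < N")
    tau = []
    for j in range(1, lamda + 1):
        m = n - j
        tau.append(sum(h1[seq[i]] * h1[seq[i + j]] for i in range(m)) / m)  # tau_{2j-1}
        tau.append(sum(h2[seq[i]] * h2[seq[i + j]] for i in range(m)) / m)  # tau_{2j}
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]
    denom = sum(freqs) + w * sum(tau)
    return [f / denom for f in freqs] + [w * t / denom for t in tau]

# Hypothetical toy values for two residues, for illustration only.
h1 = {"A": -1.0, "G": 1.0}
h2 = {"A": 1.0, "G": 1.0}
print(amphiphilic_pseudo_aac("AGAG", h1, h2, lamda=1)[20:])  # [-0.5, 0.5]
```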

References:

[1] M. Bhasin and G.P.S. Raghava. Classification of Nuclear Receptors Based on Amino Acid Composition and Dipeptide Composition. J. Biol. Chem. 2004, 279, 23262.

[2] Inna Dubchak, Ilya Muchnik, Stephen R. Holbrook and Sung-Hou Kim. Prediction of protein folding class using global description of amino acid sequence. Proc. Natl. Acad. Sci. USA, 1995, 92, 8700-8704.

[3] Inna Dubchak, Ilya Muchnik, Christopher Mayor, Igor Dralyuk and Sung-Hou Kim. Recognition of a Protein Fold in the Context of the SCOP classification. Proteins: Structure, Function and Genetics, 1999, 35, 401-407.

[4] Kuo-Chen Chou. Prediction of Protein Subcellular Locations by Incorporating Quasi-Sequence-Order Effect. Biochemical and Biophysical Research Communications 2000, 278, 477-483.

[5] Kuo-Chen Chou and Yu-Dong Cai. Prediction of Protein sub-cellular locations by GO-FunD-PseAA predictor. Biochemical and Biophysical Research Communications, 2004, 320, 1236-1239.

[6] Gisbert Schneider and Paul Wrede. The Rational Design of Amino Acid Sequences by Artificial Neural Networks and Simulated Molecular Evolution: De Novo Design of an Idealized Leader Peptidase Cleavage Site. Biophysical Journal, 1994, 66, 335-344.

[7] Grantham, R. Amino acid difference formula to help explain protein evolution. Science, 1974, 185, 862-864.

[8] Kuo-Chen Chou. Prediction of Protein Cellular Attributes Using Pseudo-Amino Acid Composition. PROTEINS: Structure, Function, and Genetics, 2001, 43, 246-255.

[9] Jiri Damborsky. Quantitative structure-function and structure-stability relationships of purposely modified proteins. Protein Engineering, 1998, 11, 21-30.

[10] Hopp, T.P., and Woods, K.R. Prediction of protein antigenic determinants from amino acid sequences. Proc. Natl. Acad. Sci. 1981, 78, 3824-3828.

[11] http://www.csbio.sjtu.edu.cn/bioinf/PseAAC/

[12] Kuo-Chen Chou. Using amphiphilic pseudo amino acid composition to predict enzyme subfamily

classes. Bioinformatics, 2005, 21, 10-19.

[13] J.W. Shen, J. Zhang, X.M. Luo, W.L. Zhu, K.Q. Yu, K.X. Chen, Y.X. Li, H.L. Jiang. Predicting

protein-protein interactions based only on sequences information. Proc. Natl. Acad. Sci. 2007, 104,

4337-4341.

[14] Z.R. Li, H.H. Lin, Y. Han, L. Jiang, X. Chen, Y.Z. Chen. PROFEAT: a web server for computing structural and physicochemical features of proteins and peptides from amino acid sequence. Nucleic Acids Research. 2006, 34, 32-37.

[15] H.B. Rao, F. Zhu, G.B. Yang, Z.R. Li, Y.Z. Chen. Update of PROFEAT: a web server for

computing structural and physicochemical features of proteins and peptides from amino acid sequence.

Nucleic Acids Research. 2011, 39, 385-390.

[16] Kawashima, S., Ogata, H., and Kanehisa, M.; AAindex: amino acid index database. Nucleic Acids

Res. 1999, 27, 368-369.

[17] Kawashima, S. and Kanehisa, M.; AAindex: amino acid index database. Nucleic Acids Res. 2000,

28, 374.

[18] Kawashima, S., Pokarowski, P., Pokarowska, M., Kolinski, A., Katayama, T., and Kanehisa, M.;

AAindex: amino acid index database, progress report 2008. Nucleic Acids Res. 2008, 36, D202-D205.

[19] Chou, K.-C., and Shen, H.-B. Cell-PLoc: a package of Web servers for predicting subcellular localization of proteins in various organisms. Nat. Protocols, 2008, 3, 153-162.

3 Protein-protein interaction descriptors

Let $F_a = \{F_a(i), i = 1, 2, \ldots, n\}$ and $F_b = \{F_b(i), i = 1, 2, \ldots, n\}$ be the descriptor vectors of the interacting proteins A and B, respectively. There are three methods to construct the descriptor vector $F$ for A and B:

(1) Two vectors $F_{ab}$ and $F_{ba}$ with dimension $2n$ are constructed: $F_{ab} = (F_a, F_b)$ for the interaction between protein A and protein B, and $F_{ba} = (F_b, F_a)$ for the interaction between protein B and protein A.

(2) One vector $F$ with dimension $2n$ is constructed: $F = \{F_a(i) + F_b(i), F_a(i) \times F_b(i), i = 1, 2, \ldots, n\}$.

(3) One vector $F$ with dimension $n^2$ is constructed by the tensor product: $F = \{F(k) = F_a(i) \times F_b(j), i = 1, 2, \ldots, n, j = 1, 2, \ldots, n, k = (i-1) \times n + j\}$.
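The three constructions can be sketched on plain Python lists. For method (2) the set notation does not fix an order, so placing all sums before all products is an assumption. For method (3) the nested comprehension flattens row-major, so entry $(i, j)$ lands at $k = (i-1) \times n + j$. The function names are hypothetical.

```python
def ppi_concat(fa: list[float], fb: list[float]) -> tuple[list[float], list[float]]:
    """Method (1): F_ab = (F_a, F_b) and F_ba = (F_b, F_a), each of length 2n."""
    return fa + fb, fb + fa

def ppi_sum_product(fa: list[float], fb: list[float]) -> list[float]:
    """Method (2): element-wise sums, then element-wise products (order assumed), length 2n."""
    if len(fa) != len(fb):
        raise ValueError("descriptor vectors must have the same length")
    return [a + b for a, b in zip(fa, fb)] + [a * b for a, b in zip(fa, fb)]

def ppi_tensor(fa: list[float], fb: list[float]) -> list[float]:
    """Method (3): F(k) = F_a(i) * F_b(j) with k = (i - 1) * n + j, length n**2."""
    return [a * b for a in fa for b in fb]

print(ppi_tensor([1.0, 2.0], [3.0, 4.0]))  # [3.0, 4.0, 6.0, 8.0]
```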

4 Protein-ligand interaction descriptors

There are two methods for constructing the descriptor vector $F$ of a protein-ligand interaction from the protein descriptor vector $F_p$ ($F_p(i)$, $i = 1, \ldots, n_p$) and the ligand descriptor vector $F_l$ ($F_l(i)$, $i = 1, \ldots, n_l$):

(1) One vector $F$ with dimension $n_p + n_l$ is constructed: $F = (F_p, F_l)$ for the interaction between protein P and ligand L.

(2) One vector $F$ with dimension $n_p \times n_l$ is constructed by the tensor product: $F = \{F(k) = F_p(i) \times F_l(j), i = 1, 2, \ldots, n_p, j = 1, 2, \ldots, n_l, k = (i-1) \times n_l + j\}$.
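Both protein-ligand constructions can be sketched in a few lines of Python. The tensor product below flattens row-major, so entry $(i, j)$ lands at position $k = (i-1) \times n_l + j$ (1-based), which keeps every $k$ unique even when $n_p \ne n_l$. The function names are hypothetical.

```python
def pli_concat(fp: list[float], fl: list[float]) -> list[float]:
    """Method (1): F = (F_p, F_l), length n_p + n_l."""
    return fp + fl

def pli_tensor(fp: list[float], fl: list[float]) -> list[float]:
    """Method (2): F(k) = F_p(i) * F_l(j), row-major, length n_p * n_l."""
    return [x * y for x in fp for y in fl]

print(pli_tensor([1, 2, 3], [10, 20]))  # [10, 20, 20, 40, 30, 60]
```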

Figure 7 Schematic diagram of the drug-target interaction in the chemogenomics approach. The interaction can be considered as an event triggered by many factors that influence the binding between the drug and the protein. It can therefore be efficiently represented by considering the structural content of the drug and of the protein simultaneously, under a common chemogenomics representation framework. This figure is from Ref. 2.

References:

[1] Cao, D.-S., Liu, S., Xu, Q.-S., Lu, H.-M., Huang, J.-H., Hu, Q.-N., and Liang, Y.-Z. Large-scale prediction of drug-target interactions using protein sequences and drug topological structures. Analytica Chimica Acta, 2012, 752, 1-10.

[2] Yu, H., Chen, J., Xu, X., Li, Y., Zhao, H., Fang, Y., Li, X., Zhou, W., Wang, and Wang. Systematic Prediction of Multiple Drug-Target Interactions from Chemical, Genomic, and Pharmacological Data. PLoS ONE, 2012, 7, e37608.

[3] Cao, D.-S., Liang, Y.-Z., Deng, Z., Hu, Q.-N., He, M., Xu, Q.-S., Zhou, G.-H., Zhang, L.-X., Deng, Z.-X., and Liu, S. Genome-Scale Screening of Drug-Target Associations Relevant to Ki Using a Chemogenomics Approach. PLoS ONE, 2013, 8, e57680.

[4] Hiroaki, Y., Satoshi, N., Hiromu, T., Tomomi, L., Takatsugu, H., Takafumi, H., Teppei, O., Yohsuke, M., Gozoh, T., and Yasushi, O. Analysis of multiple compound-protein interactions reveals novel bioactive molecules. Mol. Syst. Biol., 2011, 7, 472-483.

[5] J.W. Shen, J. Zhang, X.M. Luo, W.L. Zhu, K.Q. Yu, K.X. Chen, Y.X. Li, H.L. Jiang. Predicting

protein-protein interactions based only on sequences information. Proc. Natl. Acad. Sci. 2007, 104,

4337-4341.

Appendix:

Table S1 List of PyDPI computed features for protein sequences

Feature group | Features | Number of descriptors
Amino acid composition | Amino acid composition | 20
 | Dipeptide composition | 400
 | Tripeptide composition | 8000
Autocorrelation | Normalized Moreau-Broto autocorrelation | 240 (a)
 | Moran autocorrelation | 240 (a)
 | Geary autocorrelation | 240 (a)
CTD | Composition | 21
 | Transition | 21
 | Distribution | 105
Conjoint triad | Conjoint triad | 343
Quasi-sequence order | Sequence-order-coupling number | 60
 | Quasi-sequence-order descriptors | 100
Pseudo amino acid composition | Pseudo amino acid composition | 50 (b)
 | Amphiphilic pseudo amino acid composition | 80 (c)

(a) The number depends on the number of amino acid properties chosen and on the maximum value of the lag. The default is eight types of properties and lag = 30.
(b) The number depends on the number of sets of amino acid properties chosen and on the lamda value. The default is the three types of properties proposed by Chou et al. and lamda = 30.
(c) The number depends on the lamda value (20 + 2 × lamda). The default is lamda = 30.

Table S2 List of PyDPI computed descriptors for small molecules

Molecular descriptors

Constitutional descriptors

1 Weight Molecular weight
2 nhyd Count of hydrogen atoms
3 nhal Count of halogen atoms
4 nhet Count of hetero atoms
5 nhev Count of heavy atoms
6 ncof Count of F atoms
7 ncocl Count of Cl atoms
8 ncobr Count of Br atoms
9 ncoi Count of I atoms
10 ncarb Count of C atoms
11 nphos Count of P atoms
12 nsulph Count of S atoms
13 noxy Count of O atoms
14 nnitro Count of N atoms
15 nring Number of rings
16 nrot Number of rotatable bonds
17 ndonr Number of H-bond donors
18 naccr Number of H-bond acceptors
19 nsb Number of single bonds
20 ndb Number of double bonds
21 ntb Number of triple bonds
22 naro Number of aromatic bonds
23 nta Number of all atoms
24 AWeight Average molecular weight
25-30 PC1, PC2, PC3, PC4, PC5, PC6 Molecular path counts of length 1-6

Topological descriptors

1 W Wiener index
2 AW Average Wiener index
3 J Balaban's J index
4 Thara Harary number
5 Tsch Schultz index
6 Tigdi Graph distance index
7 Platt Platt number
8 Xu Xu index
9 Pol Polarity number
10 Dz Pogliani index
11 Ipc Ipc index
12 BertzCT
13 GMTI Gutman molecular topological index based on simple vertex degree
14-15 ZM1, ZM2 Zagreb index with order 1-2
16-17 MZM1, MZM2 Modified Zagreb index with order 1-2
18 Qindex Quadratic index
19 diametert Largest value in the distance matrix
20 radiust Radius based on topology
21 petitjeant Petitjean index based on topology
22 Sito Logarithm of the simple topological index by Narumi
23 Hato Harmonic topological index proposed by Narumi
24 Geto Geometric topological index by Narumi
25 Arto Arithmetic topological index by Narumi

Connectivity descriptors

1-11 Valence molecular connectivity Chi index for path order 0-10
12 Valence molecular connectivity Chi index for three cluster
13 Valence molecular connectivity Chi index for four cluster
14 Valence molecular connectivity Chi index for path/cluster
15-18 Valence molecular connectivity Chi index for cycles of 3-6
19-29 Simple molecular connectivity Chi indices for path order 0-10
30 Simple molecular connectivity Chi indices for three cluster
31 Simple molecular connectivity Chi indices for four cluster
32 Simple molecular connectivity Chi indices for path/cluster
33-36 Simple molecular connectivity Chi indices for cycles of 3-6
37 mChi1 Mean chi1 (Randic) connectivity index
38 knotp The difference between chi3c and chi4pc
39 dchi0 The difference between chi0v and chi0
40 dchi1 The difference between chi1v and chi1
41 dchi2 The difference between chi2v and chi2
42 dchi3 The difference between chi3v and chi3
43 dchi4 The difference between chi4v and chi4
44 knotpv The difference between chiv3c and chiv4pc

Kappa descriptors

1 Kappa alpha index for 1 bonded fragment
2 Kappa alpha index for 2 bonded fragment
3 Kappa alpha index for 3 bonded fragment
4 phi Kier molecular flexibility index
5 ¹κ Molecular shape Kappa index for 1 bonded fragment
6 ²κ Molecular shape Kappa index for 2 bonded fragment
7 ³κ Molecular shape Kappa index for 3 bonded fragment

Burden descriptors

1-16 bcutm1-16 Burden descriptors based on atomic mass
17-32 bcutv1-16 Burden descriptors based on atomic volumes
33-48 bcute1-16 Burden descriptors based on atomic electronegativity
49-64 bcutp1-16 Burden descriptors based on polarizability

Basak information descriptors

1 IC0 Information content with order 0 proposed by Basak

2 IC1 Information content with order 1 proposed by Basak

3 IC2 Information content with order 2 proposed by Basak

4 IC3 Information content with order 3 proposed by Basak

5 IC4 Information content with order 4 proposed by Basak

6 IC5 Information content with order 5 proposed by Basak

7 IC6 Information content with order 6 proposed by Basak

8 SIC0 Structural information content with order 0 proposed by Basak

9 SIC1 Structural information content with order 1 proposed by Basak

10 SIC2 Structural information content with order 2 proposed by Basak

11 SIC3 Structural information content with order 3 proposed by Basak

12 SIC4 Structural information content with order 4 proposed by Basak

13 SIC5 Structural information content with order 5 proposed by Basak

14 SIC6 Structural information content with order 6 proposed by Basak

15 CIC0 Complementary information content with order 0 proposed by Basak

16 CIC1 Complementary information content with order 1 proposed by Basak

17 CIC2 Complementary information content with order 2 proposed by Basak

18 CIC3 Complementary information content with order 3 proposed by Basak

19 CIC4 Complementary information content with order 4 proposed by Basak

20 CIC5 Complementary information content with order 5 proposed by Basak

21 CIC6 Complementary information content with order 6 proposed by Basak
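Basak's indices are Shannon entropies of a partition of all atoms, hydrogens included, into equivalence classes. At order k, two atoms share a class when their neighbourhoods up to k bonds away are equivalent. With n atoms, the usual definitions are IC = -Σ p_i log2 p_i, SIC = IC / log2 n and CIC = log2 n - IC. The sketch below assumes the class labels have already been computed; order 0 simply uses element symbols. The function name is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def basak_information(classes: Sequence[Hashable]) -> Tuple[float, float, float]:
    """IC, SIC and CIC for one order, given one equivalence-class label per atom."""
    n = len(classes)
    ic = -sum((c / n) * math.log2(c / n) for c in Counter(classes).values())
    max_ic = math.log2(n)  # entropy if every atom were in its own class
    sic = ic / max_ic if max_ic > 0 else 0.0  # assumption: 0 for a single atom
    cic = max_ic - ic
    return ic, sic, cic


# Methanol at order 0: classes C (1 atom), O (1 atom), H (4 atoms).
print(basak_information(["C", "O", "H", "H", "H", "H"]))
```

Because SIC is normalised by log2 n, it stays between 0 and 1, while CIC measures how far the partition is from maximal diversity.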

E-state descriptors

1 S(1) Sum of E-State of atom type: sLi

2 S(2) Sum of E-State of atom type: ssBe

3 S(3) Sum of E-State of atom type: ssssBe

4 S(4) Sum of E-State of atom type: ssBH

5 S(5) Sum of E-State of atom type: sssB

6 S(6) Sum of E-State of atom type: ssssB

7 S(7) Sum of E-State of atom type: sCH3

8 S(8) Sum of E-State of atom type: dCH2

9 S(9) Sum of E-State of atom type: ssCH2

10 S(10) Sum of E-State of atom type: tCH

11 S(11) Sum of E-State of atom type: dsCH

12 S(12) Sum of E-State of atom type: aaCH

13 S(13) Sum of E-State of atom type: sssCH

14 S(14) Sum of E-State of atom type: ddC

15 S(15) Sum of E-State of atom type: tsC

16 S(16) Sum of E-State of atom type: dssC

17 S(17) Sum of E-State of atom type: aasC

18 S(18) Sum of E-State of atom type: aaaC

19 S(19) Sum of E-State of atom type: ssssC

20 S(20) Sum of E-State of atom type: sNH3

21 S(21) Sum of E-State of atom type: sNH2

22 S(22) Sum of E-State of atom type: ssNH2

23 S(23) Sum of E-State of atom type: dNH

24 S(24) Sum of E-State of atom type: ssNH

25 S(25) Sum of E-State of atom type: aaNH

26 S(26) Sum of E-State of atom type: tN

27 S(27) Sum of E-State of atom type: sssNH

28 S(28) Sum of E-State of atom type: dsN

29 S(29) Sum of E-State of atom type: aaN

30 S(30) Sum of E-State of atom type: sssN

31 S(31) Sum of E-State of atom type: ddsN

32 S(32) Sum of E-State of atom type: aasN

33 S(33) Sum of E-State of atom type: ssssN

34 S(34) Sum of E-State of atom type: sOH

35 S(35) Sum of E-State of atom type: dO

36 S(36) Sum of E-State of atom type: ssO

37 S(37) Sum of E-State of atom type: aaO

38 S(38) Sum of E-State of atom type: sF

39 S(39) Sum of E-State of atom type: sSiH3

40 S(40) Sum of E-State of atom type: ssSiH2

41 S(41) Sum of E-State of atom type: sssSiH

42 S(42) Sum of E-State of atom type: ssssSi

43 S(43) Sum of E-State of atom type: sPH2

44 S(44) Sum of E-State of atom type: ssPH

45 S(45) Sum of E-State of atom type: sssP

46 S(46) Sum of E-State of atom type: dsssP

47 S(47) Sum of E-State of atom type: sssssP

48 S(48) Sum of E-State of atom type: sSH

49 S(49) Sum of E-State of atom type: dS

50 S(50) Sum of E-State of atom type: ssS

51 S(51) Sum of E-State of atom type: aaS

52 S(52) Sum of E-State of atom type: dssS

53 S(53) Sum of E-State of atom type: ddssS

54 S(54) Sum of E-State of atom type: sCl

55 S(55) Sum of E-State of atom type: sGeH3

56 S(56) Sum of E-State of atom type: ssGeH2

57 S(57) Sum of E-State of atom type: sssGeH

58 S(58) Sum of E-State of atom type: ssssGe

59 S(59) Sum of E-State of atom type: sAsH2

60 S(60) Sum of E-State of atom type: ssAsH

61 S(61) Sum of E-State of atom type: sssAs

62 S(62) Sum of E-State of atom type: sssdAs

63 S(63) Sum of E-State of atom type: sssssAs

64 S(64) Sum of E-State of atom type: sSeH

65 S(65) Sum of E-State of atom type: dSe

66 S(66) Sum of E-State of atom type: ssSe

67 S(67) Sum of E-State of atom type: aaSe

68 S(68) Sum of E-State of atom type: dssSe

69 S(69) Sum of E-State of atom type: ddssSe

70 S(70) Sum of E-State of atom type: sBr

71 S(71) Sum of E-State of atom type: sSnH3

72 S(72) Sum of E-State of atom type: ssSnH2

73 S(73) Sum of E-State of atom type: sssSnH

74 S(74) Sum of E-State of atom type: ssssSn

75 S(75) Sum of E-State of atom type: sI

76 S(76) Sum of E-State of atom type: sPbH3

77 S(77) Sum of E-State of atom type: ssPbH2

78 S(78) Sum of E-State of atom type: sssPbH

79 S(79) Sum of E-State of atom type: ssssPb

80-158 Smax1-Smax79 Maximum E-State value of the specified atom type

159-237 Smin1-Smin79 Minimum E-State value of the specified atom type
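The E-state value of an atom starts from its intrinsic state, I_i = [(2/N)² δv + 1] / δ, where N is the principal quantum number. Every other atom then perturbs it: S_i = I_i + Σ_j (I_i - I_j) / (d_ij + 1)², where d_ij is the topological distance. S(k), Smax(k) and Smin(k) are the sum, maximum and minimum of S_i over atoms of type k. Below is a minimal sketch on a connected hydrogen-depleted graph, assuming atom typing is done elsewhere. The function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Sequence, Tuple


def topological_distances(adjacency: Sequence[Sequence[int]]) -> List[List[int]]:
    """All-pairs shortest path lengths in bonds, by breadth-first search."""
    n = len(adjacency)
    dist = []
    for start in range(n):
        d = [-1] * n
        d[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if d[v] < 0:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist.append(d)
    return dist


def estate_values(delta, delta_v, principal_n, adjacency) -> List[float]:
    """Kier-Hall E-state: S_i = I_i + sum_j (I_i - I_j) / (d_ij + 1) ** 2."""
    intrinsic = [((2.0 / n) ** 2 * dv + 1.0) / d
                 for d, dv, n in zip(delta, delta_v, principal_n)]
    dist = topological_distances(adjacency)
    size = len(intrinsic)
    return [intrinsic[i] + sum((intrinsic[i] - intrinsic[j]) / (dist[i][j] + 1) ** 2
                               for j in range(size) if j != i)
            for i in range(size)]


def aggregate_by_type(types, values) -> Dict[str, Tuple[float, float, float]]:
    """Sum, max and min E-state per atom type (the S, Smax and Smin families)."""
    groups: Dict[str, List[float]] = {}
    for atom_type, s in zip(types, values):
        groups.setdefault(atom_type, []).append(s)
    return {t: (sum(v), max(v), min(v)) for t, v in groups.items()}


# Ethanol heavy atoms: CH3 - CH2 - OH.
s = estate_values([1, 2, 1], [1, 2, 5], [2, 2, 2], [[1], [0, 2], [1]])
print([round(x, 3) for x in s])  # [1.681, 0.25, 7.569]
print(aggregate_by_type(["sCH3", "ssCH2", "sOH"], s))
```

The perturbation terms cancel pairwise, so the S_i always sum to the sum of the intrinsic states. This gives a handy check on any implementation.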

Autocorrelation descriptors

1-8 ATSm1-ATSm8 Moreau-Broto autocorrelation descriptors based on atomic mass

9-16 ATSv1-ATSv8 Moreau-Broto autocorrelation descriptors based on atomic van der Waals volume

17-24 ATSe1-ATSe8 Moreau-Broto autocorrelation descriptors based on atomic Sanderson electronegativity

25-32 ATSp1-ATSp8 Moreau-Broto autocorrelation descriptors based on atomic polarizability

33-40 MATSm1-MATSm8 Moran autocorrelation descriptors based on atomic mass

41-48 MATSv1-MATSv8 Moran autocorrelation descriptors based on atomic van der Waals volume

49-56 MATSe1-MATSe8 Moran autocorrelation descriptors based on atomic Sanderson electronegativity

57-64 MATSp1-MATSp8 Moran autocorrelation descriptors based on atomic polarizability

65-72 GATSm1-GATSm8 Geary autocorrelation descriptors based on atomic mass

73-80 GATSv1-GATSv8 Geary autocorrelation descriptors based on atomic van der Waals volume

81-88 GATSe1-GATSe8 Geary autocorrelation descriptors based on atomic Sanderson electronegativity

89-96 GATSp1-GATSp8 Geary autocorrelation descriptors based on atomic polarizability
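All three autocorrelation families weight pairs of atoms at a fixed topological distance d. In their standard forms:

- ATS(d) is Σ w_i w_j over pairs at distance d.
- Moran's I(d) divides the mean cross-product of deviations from the mean by the property variance.
- Geary's c(d) divides half the mean squared difference by the sample variance.

The sketch below is hypothetical and works on a precomputed distance matrix. PyDPI may additionally scale the properties (for example relative to carbon) or log-transform ATS, and neither step is shown.

```python
from typing import Sequence, Tuple


def autocorrelations(weights: Sequence[float], dist: Sequence[Sequence[int]],
                     lag: int) -> Tuple[float, float, float]:
    """Moreau-Broto ATS, Moran I and Geary c at one topological lag.

    weights: one atomic property value per atom (mass, volume, ...).
    dist: topological distance matrix (number of bonds between atoms).
    """
    n = len(weights)
    mean = sum(weights) / n
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n) if dist[i][j] == lag]
    ats = sum(weights[i] * weights[j] for i, j in pairs)
    variance_sum = sum((w - mean) ** 2 for w in weights)
    if not pairs or variance_sum == 0:
        return ats, 0.0, 0.0  # assumption: report 0 when I and c are undefined
    moran = (sum((weights[i] - mean) * (weights[j] - mean) for i, j in pairs)
             / len(pairs)) / (variance_sum / n)
    geary = (sum((weights[i] - weights[j]) ** 2 for i, j in pairs)
             / (2 * len(pairs))) / (variance_sum / (n - 1))
    return ats, moran, geary


dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]  # a three-atom chain
print(autocorrelations([1.0, 2.0, 3.0], dist, 1))  # ATS = 8, I = 0, c = 0.5
print(autocorrelations([1.0, 2.0, 3.0], dist, 2))  # ATS = 3, I = -1.5, c = 2
```

The sums run over unordered pairs. Moran's and Geary's ratios come out the same as with the ordered double sum, because the numerator and the pair count both double.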

Charge descriptors

1-4 QHmax, QCmax, QNmax, QOmax Most positive charge on H, C, N, O atoms

5-8 QHmin, QCmin, QNmin, QOmin Most negative charge on H, C, N, O atoms

9-10 Qmax, Qmin Most positive and most negative charge in a molecule

11-15 QHSS, QCSS, QNSS, QOSS, Qass Sum of squares of charges on H, C, N, O and all atoms

16-17 Mpc, Tpc Mean and total of positive charges

18-19 Mnc, Tnc Mean and total of negative charges

20-21 Mac, Tac Mean and total of absolute charges

22 Rpc Relative positive charge

23 Rnc Relative negative charge

24 SPP Submolecular polarity parameter

25 LDI Local dipole index
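Given a list of atomic partial charges, most of these summary descriptors reduce to simple arithmetic. This table does not specify the charge model; Gasteiger charges would be one option. The sketch below follows common textbook definitions:

- Rpc is the largest positive charge divided by the total positive charge.
- Rnc is the analogous ratio for negative charges.
- SPP is the most positive charge minus the most negative charge.
- LDI is the mean absolute charge difference across bonds.

These definitions and the function name are assumptions, not PyDPI's code. Tnc is kept negative here.

```python
from typing import Dict, Sequence, Tuple


def charge_summary(charges: Sequence[float],
                   bonds: Sequence[Tuple[int, int]]) -> Dict[str, float]:
    """Hypothetical summary of partial charges (non-empty list assumed)."""
    pos = [q for q in charges if q > 0]
    neg = [q for q in charges if q < 0]
    tpc, tnc = sum(pos), sum(neg)
    tac = sum(abs(q) for q in charges)
    return {
        "Mpc": tpc / len(pos) if pos else 0.0,
        "Tpc": tpc,
        "Mnc": tnc / len(neg) if neg else 0.0,
        "Tnc": tnc,  # negative total of negative charges
        "Mac": tac / len(charges),
        "Tac": tac,
        "Rpc": max(pos) / tpc if pos else 0.0,
        "Rnc": min(neg) / tnc if neg else 0.0,
        "SPP": max(charges) - min(charges),
        "LDI": (sum(abs(charges[i] - charges[j]) for i, j in bonds) / len(bonds)
                if bonds else 0.0),
    }


print(charge_summary([0.4, -0.3, -0.1], [(0, 1), (1, 2)]))
```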

Molecular property descriptors

1 MREF Molar refractivity

2 logP LogP value based on the Crippen method

3 logP2 Square of LogP value based on the Crippen method

4 TPSA Topological polar surface area

5 UI Unsaturation index

6 Hy Hydrophilic index
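Two of these rows are simple functions of other quantities. logP2 is the square of the Crippen logP. The unsaturation index is commonly defined as UI = log2(1 + nDB + nTB + nAB), counting double, triple and aromatic bonds. Below is a minimal sketch with hypothetical helper names, assuming the bond counts and logP are already available.

```python
import math


def unsaturation_index(n_double: int, n_triple: int, n_aromatic: int) -> float:
    """UI = log2(1 + nDB + nTB + nAB); bond counts are assumed precomputed."""
    return math.log2(1 + n_double + n_triple + n_aromatic)


def logp2(logp: float) -> float:
    """Square of a (Crippen) logP value."""
    return logp ** 2


print(unsaturation_index(1, 0, 6))  # styrene-like counts: 3.0
print(logp2(1.5))                   # 2.25
```

For styrene, one exocyclic double bond plus six aromatic bonds gives UI = log2(8) = 3. A fully saturated molecule gives UI = 0.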

MOE-type descriptors

1 TPSA Topological polar surface area based on fragments

2 LabuteASA Labute's approximate surface area

3-14 SLOGPVSA MOE-type descriptors using SLogP contributions and surface area contributions

15-24 SMRVSA MOE-type descriptors using MR contributions and surface area contributions

25-38 PEOEVSA MOE-type descriptors using partial charges and surface area contributions

39-49 EstateVSA MOE-type descriptors using E-state indices and surface area contributions

50-60 VSAEstate MOE-type descriptors using surface area contributions and E-state indices
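Each MOE-type family sums per-atom approximate surface areas (Labute's contributions) into bins. The bins are defined on a second per-atom property, such as the Crippen SLogP contribution. The sketch below shows only the binning step and takes per-atom (property, area) pairs as input. The bin edges are assumed to match RDKit's SlogP_VSA set, where eleven edges give the twelve columns of rows 3-14. Function and constant names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# Bin edges assumed to match RDKit's SlogP_VSA descriptors (11 edges -> 12 bins).
SLOGP_EDGES = [-0.4, -0.2, 0.0, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6]


def vsa_bins(atoms: Sequence[Tuple[float, float]],
             edges: Sequence[float]) -> List[float]:
    """Sum per-atom surface areas into bins chosen by each atom's property value.

    atoms: (property contribution, approximate surface area) per atom.
    A value v lands in bin k where edges[k-1] <= v < edges[k].
    """
    totals = [0.0] * (len(edges) + 1)
    for value, area in atoms:
        totals[bisect_right(edges, value)] += area
    return totals


print(vsa_bins([(-0.5, 10.0), (0.12, 5.0), (0.12, 2.0), (0.7, 1.0)], SLOGP_EDGES))
```

The same function covers the SMR, PEOE and E-state families once their own edge lists and per-atom properties are supplied.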

Fragment/Fingerprint-based descriptors

1 FP2 (Topological fingerprint) A Daylight-like fingerprint based on hashing molecular subgraphs

2 MACCS (MACCS keys) Uses the 166 public keys implemented as SMARTS

3 E-state 79 E-state fingerprints or fragments

4 FP4 307 FP4 fingerprints

5 Atom Pairs Atom pair fingerprints

6 Torsions Topological torsion fingerprints

7 Morgan/Circular Fingerprints based on the Morgan algorithm

Note: the marked descriptors are from RDKit. In PyDPI, we wrapped most of the molecular descriptors from RDKit; the other descriptors were coded independently by us.
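The FP2 fingerprint is described as hashing molecular subgraphs. The toy sketch below shows the idea on linear paths. Every simple path of up to max_bonds bonds becomes a label string that reads the same in either direction. That string is then hashed into one of n_bits positions. This is not Open Babel's FP2, which differs in how it encodes atoms and bonds and how long its paths are. The function name and defaults are hypothetical, and bond orders are ignored here.

```python
import zlib
from typing import List, Sequence, Set


def path_fingerprint(labels: Sequence[str], adjacency: Sequence[Sequence[int]],
                     max_bonds: int = 4, n_bits: int = 1024) -> Set[int]:
    """Set bits for every simple path of up to max_bonds bonds (toy hashed FP)."""
    bits: Set[int] = set()

    def walk(path: List[int]) -> None:
        seq = [labels[i] for i in path]
        key = "-".join(min(seq, seq[::-1]))  # same key for a path and its reverse
        bits.add(zlib.crc32(key.encode()) % n_bits)  # stable across runs, unlike hash()
        if len(path) - 1 == max_bonds:
            return
        for nb in adjacency[path[-1]]:
            if nb not in path:  # simple paths only: no revisited atoms
                walk(path + [nb])

    for start in range(len(labels)):
        walk([start])
    return bits


# Ethanol heavy atoms, listed in two different atom orders.
print(path_fingerprint(["C", "C", "O"], [[1], [0, 2], [1]]) ==
      path_fingerprint(["O", "C", "C"], [[1], [0, 2], [1]]))  # True
```

Canonicalising each path before hashing makes the bit set independent of atom numbering. Without that step, the same molecule could produce different fingerprints.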
