User Guide For PyDPI 1.0


Dongsheng Cao

©2012 China Computational Biology Drug Design Group

Table of Contents
1. What is this?
2. Install the PyDPI package
3. Working on drug molecules
3.1. Read single molecules
3.2. Download molecules from corresponding ID
3.3. Calculating molecular descriptors
3.4. Molecular fingerprints and chemoinformatics
3.4.1. Daylight-type fingerprints
3.4.2. MACCS keys and FP4 fingerprints
3.4.3. E-state fingerprints
3.4.4. Atom pairs and topological torsions
3.4.5. Morgan fingerprints
3.4.6. Using PyDrug object
3.4.7. Fingerprint similarity
4. Working on protein sequences
4.1. Download proteins from Uniprot
4.2. Download the property from the AAindex database
4.3. Calculating protein descriptors
5. Interaction representation
5.1. Protein-protein interaction descriptors
5.2. Protein-ligand interaction descriptors
Appendix

1. What is this?
This document is intended to provide an overview of how one can use the PyDPI functionality from
Python. It’s not comprehensive and it’s not a manual.
If you find mistakes, or have suggestions for improvements, please either fix them yourselves in the
source document (the .py file) or send them to the mailing list: oriental-cds@hotmail.com

2. Install the PyDPI package
PyDPI has been successfully tested on Linux and Windows systems. Users can download the
package from https://sourceforge.net/projects/pydpicao/ (.zip and .tar.gz). Installing PyDPI
is straightforward:

On Windows:
(1): download the pydpi package (.zip)
(2): extract or uncompress the .zip file
(3): cd pydpi-1.0
(4): python setup.py install
On Linux:
(1): download the pydpi package (.tar.gz)
(2): tar -zxf pydpi-1.0.tar.gz
(3): cd pydpi-1.0
(4): python setup.py install or sudo python setup.py install

Once the PyDPI package is installed, you can test whether the installation succeeded.

If the test functions all run correctly, the PyDPI package has been installed successfully.
Note that your computer must be connected to the Internet.
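
The test commands themselves are not reproduced in this text. As a minimal sketch that makes no assumption about PyDPI's own test functions, the following only checks whether a package can be imported at all. The helper name is_installed is hypothetical and is not part of PyDPI, and the import name "pydpi" is assumed from the package name used in this guide.

```python
def is_installed(package_name):
    """Return True if the named package can be imported, False otherwise."""
    try:
        __import__(package_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # "pydpi" is the import name assumed from the package name in this guide.
    print(is_installed("pydpi"))
```

A result of False only tells you that the import failed; it does not replace the full test described above.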

3. Working on drug molecules

3.1. Read single molecules
The majority of the basic drug molecular functionality is found in module pydrug:

Individual molecules can be constructed using a variety of approaches.

PyDPI allows users to provide molecules in different formats.

All of these functions return a Mol object on success:

3.2. Download molecules from corresponding ID
PyDPI allows the user to download molecules by providing their IDs from databases such as CAS, NCBI,
KEGG, EBI and Drugbank.

By providing an ID for aspirin, we can conveniently download its SMILES string.
We can also download a molecule by constructing a PyDrug object, which contains the majority of the
basic drug molecular functionality.

You can read a molecule by providing a Drugbank ID:

3.3. Calculating molecular descriptors
The PyDPI package can calculate a large number of molecular descriptors, including constitutional
descriptors, topological descriptors, connectivity indices, E-state indices, autocorrelation descriptors,
charge descriptors, molecular properties, kappa shape indices, MOE-type descriptors, and molecular
fingerprints. These descriptors capture and magnify distinct aspects of chemical structures.
Once we have read a Mol object, we can easily calculate these molecular descriptors:

Example 1: Calculating molecular constitutional descriptors

We can calculate any single constitutional descriptor by calling the corresponding function. We can also
calculate all 30 descriptors at once by calling the GetConstitutional function. The result is returned as a
dictionary.

Example 2: Calculating topology descriptors

The PyDPI package can calculate 25 topology descriptors. For detailed information on the topology
descriptors, refer to Table S2 in the Appendix and their introductions in the Manual.

Example 3: Calculating molecular connectivity indices

Example 4: Calculating molecular properties

Example 5: Calculating Kappa shape descriptors

Example 6: Calculating charge descriptors

Example 7: Calculating descriptors using PyDrug object

An easier way to calculate molecular descriptors is to create a PyDrug object and then call its
methods. The PyDrug class contains the majority of the drug molecule functionality.

3.4. Molecular fingerprints and chemoinformatics
The PyDPI package provides seven types of molecular fingerprints, each defined by abstracting
and magnifying different aspects of molecular topology.

3.4.1. Daylight-type fingerprints

We can calculate the similarity between two molecules by specifying a similarity measure.
Nine similarity measures are available for comparing two molecules.

3.4.2. MACCS keys and FP4 fingerprints

Note that the input of MACCS and FP4 is different.

3.4.3. E-state fingerprints

3.4.4. Atom pairs and topological torsions

3.4.5. Morgan fingerprints

3.4.6. Using PyDrug object
A convenient way to calculate the fingerprints is to create a PyDrug object and call its GetFingerprint
method.

3.4.7. Fingerprint similarity
We can calculate the similarity between any two fingerprints using one of the nine given similarity measures.
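
The nine measures are not listed in this text. As an illustration only, the sketch below computes two widely used fingerprint similarity measures, Tanimoto and Dice, on fingerprints represented as sets of on-bit indices. The function names are hypothetical and are not PyDPI's API; whether PyDPI uses the same names or representation is an assumption.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity c / (a + b - c) for two sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / float(union) if union else 0.0


def dice(bits_a, bits_b):
    """Dice similarity 2c / (a + b) for two sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    total = len(a) + len(b)
    return 2.0 * len(a & b) / total if total else 0.0


# Toy fingerprints (made-up bit positions, for illustration only).
fp1 = {1, 4, 7, 9}
fp2 = {1, 4, 8}
print(tanimoto(fp1, fp2))  # 2 shared bits, 5 bits in the union -> 0.4
print(dice(fp1, fp2))      # 2 * 2 / (4 + 3)
```

Both measures return 1.0 for identical non-empty fingerprints and 0.0 for fingerprints with no bit in common.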

4. Working on protein sequences

4.1. Download proteins from Uniprot

You can get a protein sequence from the Uniprot website by providing a Uniprot ID.
You can also extract all sub-sequences of length 2 × window + 1 that are centered on a given amino acid ToAA.
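
To make the windowing concrete, here is a minimal sketch in plain Python. It is not PyDPI's implementation: the function name get_subsequences and its argument names are hypothetical, and skipping occurrences that are too close to either end of the sequence is a design choice made for this example.

```python
def get_subsequences(sequence, to_aa, window):
    """Return every stretch of 2 * window + 1 residues centered on to_aa.

    Occurrences too close to either end to fit a full window are skipped.
    """
    subsequences = []
    for center, residue in enumerate(sequence):
        if residue != to_aa:
            continue
        start, end = center - window, center + window + 1
        if start >= 0 and end <= len(sequence):
            subsequences.append(sequence[start:end])
    return subsequences


# Toy sequence: "S" occurs at positions 4 and 8 (0-based).
print(get_subsequences("ACDESFGHSKL", "S", 2))  # ['DESFG', 'GHSKL']
```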

You can also get several protein sequences by providing a file containing Uniprot IDs of these proteins.

The downloaded protein sequences have been saved in "/home/orient/res.txt".

The user can also download a PDB file by providing the corresponding PDB ID, and then extract its amino
acid sequence.

The downloaded protein has been saved in “/home/orient/1atp.pdb”.

You can check whether the input sequence is a valid protein sequence.

The output is the length of the protein sequence (the number of amino acids) if it is valid; otherwise it is 0.
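
A check of this kind can be sketched in a few lines. This is an illustration under stated assumptions, not PyDPI's code: the function name check_sequence is hypothetical, a sequence counts as valid only if it consists of the 20 standard one-letter amino acid codes, and upper-casing the input first is a choice made for this example.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def check_sequence(sequence):
    """Return the sequence length if only the 20 standard residues occur, else 0."""
    sequence = sequence.strip().upper()
    if sequence and all(residue in AMINO_ACIDS for residue in sequence):
        return len(sequence)
    return 0


print(check_sequence("ACDEFG"))  # 6
print(check_sequence("ACB"))     # 0, because "B" is not a standard residue
```
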
4.2. Download the property from the AAindex database
You can get the properties of amino acids from the AAindex database by providing a property name
(e.g., KRIW790103). The output is returned as a dictionary.

If the user provides the directory containing the AAindex database, the program reads the property from
that copy. The AAindex database can be downloaded from ftp://ftp.genome.jp/pub/db/community/aaindex/
and consists of three files: aaindex1, aaindex2 and aaindex3.
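
To show what "a property in the form of a dictionary" looks like, the sketch below parses a single record in the flat-file layout used by aaindex1, where the "I" block lists ten values for A, R, N, D, C, Q, E, G, H, I and ten values for L, K, M, F, P, S, T, W, Y, V. It is a simplified illustration, not PyDPI's parser: the function name is hypothetical, and the record TOYX000001 with its values is made up for this example.

```python
AA_ROW_1 = "ARNDCQEGHI"  # residues of the first value row
AA_ROW_2 = "LKMFPSTWYV"  # residues of the second value row


def parse_aaindex1_record(text):
    """Return (accession, {residue: value}) for one aaindex1-style record."""
    accession = None
    values = []
    in_index_block = False
    for line in text.splitlines():
        if line.startswith("H "):
            accession = line[2:].strip()
        elif line.startswith("I "):
            in_index_block = True  # header line of the value block
        elif in_index_block and line.startswith(" "):
            # "NA" marks a missing value; it is stored as None here.
            values.extend(None if v == "NA" else float(v) for v in line.split())
        elif in_index_block:
            break  # "//" or the next field ends the value block
    if len(values) != 20:
        raise ValueError("expected 20 values, found %d" % len(values))
    return accession, dict(zip(AA_ROW_1 + AA_ROW_2, values))


# Made-up record for illustration only; not a real AAindex entry.
RECORD = """H TOYX000001
D Made-up property for illustration only
I    A/L     R/K     N/M     D/F     C/P     Q/S     E/T     G/W     H/Y     I/V
     1.0     2.0     3.0     4.0     5.0     6.0     7.0     8.0     9.0    10.0
    11.0    12.0    13.0    14.0    15.0    16.0    17.0    18.0    19.0      NA
//
"""

accession, prop = parse_aaindex1_record(RECORD)
print(accession, prop["A"], prop["L"])
```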

Note that the PyDPI package already includes the AAindex database. The GetAAIndex1
method in AAIndex gets the property from the aaindex1 database.

If the user does not provide the directory containing the AAindex database, the program downloads
the three databases (i.e., aaindex1, aaindex2 and aaindex3) to obtain the property. By default, the
downloaded AAindex files are saved in the current directory. You can also specify a different directory
according to your needs.

In this example, the downloaded databases are saved on drive F. The GetAAIndex23 method in AAIndex gets the
property from the aaindex2 and aaindex3 databases.

4.3. Calculating protein descriptors
There are two ways to calculate protein descriptors in the PyDPI package. One is to call the
corresponding functions directly; the other is to construct a PyPro object and then call its methods to
obtain the protein descriptors. In both cases the output is a dictionary whose keys and
values are the descriptor names and the descriptor values, respectively, so the meaning of each
descriptor is clear from its key.

Use functions:

We can also compute various types of descriptors based on PDB format.

Use GetProDes class:

Example 1: Calculating amino acid composition descriptors
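
The original example is not reproduced in this text. As a sketch of what an amino acid composition descriptor is, the code below computes the percentage of each of the 20 standard amino acids in a sequence and returns the result as a dictionary. The function name and the rounding to three decimals are assumptions made for this illustration, not PyDPI's API.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return {residue: percentage of the sequence made up of that residue}."""
    length = len(sequence)
    if length == 0:
        raise ValueError("empty sequence")
    return {aa: round(100.0 * sequence.count(aa) / length, 3) for aa in AMINO_ACIDS}


composition = amino_acid_composition("AACDDDEK")
print(composition["D"])  # 3 of 8 residues -> 37.5
```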

Example 2: Calculating Moran autocorrelation descriptors

Example 3: Calculating pseudo amino acid composition descriptors

Changing the values of lamda and weight gives different PAAC values. Note that the
number of PAAC descriptors depends on the choice of lamda. If lamda = 10, we obtain 20 + lamda = 30 PAAC
descriptors.
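
The relation between lamda and the number of PAAC descriptors can be written as a one-line helper. The function name is hypothetical and only restates the arithmetic given above.

```python
def paac_descriptor_count(lamda, n_amino_acids=20):
    """Number of PAAC descriptors: one per amino acid plus one per lamda tier."""
    if lamda < 0:
        raise ValueError("lamda must be non-negative")
    return n_amino_acids + lamda


for lamda in (5, 10, 30):
    print(lamda, paac_descriptor_count(lamda))
```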

Example 4: Calculating all protein descriptors
The PyPro class includes a built-in method which can calculate all protein descriptors.

Example 5: Calculating protein descriptors based on the user-defined property

The user can provide any amino acid property as a Python dictionary. PyDPI can then calculate
the descriptors based on this user-defined property.
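
As an illustration of how a property dictionary feeds into a descriptor, the sketch below computes a Moran autocorrelation value for one lag, using the usual definition: the lagged covariance of the property along the sequence divided by its variance over the sequence. This is a sketch under stated assumptions, not PyDPI's implementation. The function name is hypothetical, the property values are made up, and any normalisation PyDPI applies to the property beforehand is not described in this text.

```python
def moran_autocorrelation(sequence, prop, lag):
    """Moran autocorrelation of a residue property along a sequence at one lag."""
    n = len(sequence)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(sequence)")
    values = [float(prop[residue]) for residue in sequence]
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    variance = sum(d * d for d in deviations) / n
    if variance == 0:
        return 0.0  # constant property: autocorrelation is undefined, report 0
    covariance = sum(
        deviations[i] * deviations[i + lag] for i in range(n - lag)
    ) / (n - lag)
    return covariance / variance


# Made-up property values for two residues, for illustration only.
toy_property = {"A": 0.0, "C": 1.0}
print(moran_autocorrelation("ACAC", toy_property, 1))  # alternating values -> -1.0
print(moran_autocorrelation("ACAC", toy_property, 2))  # same value two steps on -> 1.0
```

The dictionary must contain a value for every residue that occurs in the sequence; a missing residue raises a KeyError in this sketch.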

Example 6: Calculating protein descriptors based on the property from AAindex
A powerful feature of PyDPI is that it can easily calculate thousands of protein features by
automatically retrieving the required properties from AAindex.

5. Interaction representation

5.1. Protein-protein interaction descriptors

5.2. Protein-ligand interaction descriptors

Appendix

Table S1 List of propy computed features for protein sequences

Feature group                  Features                                   Number of descriptors
Amino acid composition         Amino acid composition                     20
                               Dipeptide composition                      400
                               Tripeptide composition                     8000
Autocorrelation                Normalized Moreau-Broto autocorrelation    240 a
                               Moran autocorrelation                      240 a
                               Geary autocorrelation                      240 a
CTD                            Composition                                21
                               Transition                                 21
                               Distribution                               105
Conjoint triad                 Conjoint triad features                    343
Quasi-sequence order           Sequence order coupling number             60
                               Quasi-sequence order descriptors           100
Pseudo amino acid composition  Pseudo amino acid composition              50 b
                               Amphiphilic pseudo amino acid composition  50 c

a The number depends on the number of amino acid properties chosen and on the maximum value of the lag. The default is to use eight types of properties and lag = 30.
b The number depends on the number of amino acid property sets chosen and on the lamda value. The default is to use three types of properties proposed by Chou et al. and lamda = 30.
c The number depends on the choice of the lamda value. The default is lamda = 30.

Table S2 List of PyDPI computed descriptors for small molecules

Molecular descriptors
Constitutional descriptors
No.     Name       Description
1a      Weight     Molecular weight
2       nhyd       Count of hydrogen atoms
3       nhal       Count of halogen atoms
4a      nhet       Count of hetero atoms
5a      nhev       Count of heavy atoms
6       ncof       Count of F atoms
7       ncocl      Count of Cl atoms
8       ncobr      Count of Br atoms
9       ncoi       Count of I atoms
10      ncarb      Count of C atoms
11      nphos      Count of P atoms
12      nsulph     Count of S atoms
13      noxy       Count of O atoms
14      nnitro     Count of N atoms
15a     nring      Number of rings
16a     nrot       Number of rotatable bonds
17a     ndonr      Number of H-bond donors
18a     naccr      Number of H-bond acceptors
19      nsb        Number of single bonds
20      ndb        Number of double bonds
21      ntb        Number of triple bonds
22      naro       Number of aromatic bonds
23      nta        Number of all atoms
24      AWeight    Average molecular weight
25-30   PC1-PC6    Molecular path counts of length 1-6

Topological descriptors
1       W            Wiener index
2       AW           Average Wiener index
3a      J            Balaban's J index
4       Thara        Harary number
5       Tsch         Schultz index
6       Tigdi        Graph distance index
7       Platt        Platt number
8       Xu           Xu index
9       Pol          Polarity number
10      Dz           Pogliani index
11a     Ipc          Ipc index
12a     BertzCT      BertzCT
13      GMTI         Gutman molecular topological index based on simple vertex degree
14-15   ZM1, ZM2     Zagreb index with order 1-2
16-17   MZM1, MZM2   Modified Zagreb index with order 1-2
18      Qindex       Quadratic index
19      diametert    Largest value in the distance matrix
20      radiust      Radius based on topology
21      petitjeant   Petitjean based on topology
22      Sito         Logarithm of the simple topological index by Narumi
23      Hato         Harmonic topological index proposed by Narumi
24      Geto         Geometric topological index by Narumi
25      Arto         Arithmetic topological index by Narumi

Connectivity descriptors
1-11a   0χv, 1χv, 2χv, 3χpv-10χpv   Valence molecular connectivity Chi index for path order 0-10
12      3χvc                        Valence molecular connectivity Chi index for three cluster
13      4χvc                        Valence molecular connectivity Chi index for four cluster
14      4χvpc                       Valence molecular connectivity Chi index for path/cluster
15-18   3χvCH-6χvCH                 Valence molecular connectivity Chi index for cycles of 3-6
19-29a  0χ, 1χ, 2χ, 3χp-10χp        Simple molecular connectivity Chi indices for path order 0-10
30      3χc                         Simple molecular connectivity Chi indices for three cluster
31      4χc                         Simple molecular connectivity Chi indices for four cluster
32      4χpc                        Simple molecular connectivity Chi indices for path/cluster
33-36   3χCH-6χCH                   Simple molecular connectivity Chi indices for cycles of 3-6
37      mChi1                       Mean chi1 (Randic) connectivity index
38      knotp                       The difference between chi3c and chi4pc
39      dchi0                       The difference between chi0v and chi0
40      dchi1                       The difference between chi1v and chi1
41      dchi2                       The difference between chi2v and chi2
42      dchi3                       The difference between chi3v and chi3
43      dchi4                       The difference between chi4v and chi4
44      knotpv                      The difference between chiv3c and chiv4pc

Kappa descriptors
1       1κα     Kappa alpha index for 1 bonded fragment
2       2κα     Kappa alpha index for 2 bonded fragment
3       3κα     Kappa alpha index for 3 bonded fragment
4       phi     Kier molecular flexibility index
5a      1κ      Molecular shape Kappa index for 1 bonded fragment
6a      2κ      Molecular shape Kappa index for 2 bonded fragment
7a      3κ      Molecular shape Kappa index for 3 bonded fragment

Burden descriptors
1-16    bcutm1-16   Burden descriptors based on atomic mass
17-32   bcutv1-16   Burden descriptors based on atomic volumes
33-48   bcute1-16   Burden descriptors based on atomic electronegativity
49-64   bcutp1-16   Burden descriptors based on polarizability

Basak information descriptors
1       IC0     Information content with order 0 proposed by Basak
2       IC1     Information content with order 1 proposed by Basak
3       IC2     Information content with order 2 proposed by Basak
4       IC3     Information content with order 3 proposed by Basak
5       IC4     Information content with order 4 proposed by Basak
6       IC5     Information content with order 5 proposed by Basak
7       IC6     Information content with order 6 proposed by Basak
8       SIC0    Structural information content with order 0 proposed by Basak
9       SIC1    Structural information content with order 1 proposed by Basak
10      SIC2    Structural information content with order 2 proposed by Basak
11      SIC3    Structural information content with order 3 proposed by Basak
12      SIC4    Structural information content with order 4 proposed by Basak
13      SIC5    Structural information content with order 5 proposed by Basak
14      SIC6    Structural information content with order 6 proposed by Basak
15      CIC0    Complementary information content with order 0 proposed by Basak
16      CIC1    Complementary information content with order 1 proposed by Basak
17      CIC2    Complementary information content with order 2 proposed by Basak
18      CIC3    Complementary information content with order 3 proposed by Basak
19      CIC4    Complementary information content with order 4 proposed by Basak
20      CIC5    Complementary information content with order 5 proposed by Basak
21      CIC6    Complementary information content with order 6 proposed by Basak

E-state descriptors
1       S(1)    Sum of E-State of atom type: sLi
2       S(2)    Sum of E-State of atom type: ssBe
3       S(3)    Sum of E-State of atom type: ssssBe
4       S(4)    Sum of E-State of atom type: ssBH
5       S(5)    Sum of E-State of atom type: sssB
6       S(6)    Sum of E-State of atom type: ssssB
7       S(7)    Sum of E-State of atom type: sCH3
8       S(8)    Sum of E-State of atom type: dCH2
9       S(9)    Sum of E-State of atom type: ssCH2
10      S(10)   Sum of E-State of atom type: tCH
11      S(11)   Sum of E-State of atom type: dsCH
12      S(12)   Sum of E-State of atom type: aaCH
13      S(13)   Sum of E-State of atom type: sssCH
14      S(14)   Sum of E-State of atom type: ddC
15      S(15)   Sum of E-State of atom type: tsC
16      S(16)   Sum of E-State of atom type: dssC
17      S(17)   Sum of E-State of atom type: aasC
18      S(18)   Sum of E-State of atom type: aaaC
19      S(19)   Sum of E-State of atom type: ssssC
20      S(20)   Sum of E-State of atom type: sNH3
21      S(21)   Sum of E-State of atom type: sNH2
22      S(22)   Sum of E-State of atom type: ssNH2
23      S(23)   Sum of E-State of atom type: dNH
24      S(24)   Sum of E-State of atom type: ssNH
25      S(25)   Sum of E-State of atom type: aaNH
26      S(26)   Sum of E-State of atom type: tN
27      S(27)   Sum of E-State of atom type: sssNH
28      S(28)   Sum of E-State of atom type: dsN
29      S(29)   Sum of E-State of atom type: aaN
30      S(30)   Sum of E-State of atom type: sssN
31      S(31)   Sum of E-State of atom type: ddsN
32      S(32)   Sum of E-State of atom type: aasN
33      S(33)   Sum of E-State of atom type: ssssN
34      S(34)   Sum of E-State of atom type: sOH
35      S(35)   Sum of E-State of atom type: dO
36      S(36)   Sum of E-State of atom type: ssO
37      S(37)   Sum of E-State of atom type: aaO
38      S(38)   Sum of E-State of atom type: sF
39      S(39)   Sum of E-State of atom type: sSiH3
40      S(40)   Sum of E-State of atom type: ssSiH2
41      S(41)   Sum of E-State of atom type: sssSiH
42      S(42)   Sum of E-State of atom type: ssssSi
43      S(43)   Sum of E-State of atom type: sPH2
44      S(44)   Sum of E-State of atom type: ssPH
45      S(45)   Sum of E-State of atom type: sssP
46      S(46)   Sum of E-State of atom type: dsssP
47      S(47)   Sum of E-State of atom type: sssssP
48      S(48)   Sum of E-State of atom type: sSH
49      S(49)   Sum of E-State of atom type: dS
50      S(50)   Sum of E-State of atom type: ssS
51      S(51)   Sum of E-State of atom type: aaS
52      S(52)   Sum of E-State of atom type: dssS
53      S(53)   Sum of E-State of atom type: ddssS
54      S(54)   Sum of E-State of atom type: sCl
55      S(55)   Sum of E-State of atom type: sGeH3
56      S(56)   Sum of E-State of atom type: ssGeH2
57      S(57)   Sum of E-State of atom type: sssGeH
58      S(58)   Sum of E-State of atom type: ssssGe
59      S(59)   Sum of E-State of atom type: sAsH2
60      S(60)   Sum of E-State of atom type: ssAsH
61      S(61)   Sum of E-State of atom type: sssAs
62      S(62)   Sum of E-State of atom type: sssdAs
63      S(63)   Sum of E-State of atom type: sssssAs
64      S(64)   Sum of E-State of atom type: sSeH
65      S(65)   Sum of E-State of atom type: dSe
66      S(66)   Sum of E-State of atom type: ssSe
67      S(67)   Sum of E-State of atom type: aaSe
68      S(68)   Sum of E-State of atom type: dssSe
69      S(69)   Sum of E-State of atom type: ddssSe
70      S(70)   Sum of E-State of atom type: sBr
71      S(71)   Sum of E-State of atom type: sSnH3
72      S(72)   Sum of E-State of atom type: ssSnH2
73      S(73)   Sum of E-State of atom type: sssSnH
74      S(74)   Sum of E-State of atom type: ssssSn
75      S(75)   Sum of E-State of atom type: sI
76      S(76)   Sum of E-State of atom type: sPbH3
77      S(77)   Sum of E-State of atom type: ssPbH2
78      S(78)   Sum of E-State of atom type: sssPbH
79      S(79)   Sum of E-State of atom type: ssssPb
80-158  Smax1-Smax79   Maximum of E-State value of specified atom type
159-237 Smin1-Smin79   Minimum of E-State value of specified atom type

Autocorrelation descriptors
1-8     ATSm1-ATSm8     Moreau-Broto autocorrelation descriptors based on atom mass
9-16    ATSv1-ATSv8     Moreau-Broto autocorrelation descriptors based on atomic van der Waals volume
17-24   ATSe1-ATSe8     Moreau-Broto autocorrelation descriptors based on atomic Sanderson electronegativity
25-32   ATSp1-ATSp8     Moreau-Broto autocorrelation descriptors based on atomic polarizability
33-40   MATSm1-MATSm8   Moran autocorrelation descriptors based on atom mass
41-48   MATSv1-MATSv8   Moran autocorrelation descriptors based on atomic van der Waals volume
49-56   MATSe1-MATSe8   Moran autocorrelation descriptors based on atomic Sanderson electronegativity
57-64   MATSp1-MATSp8   Moran autocorrelation descriptors based on atomic polarizability
65-72   GATSm1-GATSm8   Geary autocorrelation descriptors based on atom mass
73-80   GATSv1-GATSv8   Geary autocorrelation descriptors based on atomic van der Waals volume
81-88   GATSe1-GATSe8   Geary autocorrelation descriptors based on atomic Sanderson electronegativity
89-96   GATSp1-GATSp8   Geary autocorrelation descriptors based on atomic polarizability

Charge descriptors
1-4     QHmax, QCmax, QNmax, QOmax     Most positive charge on H, C, N, O atoms
5-8     QHmin, QCmin, QNmin, QOmin     Most negative charge on H, C, N, O atoms
9-10    Qmax, Qmin                     Most positive and negative charge in a molecule
11-15   QHSS, QCSS, QNSS, QOSS, Qass   Sum of squares of charges on H, C, N, O and all atoms
16-17   Mpc, Tpc                       Mean and total of positive charges
18-19   Mnc, Tnc                       Mean and total of negative charges
20-21   Mac, Tac                       Mean and total of absolute charges
22      Rpc                            Relative positive charge
23      Rnc                            Relative negative charge
24      SPP                            Submolecular polarity parameter
25      LDI                            Local dipole index

Molecular property descriptors
1a      MREF    Molar refractivity
2a      logP    LogP value based on the Crippen method
3       logP2   Square of LogP value based on the Crippen method
4a      TPSA    Topological polar surface area
5       UI      Unsaturation index
6       Hy      Hydrophilic index

MOE-type descriptors
1a      MTPSA       Topological polar surface area based on fragments
2a      LabuteASA   Labute's Approximate Surface Area
3-14a   SLOGPVSA    MOE-type descriptors using SLogP contributions and surface area contributions
15-24a  SMRVSA      MOE-type descriptors using MR contributions and surface area contributions
25-38a  PEOEVSA     MOE-type descriptors using partial charges and surface area contributions
39-49a  EstateVSA   MOE-type descriptors using Estate indices and surface area contributions
50-60a  VSAEstate   MOE-type descriptors using surface area contributions and Estate indices

Fragment/Fingerprint-based descriptors
1a      FP2               (Topological fingerprint) A Daylight-like fingerprint based on hashing molecular subgraphs
2a      MACCS             (MACCS keys) Using the 166 public keys implemented as SMARTS
3       E-state           79 E-state fingerprints or fragments
4       FP4               307 FP4 fingerprints
5a      Atom Pairs        Atom pairs fingerprints
6a      Torsions          Topological torsion fingerprints
7a      Morgan/Circular   Fingerprints based on the Morgan algorithm

Note: a indicates that these descriptors are from RDKit. In PyDPI, we wrapped most of the molecular descriptors from RDKit. The other descriptors were coded independently by us.


