User Guide For PyDPI 1.0
Dongsheng Cao
©2012 China Computational Biology Drug Design Group

Table of Contents

1. What is this?
2. Install the PyDPI package
3. Working on drug molecules
   3.1. Read single molecules
   3.2. Download molecules from corresponding ID
   3.3. Calculating molecular descriptors
   3.4. Molecular fingerprints and chemoinformatics
        3.4.1. Daylight-type fingerprints
        3.4.2. MACCS keys and FP4 fingerprints
        3.4.3. E-state fingerprints
        3.4.4. Atom pairs and topological torsions
        3.4.5. Morgan fingerprints
        3.4.6. Using PyDrug object
        3.4.7. Fingerprint similarity
4. Working on protein sequences
   4.1. Download proteins from Uniprot
   4.2. Download the property from the AAindex database
   4.3. Calculating protein descriptors
5. Interaction representation
   5.1. Protein-protein interaction descriptors
   5.2. Protein-ligand interaction descriptors
Appendix

1. What is this?

This document provides an overview of how to use the PyDPI functionality from Python. It is neither comprehensive nor a reference manual. If you find mistakes or have suggestions for improvements, please either fix them yourself in the source document (the .py file) or send them to the mailing list: oriental-cds@hotmail.com

2. Install the PyDPI package

PyDPI has been successfully tested on Linux and Windows systems. Users can download the package from https://sourceforge.net/projects/pydpicao/ (.zip and .tar.gz).
The installation process is straightforward:

On Windows:
(1): download the pydpi package (.zip)
(2): extract or uncompress the .zip file
(3): cd pydpi-1.0
(4): python setup.py install

On Linux:
(1): download the pydpi package (.tar.gz)
(2): tar -zxf pydpi-1.0.tar.gz
(3): cd pydpi-1.0
(4): python setup.py install or sudo python setup.py install

Once the PyDPI package is installed, you can test whether the installation succeeded. If the test functions all run correctly, the PyDPI package is successfully installed. Note that your computer must be connected to the Internet.

3. Working on drug molecules

3.1. Read single molecules

The majority of the basic drug molecular functionality is found in the module pydrug. Individual molecules can be constructed using a variety of approaches, and PyDPI allows users to provide different molecular formats. All of these functions return a Mol object on success.

3.2. Download molecules from corresponding ID

PyDPI allows the user to download molecules by providing their IDs from sources such as CAS, NCBI, KEGG, EBI and Drugbank. For example, given the IDs of aspirin, we can conveniently download its SMILES string. We can also download a molecule by constructing a PyDrug object, which contains the majority of the basic drug molecular functionality. You can read a molecule by providing a Drugbank ID.

3.3. Calculating molecular descriptors

The PyDPI package can calculate a large number of molecular descriptors, including constitutional descriptors, topological descriptors, connectivity indices, E-state indices, autocorrelation descriptors, charge descriptors, molecular properties, kappa shape indices, MOE-type descriptors, and molecular fingerprints. These descriptors capture and magnify distinct aspects of chemical structures.
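As a concrete illustration of what a topological descriptor measures, the sketch below computes the Wiener index (descriptor W in Table S2) as the sum of shortest-path distances over all atom pairs of a hydrogen-suppressed molecular graph. It is not PyDPI's own code: the function name wiener_index and the adjacency-dictionary representation are assumptions made only for this example, and it uses nothing beyond the Python standard library.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path lengths over all unordered atom pairs.

    `adjacency` is a hypothetical representation chosen for this sketch:
    a dict mapping each atom index to the list of its bonded neighbors.
    """
    total = 0
    for source in adjacency:
        # Breadth-first search gives the topological distance from `source`.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if neighbor not in distance:
                    distance[neighbor] = distance[atom] + 1
                    queue.append(neighbor)
        total += sum(distance.values())
    # Every pair was counted twice, once from each end.
    return total // 2


# Hydrogen-suppressed graph of n-butane: C0-C1-C2-C3
butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(wiener_index(butane))  # 10
```

For n-butane the pair distances are 1, 2, 3, 1, 2 and 1, which sum to 10. A branched isomer has shorter paths and therefore a smaller value, which is the kind of structural difference these descriptors are meant to capture.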
Once we have read a Mol object, we can easily calculate these molecular descriptors.

Example 1: Calculating molecular constitutional descriptors

We can calculate any single constitutional descriptor by calling the corresponding function. We can also calculate all 30 descriptors at once by calling the GetConstitutional function. The result is returned as a dictionary.

Example 2: Calculating topology descriptors

The PyDPI package can calculate 25 topology descriptors. For detailed information on these descriptors, refer to Table S2 in the Appendix and their introductions in the Manual.

Example 3: Calculating molecular connectivity indices

Example 4: Calculating molecular properties

Example 5: Calculating Kappa shape descriptors

Example 6: Calculating charge descriptors

Example 7: Calculating descriptors using PyDrug object

An easier way to calculate molecular descriptors is to create a PyDrug object and then call its methods. PyDrug contains the majority of the drug molecule functionality.

3.4. Molecular fingerprints and chemoinformatics

The PyDPI package provides seven types of molecular fingerprints, each defined by abstracting and magnifying different aspects of molecular topology.

3.4.1. Daylight-type fingerprints

We can calculate the similarity between two molecules by specifying a type of similarity measure. Nine types of similarity measures are available.

3.4.2. MACCS keys and FP4 fingerprints

Note that MACCS and FP4 take different inputs.

3.4.3. E-state fingerprints

3.4.4. Atom pairs and topological torsions

3.4.5. Morgan fingerprints

3.4.6. Using PyDrug object

A convenient way to calculate the fingerprints is to create a PyDrug object and call its GetFingerprint method.

3.4.7. Fingerprint similarity

We can calculate any fingerprint similarity using the nine given similarity measures.

4. Working on protein sequences

4.1. Download proteins from Uniprot

You can get a protein sequence from the Uniprot website by providing a Uniprot ID. You can get the sub-sequences of length 2 × window + 1 that are centered on a given amino acid ToAA. You can also get several protein sequences at once by providing a file containing the Uniprot IDs of these proteins. In this example, the downloaded protein sequences are saved in "/home/orient/res.txt".

The user can also download a PDB file by providing the corresponding PDB ID, and then extract its amino acid sequence. In this example, the downloaded protein is saved in "/home/orient/1atp.pdb".

You can check whether an input sequence is a valid protein sequence. The output is the number of residues in the protein sequence if it is valid; otherwise it is 0.

4.2. Download the property from the AAindex database

You can get the properties of amino acids from the AAindex database by providing a property name (e.g., KRIW790103). The output is returned as a dictionary.

The AAindex database can be downloaded from ftp://ftp.genome.jp/pub/db/community/aaindex/ and consists of three files: aaindex1, aaindex2 and aaindex3. If the user provides the directory containing the AAindex database, the program reads the given database to get the property. Note that the PyDPI package already includes the AAindex database. The GetAAIndex1 method in AAIndex gets the property from the aaindex1 database.

If the user does not provide the directory containing the AAindex database, the program downloads the three databases (i.e., aaindex1, aaindex2 and aaindex3) to obtain the property. Note that the downloaded AAindex files are saved in the current directory by default. You can also specify a different directory according to your needs; in this example, the downloaded databases are saved on the F: drive. The GetAAIndex23 method in AAIndex gets the property from the aaindex2 and aaindex3 databases.

4.3. Calculating protein descriptors

There are two ways to calculate protein descriptors in the PyDPI package.
One is to use the corresponding functions directly; the other is to first construct a PyPro object and then call its methods to obtain the protein descriptors. Note that the output is a dictionary whose keys and values are the descriptor names and the descriptor values, respectively, so the meaning of each descriptor is clear to the user.

Use functions:

We can also compute various types of descriptors from a file in PDB format.

Use GetProDes class:

Example 1: Calculating amino acid composition descriptors

Example 2: Calculating Moran autocorrelation descriptors

Example 3: Calculating pseudo amino acid composition descriptors

When we change the values of lamda and weight, we get different PAAC values. Note that the number of PAAC descriptors depends on the choice of lamda. If lamda = 10, we obtain 20 + lamda = 30 PAAC descriptors.

Example 4: Calculating all protein descriptors

The PyPro class includes a built-in method that calculates all protein descriptors.

Example 5: Calculating protein descriptors based on the user-defined property

The user can provide a property in the form of a Python dictionary. PyDPI can then calculate the descriptors based on this user-defined property.

Example 6: Calculating protein descriptors based on the property from AAindex

A powerful feature of PyDPI is that it can easily calculate thousands of protein features by automatically obtaining the needed properties from AAindex.

5. Interaction representation

5.1. Protein-protein interaction descriptors

5.2. Protein-ligand interaction descriptors

Appendix

Table S1 List of propy computed features for protein sequences

Feature group                   Features                                     Number of descriptors
Amino acid composition          Amino acid composition                       20
                                Dipeptide composition                        400
                                Tripeptide composition                       8000
Autocorrelation                 Normalized Moreau-Broto autocorrelation      240 a
                                Moran autocorrelation                        240 a
                                Geary autocorrelation                        240 a
CTD                             Composition                                  21
                                Transition                                   21
                                Distribution                                 105
Conjoint triad                  Conjoint triad features                      343
Quasi-sequence order            Sequence order coupling number               60
                                Quasi-sequence order descriptors             100
Pseudo amino acid composition   Pseudo amino acid composition                50 b
                                Amphiphilic pseudo amino acid composition    50 c

a The number depends on the choice of the number of amino acid properties and the choice of the maximum value of the lag. The default is to use eight types of properties and lag = 30.
b The number depends on the choice of the number of amino acid property sets and the choice of the lamda value. The default is to use the three types of properties proposed by Chou et al. and lamda = 30.
c The number depends on the choice of the lamda value. The default is lamda = 30.
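The counts in Table S1 that depend on user choices follow from simple arithmetic, which the sketch below reproduces for the composition, autocorrelation and pseudo amino acid composition (PAAC) rows. The function names are hypothetical helpers written for this illustration, not part of PyDPI; the formulas are the ones stated in footnote a, footnote b and section 4.3 (20 + lamda).

```python
N_AMINO_ACIDS = 20  # the standard amino acid alphabet


def composition_counts():
    """Number of possible 1-, 2- and 3-residue patterns."""
    return {
        "amino acid": N_AMINO_ACIDS,
        "dipeptide": N_AMINO_ACIDS ** 2,
        "tripeptide": N_AMINO_ACIDS ** 3,
    }


def autocorrelation_count(n_properties=8, max_lag=30):
    """One descriptor per (property, lag) combination (footnote a)."""
    return n_properties * max_lag


def paac_count(lamda=30):
    """20 composition terms plus one term per tier up to lamda."""
    return N_AMINO_ACIDS + lamda


print(composition_counts())     # 20, 400 and 8000
print(autocorrelation_count())  # 240
print(paac_count())             # 50
print(paac_count(lamda=10))     # 30
```

Checking the length of a returned descriptor dictionary against these numbers is a quick way to confirm that the intended lag or lamda setting was actually applied.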
Table S2 List of PyDPI computed descriptors for small molecules

Molecular descriptors

Constitutional descriptors
No.     Name        Description
1a      Weight      Molecular weight
2       nhyd        Count of hydrogen atoms
3       nhal        Count of halogen atoms
4a      nhet        Count of hetero atoms
5a      nhev        Count of heavy atoms
6       ncof        Count of F atoms
7       ncocl       Count of Cl atoms
8       ncobr       Count of Br atoms
9       ncoi        Count of I atoms
10      ncarb       Count of C atoms
11      nphos       Count of P atoms
12      nsulph      Count of S atoms
13      noxy        Count of O atoms
14      nnitro      Count of N atoms
15a     nring       Number of rings
16a     nrot        Number of rotatable bonds
17a     ndonr       Number of H-bond donors
18a     naccr       Number of H-bond acceptors
19      nsb         Number of single bonds
20      ndb         Number of double bonds
21      ntb         Number of triple bonds
22      naro        Number of aromatic bonds
23      nta         Number of all atoms
24      AWeight     Average molecular weight
25-30   PC1-PC6     Molecular path counts of length 1-6

Topological descriptors
No.     Name         Description
1       W            Wiener index
2       AW           Average Wiener index
3a      J            Balaban's J index
4       Thara        Harary number
5       Tsch         Schultz index
6       Tigdi        Graph distance index
7       Platt        Platt number
8       Xu           Xu index
9       Pol          Polarity number
10      Dz           Pogliani index
11a     Ipc          Ipc index
12a     BertzCT      BertzCT
13      GMTI         Gutman molecular topological index based on simple vertex degree
14-15   ZM1, ZM2     Zagreb index with order 1-2
16-17   MZM1, MZM2   Modified Zagreb index with order 1-2
18      Qindex       Quadratic index
19      diametert    Largest value in the distance matrix
20      radiust      Radius based on topology
21      petitjeant   Petitjean based on topology
22      Sito         Logarithm of the simple topological index by Narumi
23      Hato         Harmonic topological index proposed by Narumi
24      Geto         Geometric topological index by Narumi
25      Arto         Arithmetic topological index by Narumi

Connectivity descriptors
(In the names below, the leading number is the order, v marks the valence form, and p, c, pc and CH mark path, cluster, path/cluster and cycle indices.)
No.      Name                           Description
1-11a    0χv, 1χv, 2χv, 3χpv-10χpv      Valence molecular connectivity Chi index for path order 0-10
12       3χvc                           Valence molecular connectivity Chi index for three cluster
13       4χvc                           Valence molecular connectivity Chi index for four cluster
14       4χvpc                          Valence molecular connectivity Chi index for path/cluster
15-18    3χvCH-6χvCH                    Valence molecular connectivity Chi index for cycles of 3-6
19-29a   0χ, 1χ, 2χ, 3χp-10χp           Simple molecular connectivity Chi indices for path order 0-10
30       3χc                            Simple molecular connectivity Chi indices for three cluster
31       4χc                            Simple molecular connectivity Chi indices for four cluster
32       4χpc                           Simple molecular connectivity Chi indices for path/cluster
33-36    3χCH-6χCH                      Simple molecular connectivity Chi indices for cycles of 3-6
37       mChi1                          Mean chi1 (Randic) connectivity index
38       knotp                          The difference between chi3c and chi4pc
39       dchi0                          The difference between chi0v and chi0
40       dchi1                          The difference between chi1v and chi1
41       dchi2                          The difference between chi2v and chi2
42       dchi3                          The difference between chi3v and chi3
43       dchi4                          The difference between chi4v and chi4
44       knotpv                         The difference between chiv3c and chiv4pc

Kappa descriptors
No.     Name    Description
1       1κα     Kappa alpha index for 1 bonded fragment
2       2κα     Kappa alpha index for 2 bonded fragment
3       3κα     Kappa alpha index for 3 bonded fragment
4       phi     Kier molecular flexibility index
5a      1κ      Molecular shape Kappa index for 1 bonded fragment
6a      2κ      Molecular shape Kappa index for 2 bonded fragment
7a      3κ      Molecular shape Kappa index for 3 bonded fragment

Burden descriptors
No.     Name        Description
1-16    bcutm1-16   Burden descriptors based on atomic mass
17-32   bcutv1-16   Burden descriptors based on atomic volumes
33-48   bcute1-16   Burden descriptors based on atomic electronegativity
49-64   bcutp1-16   Burden descriptors based on polarizability

Basak information descriptors
No.     Name        Description
1-7     IC0-IC6     Information content with order 0-6 proposed by Basak
8-14    SIC0-SIC6   Structural information content with order 0-6 proposed by Basak
15-21   CIC0-CIC6   Complementary information content with order 0-6 proposed by Basak

E-state descriptors
(S(n) is the sum of the E-State values of the listed atom type.)
No.   Name    Atom type
1     S(1)    sLi
2     S(2)    ssBe
3     S(3)    ssssBe
4     S(4)    ssBH
5     S(5)    sssB
6     S(6)    ssssB
7     S(7)    sCH3
8     S(8)    dCH2
9     S(9)    ssCH2
10    S(10)   tCH
11    S(11)   dsCH
12    S(12)   aaCH
13    S(13)   sssCH
14    S(14)   ddC
15    S(15)   tsC
16    S(16)   dssC
17    S(17)   aasC
18    S(18)   aaaC
19    S(19)   ssssC
20    S(20)   sNH3
21    S(21)   sNH2
22    S(22)   ssNH2
23    S(23)   dNH
24    S(24)   ssNH
25    S(25)   aaNH
26    S(26)   tN
27    S(27)   sssNH
28    S(28)   dsN
29    S(29)   aaN
30    S(30)   sssN
31    S(31)   ddsN
32    S(32)   aasN
33    S(33)   ssssN
34    S(34)   sOH
35    S(35)   dO
36    S(36)   ssO
37    S(37)   aaO
38    S(38)   sF
39    S(39)   sSiH3
40    S(40)   ssSiH2
41    S(41)   sssSiH
42    S(42)   ssssSi
43    S(43)   sPH2
44    S(44)   ssPH
45    S(45)   sssP
46    S(46)   dsssP
47    S(47)   sssssP
48    S(48)   sSH
49    S(49)   dS
50    S(50)   ssS
51    S(51)   aaS
52    S(52)   dssS
53    S(53)   ddssS
54    S(54)   sCl
55    S(55)   sGeH3
56    S(56)   ssGeH2
57    S(57)   sssGeH
58    S(58)   ssssGe
59    S(59)   sAsH2
60    S(60)   ssAsH
61    S(61)   sssAs
62    S(62)   sssdAs
63    S(63)   sssssAs
64    S(64)   sSeH
65    S(65)   dSe
66    S(66)   ssSe
67    S(67)   aaSe
68    S(68)   dssSe
69    S(69)   ddssSe
70    S(70)   sBr
71    S(71)   sSnH3
72    S(72)   ssSnH2
73    S(73)   sssSnH
74    S(74)   ssssSn
75    S(75)   sI
76    S(76)   sPbH3
77    S(77)   ssPbH2
78    S(78)   sssPbH
79    S(79)   ssssPb

No.       Name            Description
80-158    Smax1-Smax79    Maximum of E-State value of specified atom type
159-237   Smin1-Smin79    Minimum of E-State value of specified atom type

Autocorrelation descriptors
No.     Name            Description
1-8     ATSm1-ATSm8     Moreau-Broto autocorrelation descriptors based on atom mass
9-16    ATSv1-ATSv8     Moreau-Broto autocorrelation descriptors based on atomic van der Waals volume
17-24   ATSe1-ATSe8     Moreau-Broto autocorrelation descriptors based on atomic Sanderson electronegativity
25-32   ATSp1-ATSp8     Moreau-Broto autocorrelation descriptors based on atomic polarizability
33-40   MATSm1-MATSm8   Moran autocorrelation descriptors based on atom mass
41-48   MATSv1-MATSv8   Moran autocorrelation descriptors based on atomic van der Waals volume
49-56   MATSe1-MATSe8   Moran autocorrelation descriptors based on atomic Sanderson electronegativity
57-64   MATSp1-MATSp8   Moran autocorrelation descriptors based on atomic polarizability
65-72   GATSm1-GATSm8   Geary autocorrelation descriptors based on atom mass
73-80   GATSv1-GATSv8   Geary autocorrelation descriptors based on atomic van der Waals volume
81-88   GATSe1-GATSe8   Geary autocorrelation descriptors based on atomic Sanderson electronegativity
89-96   GATSp1-GATSp8   Geary autocorrelation descriptors based on atomic polarizability

Charge descriptors
No.     Name                           Description
1-4     QHmax, QCmax, QNmax, QOmax     Most positive charge on H, C, N, O atoms
5-8     QHmin, QCmin, QNmin, QOmin     Most negative charge on H, C, N, O atoms
9-10    Qmax, Qmin                     Most positive and negative charge in a molecule
11-15   QHSS, QCSS, QNSS, QOSS, Qass   Sum of squares of charges on H, C, N, O and all atoms
16-17   Mpc, Tpc                       Mean and total of positive charges
18-19   Mnc, Tnc                       Mean and total of negative charges
20-21   Mac, Tac                       Mean and total of absolute charges
22      Rpc                            Relative positive charge
23      Rnc                            Relative negative charge
24      SPP                            Submolecular polarity parameter
25      LDI                            Local dipole index

Molecular property descriptors
No.     Name     Description
1a      MREF     Molar refractivity
2a      logP     LogP value based on the Crippen method
3       logP2    Square of LogP value based on the Crippen method
4a      TPSA     Topological polar surface area
5       UI       Unsaturation index
6       Hy       Hydrophilic index

MOE-type descriptors
No.      Name        Description
1a       MTPSA       Topological polar surface area based on fragments
2a       LabuteASA   Labute's Approximate Surface Area
3-14a    SLOGPVSA    MOE-type descriptors using SLogP contributions and surface area contributions
15-24a   SMRVSA      MOE-type descriptors using MR contributions and surface area contributions
25-38a   PEOEVSA     MOE-type descriptors using partial charges and surface area contributions
39-49a   EstateVSA   MOE-type descriptors using Estate indices and surface area contributions
50-60a   VSAEstate   MOE-type descriptors using surface area contributions and Estate indices

Fragment/Fingerprint-based descriptors
No.     Name                            Description
1a      FP2 (Topological fingerprint)   A Daylight-like fingerprint based on hashing molecular subgraphs
2a      MACCS (MACCS keys)              Using the 166 public keys implemented as SMARTS
3       E-state                         79 E-state fingerprints or fragments
4       FP4                             307 FP4 fingerprints
5a      Atom Pairs                      Atom pairs fingerprints
6a      Torsions                        Topological torsion fingerprints
7a      Morgan/Circular                 Fingerprints based on the Morgan algorithm

Note: a indicates that these descriptors are from RDKit. In PyDPI, we wrapped most of the molecular descriptors from RDKit. The other descriptors are independently coded by us.
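The fingerprints in the last block of Table S2 are typically compared through a similarity measure, as mentioned in section 3.4.7. As a rough illustration of what such a measure computes, the sketch below implements the widely used Tanimoto and Dice coefficients on fingerprints represented as sets of on-bit positions. The function names and the set representation are assumptions for this example; this guide does not list which nine measures PyDPI provides or how they are called.

```python
def tanimoto(bits_a, bits_b):
    """Shared on-bits divided by the on-bits present in either fingerprint."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / float(union) if union else 0.0


def dice(bits_a, bits_b):
    """Twice the shared on-bits divided by the total on-bits of both."""
    a, b = set(bits_a), set(bits_b)
    total = len(a) + len(b)
    return 2 * len(a & b) / float(total) if total else 0.0


# Two made-up fingerprints, given as the positions of their on-bits.
fp1 = {1, 5, 9, 12}
fp2 = {1, 5, 10, 12, 20}

print(tanimoto(fp1, fp2))  # 0.5 (3 shared bits out of 6 distinct bits)
print(dice(fp1, fp2))      # 2 * 3 / 9
```

Both measures range from 0 (no bits in common) to 1 (identical fingerprints); Dice weights the shared bits more heavily, so it is never smaller than Tanimoto for the same pair.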