GROMACS Reference Manual 2018.3
Page Count: 270
GROMACS
Groningen Machine for Chemical Simulations
Reference Manual
Version 2018.3

Contributions from Emile Apol, Rossen Apostolov, Herman J.C. Berendsen, Aldert van Buuren, Pär Bjelkmar, Rudi van Drunen, Anton Feenstra, Sebastian Fritsch, Gerrit Groenhof, Christoph Junghans, Jochen Hub, Peter Kasson, Carsten Kutzner, Brad Lambeth, Per Larsson, Justin A. Lemkul, Viveca Lindahl, Magnus Lundborg, Erik Marklund, Pieter Meulenhoff, Teemu Murtola, Szilárd Páll, Sander Pronk, Roland Schulz, Michael Shirts, Alfons Sijbers, Peter Tieleman, Christian Wennberg and Maarten Wolf.

Mark Abraham, Berk Hess, David van der Spoel, and Erik Lindahl.

© 1991–2000: Department of Biophysical Chemistry, University of Groningen. Nijenborgh 4, 9747 AG Groningen, The Netherlands.
© 2001–2018: The GROMACS development teams at the Royal Institute of Technology and Uppsala University, Sweden.

More information can be found on our website: www.gromacs.org.

Preface & Disclaimer

This manual is not complete and has no pretension to be so, due to lack of time of the contributors; our first priority is to improve the software. It is worked on continuously, which in some cases might mean the information is not entirely correct.

Comments on form and content are welcome; please send them to one of the mailing lists (see www.gromacs.org), or open an issue at redmine.gromacs.org. Corrections can also be made in the GROMACS git source repository and uploaded to gerrit.gromacs.org.

We release an updated version of the manual whenever we release a new version of the software, so in general it is a good idea to use a manual with the same major and minor release number as your GROMACS installation.

On-line Resources

You can find more documentation and other material at our homepage www.gromacs.org. Among other things there is an on-line reference, several GROMACS mailing lists with archives and contributed topologies/force fields.
Citation information

When citing this document in any scientific publication please refer to it as:

M.J. Abraham, D. van der Spoel, E. Lindahl, B. Hess, and the GROMACS development team, GROMACS User Manual version 2018.3, www.gromacs.org (2018)

However, we prefer that you cite (some of) the GROMACS papers [1, 2, 3, 4, 5, 6, 7, 8] when you publish your results. Any future development depends on academic research grants, since the package is distributed as free software!

GROMACS is Free Software

The entire GROMACS package is available under the GNU Lesser General Public License (LGPL), version 2.1. This means it’s free as in free speech, not just that you can use it without paying us money. You can redistribute GROMACS and/or modify it under the terms of the LGPL as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. For details, check the COPYING file in the source code or consult http://www.gnu.org/licenses/old-licenses/lgpl-2.1.html.

The GROMACS source code and a selected set of binary packages are available on our homepage, www.gromacs.org. Have fun.

Contents

1 Introduction
  1.1 Computational Chemistry and Molecular Modeling
  1.2 Molecular Dynamics Simulations
  1.3 Energy Minimization and Search Methods

2 Definitions and Units
  2.1 Notation
  2.2 MD units
  2.3 Reduced units
  2.4 Mixed or Double precision

3 Algorithms
  3.1 Introduction
  3.2 Periodic boundary conditions
    3.2.1 Some useful box types
    3.2.2 Cut-off restrictions
  3.3 The group concept
  3.4 Molecular Dynamics
    3.4.1 Initial conditions
    3.4.2 Neighbor searching
    3.4.3 Compute forces
    3.4.4 The leap-frog integrator
    3.4.5 The velocity Verlet integrator
    3.4.6 Understanding reversible integrators: The Trotter decomposition
    3.4.7 Multiple time stepping
    3.4.8 Temperature coupling
    3.4.9 Pressure coupling
    3.4.10 The complete update algorithm
    3.4.11 Output step
  3.5 Shell molecular dynamics
    3.5.1 Optimization of the shell positions
  3.6 Constraint algorithms
    3.6.1 SHAKE
    3.6.2 LINCS
  3.7 Simulated Annealing
  3.8 Stochastic Dynamics
  3.9 Brownian Dynamics
  3.10 Energy Minimization
    3.10.1 Steepest Descent
    3.10.2 Conjugate Gradient
    3.10.3 L-BFGS
  3.11 Normal-Mode Analysis
  3.12 Free energy calculations
    3.12.1 Slow-growth methods
    3.12.2 Thermodynamic integration
  3.13 Replica exchange
  3.14 Essential Dynamics sampling
  3.15 Expanded Ensemble
  3.16 Parallelization
  3.17 Domain decomposition
    3.17.1 Coordinate and force communication
    3.17.2 Dynamic load balancing
    3.17.3 Constraints in parallel
    3.17.4 Interaction ranges
    3.17.5 Multiple-Program, Multiple-Data PME parallelization
    3.17.6 Domain decomposition flow chart
  3.18 Implicit solvation

4 Interaction function and force fields
  4.1 Non-bonded interactions
    4.1.1 The Lennard-Jones interaction
    4.1.2 Buckingham potential
    4.1.3 Coulomb interaction
    4.1.4 Coulomb interaction with reaction field
    4.1.5 Modified non-bonded interactions
    4.1.6 Modified short-range interactions with Ewald summation
  4.2 Bonded interactions
    4.2.1 Bond stretching
    4.2.2 Morse potential bond stretching
    4.2.3 Cubic bond stretching potential
    4.2.4 FENE bond stretching potential
    4.2.5 Harmonic angle potential
    4.2.6 Cosine based angle potential
    4.2.7 Restricted bending potential
    4.2.8 Urey-Bradley potential
    4.2.9 Bond-Bond cross term
    4.2.10 Bond-Angle cross term
    4.2.11 Quartic angle potential
    4.2.12 Improper dihedrals
    4.2.13 Proper dihedrals
    4.2.14 Tabulated bonded interaction functions
  4.3 Restraints
    4.3.1 Position restraints
    4.3.2 Flat-bottomed position restraints
    4.3.3 Angle restraints
    4.3.4 Dihedral restraints
    4.3.5 Distance restraints
    4.3.6 Orientation restraints
  4.4 Polarization
    4.4.1 Simple polarization
    4.4.2 Anharmonic polarization
    4.4.3 Water polarization
    4.4.4 Thole polarization
  4.5 Free energy interactions
    4.5.1 Soft-core interactions
  4.6 Methods
    4.6.1 Exclusions and 1-4 Interactions
    4.6.2 Charge Groups
    4.6.3 Treatment of Cut-offs in the group scheme
  4.7 Virtual interaction sites
  4.8 Long Range Electrostatics
    4.8.1 Ewald summation
    4.8.2 PME
    4.8.3 P3M-AD
    4.8.4 Optimizing Fourier transforms and PME calculations
  4.9 Long Range Van der Waals interactions
    4.9.1 Dispersion correction
    4.9.2 Lennard-Jones PME
  4.10 Force field
    4.10.1 GROMOS-96
    4.10.2 OPLS/AA
    4.10.3 AMBER
    4.10.4 CHARMM
    4.10.5 Coarse-grained force fields
    4.10.6 MARTINI
    4.10.7 PLUM

5 Topologies
  5.1 Introduction
  5.2 Particle type
    5.2.1 Atom types
    5.2.2 Virtual sites
  5.3 Parameter files
    5.3.1 Atoms
    5.3.2 Non-bonded parameters
    5.3.3 Bonded parameters
  5.4 Molecule definition
    5.4.1 Moleculetype entries
    5.4.2 Intermolecular interactions
    5.4.3 Intramolecular pair interactions
    5.4.4 Exclusions
  5.5 Implicit solvation parameters
  5.6 Constraint algorithms
  5.7 pdb2gmx input files
    5.7.1 Residue database
    5.7.2 Residue to building block database
    5.7.3 Atom renaming database
    5.7.4 Hydrogen database
    5.7.5 Termini database
    5.7.6 Virtual site database
    5.7.7 Special bonds
  5.8 File formats
    5.8.1 Topology file
    5.8.2 Molecule.itp file
    5.8.3 Ifdef statements
    5.8.4 Topologies for free energy calculations
    5.8.5 Constraint forces
    5.8.6 Coordinate file
  5.9 Force field organization
    5.9.1 Force-field files
    5.9.2 Changing force-field parameters
    5.9.3 Adding atom types

6 Special Topics
  6.1 Free energy implementation
  6.2 Potential of mean force
  6.3 Non-equilibrium pulling
  6.4 The pull code
  6.5 Adaptive biasing with AWH
    6.5.1 Basics of the method
    6.5.2 The initial stage
    6.5.3 Choice of target distribution
    6.5.4 Multiple independent or sharing biases
    6.5.5 Reweighting and combining biased data
    6.5.6 The friction metric
    6.5.7 Usage
  6.6 Enforced Rotation
    6.6.1 Fixed Axis Rotation
    6.6.2 Flexible Axis Rotation
    6.6.3 Usage
  6.7 Electric fields
  6.8 Computational Electrophysiology
    6.8.1 Usage
  6.9 Calculating a PMF using the free-energy code
  6.10 Removing fastest degrees of freedom
    6.10.1 Hydrogen bond-angle vibrations
    6.10.2 Out-of-plane vibrations in aromatic groups
  6.11 Viscosity calculation
  6.12 Tabulated interaction functions
    6.12.1 Cubic splines for potentials
    6.12.2 User-specified potential functions
  6.13 Mixed Quantum-Classical simulation techniques
    6.13.1 Overview
    6.13.2 Usage
    6.13.3 Output
    6.13.4 Future developments
  6.14 Using VMD plug-ins for trajectory file I/O
  6.15 Interactive Molecular Dynamics
    6.15.1 Simulation input preparation
    6.15.2 Starting the simulation
    6.15.3 Connecting from VMD
  6.16 Embedding proteins into the membranes

7 Run parameters and Programs
  7.1 Online documentation
  7.2 File types
  7.3 Run Parameters

8 Analysis
  8.1 Using Groups
    8.1.1 Default Groups
    8.1.2 Selections
  8.2 Looking at your trajectory
  8.3 General properties
  8.4 Radial distribution functions
  8.5 Correlation functions
    8.5.1 Theory of correlation functions
    8.5.2 Using FFT for computation of the ACF
    8.5.3 Special forms of the ACF
    8.5.4 Some Applications
  8.6 Curve fitting in GROMACS
    8.6.1 Sum of exponential functions
    8.6.2 Error estimation
    8.6.3 Interphase boundary demarcation
    8.6.4 Transverse current autocorrelation function
    8.6.5 Viscosity estimation from pressure autocorrelation function
  8.7 Mean Square Displacement
  8.8 Bonds/distances, angles and dihedrals
  8.9 Radius of gyration and distances
  8.10 Root mean square deviations in structure
  8.11 Covariance analysis
  8.12 Dihedral principal component analysis
  8.13 Hydrogen bonds
  8.14 Protein-related items
  8.15 Interface-related items

A Some implementation details
  A.1 Single Sum Virial in GROMACS
    A.1.1 Virial
    A.1.2 Virial from non-bonded forces
    A.1.3 The intra-molecular shift (mol-shift)
    A.1.4 Virial from Covalent Bonds
    A.1.5 Virial from SHAKE
  A.2 Optimizations
    A.2.1 Inner Loops for Water

B Averages and fluctuations
  B.1 Formulae for averaging
  B.2 Implementation
    B.2.1 Part of a Simulation
    B.2.2 Combining two simulations
    B.2.3 Summing energy terms

Bibliography
Index

Chapter 1
Introduction

1.1 Computational Chemistry and Molecular Modeling

GROMACS is an engine to perform molecular dynamics simulations and energy minimization. These are two of the many techniques that belong to the realm of computational chemistry and molecular modeling. Computational chemistry is just a name to indicate the use of computational techniques in chemistry, ranging from quantum mechanics of molecules to dynamics of large complex molecular aggregates.
Molecular modeling indicates the general process of describing complex chemical systems in terms of a realistic atomic model, with the goal being to understand and predict macroscopic properties based on detailed knowledge on an atomic scale. Often, molecular modeling is used to design new materials, for which the accurate prediction of physical properties of realistic systems is required.

Macroscopic physical properties can be distinguished by (a) static equilibrium properties, such as the binding constant of an inhibitor to an enzyme, the average potential energy of a system, or the radial distribution function of a liquid, and (b) dynamic or non-equilibrium properties, such as the viscosity of a liquid, diffusion processes in membranes, the dynamics of phase changes, reaction kinetics, or the dynamics of defects in crystals.

The choice of technique depends on the question asked and on the feasibility of the method to yield reliable results at the present state of the art. Ideally, the (relativistic) time-dependent Schrödinger equation describes the properties of molecular systems with high accuracy, but anything more complex than the equilibrium state of a few atoms cannot be handled at this ab initio level. Thus, approximations are necessary; the higher the complexity of a system and the longer the time span of the processes of interest, the more severe the required approximations are. At a certain point (reached very much earlier than one would wish), the ab initio approach must be augmented or replaced by empirical parameterization of the model used. Where simulations based on physical principles of atomic interactions still fail due to the complexity of the system, molecular modeling is based entirely on a similarity analysis of known structural and chemical data. The QSAR methods (Quantitative Structure-Activity Relations) and many homology-based protein structure predictions belong to the latter category.
Macroscopic properties are always ensemble averages over a representative statistical ensemble (either equilibrium or non-equilibrium) of molecular systems. For molecular modeling, this has two important consequences:

• The knowledge of a single structure, even if it is the structure of the global energy minimum, is not sufficient. It is necessary to generate a representative ensemble at a given temperature, in order to compute macroscopic properties. But this is not enough to compute thermodynamic equilibrium properties that are based on free energies, such as phase equilibria, binding constants, solubilities, relative stability of molecular conformations, etc. The computation of free energies and thermodynamic potentials requires special extensions of molecular simulation techniques.

• While molecular simulations, in principle, provide atomic details of the structures and motions, such details are often not relevant for the macroscopic properties of interest. This opens the way to simplify the description of interactions and average over irrelevant details. The science of statistical mechanics provides the theoretical framework for such simplifications. There is a hierarchy of methods ranging from considering groups of atoms as one unit, describing motion in a reduced number of collective coordinates, averaging over solvent molecules with potentials of mean force combined with stochastic dynamics [9], to mesoscopic dynamics describing densities rather than atoms and fluxes as response to thermodynamic gradients rather than velocities or accelerations as response to forces [10].

For the generation of a representative equilibrium ensemble two methods are available: (a) Monte Carlo simulations and (b) Molecular Dynamics simulations. For the generation of non-equilibrium ensembles and for the analysis of dynamic events, only the second method is appropriate.
While Monte Carlo simulations are simpler than MD (they do not require the computation of forces), they do not yield significantly better statistics than MD in a given amount of computer time. Therefore, MD is the more universal technique. If a starting configuration is very far from equilibrium, the forces may be excessively large and the MD simulation may fail. In those cases, a robust energy minimization is required. Another reason to perform an energy minimization is the removal of all kinetic energy from the system: if several “snapshots” from dynamic simulations must be compared, energy minimization reduces the thermal noise in the structures and potential energies so that they can be compared better.

1.2 Molecular Dynamics Simulations

MD simulations solve Newton’s equations of motion for a system of N interacting atoms:

$$ m_i \frac{\partial^2 \mathbf{r}_i}{\partial t^2} = \mathbf{F}_i, \quad i = 1 \ldots N. \tag{1.1} $$

The forces are the negative derivatives of a potential function $V(\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_N)$:

$$ \mathbf{F}_i = -\frac{\partial V}{\partial \mathbf{r}_i} \tag{1.2} $$

type of bond      type of vibration   wavenumber (cm$^{-1}$)
C-H, O-H, N-H     stretch             3000–3500
C=C, C=O          stretch             1700–2000
HOH               bending             1600
C-C               stretch             1400–1600
H$_2$CX           sciss, rock         1000–1500
CCC               bending             800–1000
O-H···O           libration           400–700
O-H···O           stretch             50–200

Table 1.1: Typical vibrational frequencies (wavenumbers) in molecules and hydrogen-bonded liquids. Compare $kT/h = 200$ cm$^{-1}$ at 300 K.

The equations are solved simultaneously in small time steps. The system is followed for some time, taking care that the temperature and pressure remain at the required values, and the coordinates are written to an output file at regular intervals. The coordinates as a function of time represent a trajectory of the system. After initial changes, the system will usually reach an equilibrium state. By averaging over an equilibrium trajectory, many macroscopic properties can be extracted from the output file.
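As a minimal sketch of what solving eqs. 1.1 and 1.2 in small time steps means, the following Python example integrates a single one-dimensional harmonic oscillator with the velocity Verlet scheme and then averages over the resulting trajectory. The potential, the reduced units, the parameter values, and all function names are assumptions chosen for illustration; this is not GROMACS code, and the integrators that GROMACS actually provides are described in Chapter 3.

```python
import math


def force(x, k):
    """Force from V(x) = 0.5*k*x**2, i.e. F = -dV/dx (eq. 1.2 in one dimension)."""
    return -k * x


def velocity_verlet(x, v, mass, k, dt, n_steps):
    """Integrate m*d2x/dt2 = F (eq. 1.1); return a list of (time, x, v) tuples."""
    trajectory = [(0.0, x, v)]
    f = force(x, k)
    for step in range(1, n_steps + 1):
        v += 0.5 * dt * f / mass   # half-step velocity update with the old force
        x += dt * v                # full-step position update
        f = force(x, k)            # force at the new position
        v += 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append((step * dt, x, v))
    return trajectory


def total_energy(x, v, mass, k):
    """Kinetic plus potential energy of the oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


# Illustrative parameters in reduced units (assumed, not taken from GROMACS).
MASS, K = 1.0, 1.0
PERIOD = 2.0 * math.pi * math.sqrt(MASS / K)
DT = PERIOD / 100.0  # 100 time steps per oscillation period
trajectory = velocity_verlet(x=1.0, v=0.0, mass=MASS, k=K, dt=DT, n_steps=1000)

# Averages over the trajectory play the role of "macroscopic properties".
energies = [total_energy(x, v, MASS, K) for _, x, v in trajectory]
max_rel_energy_error = max(abs(e - energies[0]) for e in energies) / energies[0]
mean_potential = sum(0.5 * K * x * x for _, x, _ in trajectory) / len(trajectory)

print(f"final position after 10 periods: {trajectory[-1][1]:.4f}")
print(f"largest relative energy error:   {max_rel_energy_error:.2e}")
print(f"time-averaged potential energy:  {mean_potential:.4f}")
```

Because velocity Verlet is second-order accurate, halving `DT` in this sketch should reduce the energy error by roughly a factor of four, which is a quick way to check that a chosen time step is small enough for the fastest motion in the system.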
It is useful at this point to consider the limitations of MD simulations. The user should be aware of those limitations and always perform checks on known experimental properties to assess the accuracy of the simulation. We list the approximations below.

The simulations are classical
Using Newton’s equation of motion automatically implies the use of classical mechanics to describe the motion of atoms. This is all right for most atoms at normal temperatures, but there are exceptions. Hydrogen atoms are quite light and the motion of protons is sometimes of essential quantum mechanical character. For example, a proton may tunnel through a potential barrier in the course of a transfer over a hydrogen bond. Such processes cannot be properly treated by classical dynamics! Helium liquid at low temperature is another example where classical mechanics breaks down. While helium may not deeply concern us, the high frequency vibrations of covalent bonds should make us worry! The statistical mechanics of a classical harmonic oscillator differs appreciably from that of a real quantum oscillator when the resonance frequency $\nu$ approximates or exceeds $k_B T/h$. Now at room temperature the wavenumber $\sigma = 1/\lambda = \nu/c$ at which $h\nu = k_B T$ is approximately 200 cm$^{-1}$. Thus, all frequencies higher than, say, 100 cm$^{-1}$ may misbehave in classical simulations. This means that practically all bond and bond-angle vibrations are suspect, and even hydrogen-bonded motions such as translational or librational H-bond vibrations are beyond the classical limit (see Table 1.1). What can we do?

Well, apart from real quantum-dynamical simulations, we can do one of two things:

(a) If we perform MD simulations using harmonic oscillators for bonds, we should make corrections to the total internal energy $U = E_{kin} + E_{pot}$ and specific heat $C_V$ (and to entropy $S$ and free energy $A$ or $G$ if those are calculated).
The corrections to the energy and specific heat of a one-dimensional oscillator with frequency $\nu$ are: [11]

$$ U^{QM} = U^{cl} + kT \left( \frac{1}{2}x - 1 + \frac{x}{e^x - 1} \right) \tag{1.3} $$

$$ C_V^{QM} = C_V^{cl} + k \left( \frac{x^2 e^x}{(e^x - 1)^2} - 1 \right), \tag{1.4} $$

where $x = h\nu/kT$. The classical oscillator absorbs too much energy ($kT$), while the high-frequency quantum oscillator is in its ground state at the zero-point energy level of $\frac{1}{2}h\nu$.

(b) We can treat the bonds (and bond angles) as constraints in the equations of motion. The rationale behind this is that a quantum oscillator in its ground state resembles a constrained bond more closely than a classical oscillator. A good practical reason for this choice is that the algorithm can use larger time steps when the highest frequencies are removed. In practice the time step can be made four times as large when bonds are constrained as when they are oscillators [12]. GROMACS has this option for the bonds and bond angles. The flexibility of the latter is rather essential to allow for the realistic motion and coverage of configurational space [13].

Electrons are in the ground state
In MD we use a conservative force field that is a function of the positions of atoms only. This means that the electronic motions are not considered: the electrons are supposed to adjust their dynamics instantly when the atomic positions change (the Born-Oppenheimer approximation), and remain in their ground state. This is really all right, almost always. But of course, electron transfer processes and electronically excited states cannot be treated. Neither can chemical reactions be treated properly, but there are other reasons to shy away from reactions for the time being.

Force fields are approximate
Force fields provide the forces. They are not really a part of the simulation method and their parameters can be modified by the user as the need arises or knowledge improves. But the form of the forces that can be used in a particular program is subject to limitations.
The force field that is incorporated in GROMACS is described in Chapter 4. In the present version the force field is pair-additive (apart from long-range Coulomb forces), it cannot incorporate polarizabilities, and it does not contain fine-tuning of bonded interactions. This makes it necessary to include some further limitations in the list below. For the rest it is quite useful and fairly reliable for biologically-relevant macromolecules in aqueous solution!

The force field is pair-additive

This means that all non-bonded forces result from the sum of non-bonded pair interactions. Non pair-additive interactions, the most important example of which is interaction through atomic polarizability, are represented by effective pair potentials. Only average non pair-additive contributions are incorporated. This also means that the pair interactions are not pure, i.e., they are not valid for isolated pairs or for situations that differ appreciably from the test systems on which the models were parameterized. In fact, the effective pair potentials are not that bad in practice. But the omission of polarizability also means that electrons in atoms do not provide a dielectric constant as they should. For example, real liquid alkanes have a dielectric constant of slightly more than 2, which reduces the long-range electrostatic interaction between (partial) charges. Thus, the simulations will exaggerate the long-range Coulomb terms. Luckily, the next item compensates for this effect a bit.

Long-range interactions are cut off

In this version, GROMACS always uses a cut-off radius for the Lennard-Jones interactions and sometimes for the Coulomb interactions as well. The “minimum-image convention” used by GROMACS requires that only one image of each particle in the periodic boundary conditions is considered for a pair interaction, so the cut-off radius cannot exceed half the box size.
That is still pretty big for large systems, and trouble is only expected for systems containing charged particles. But then truly bad things can happen, like accumulation of charges at the cut-off boundary or very wrong energies! For such systems, you should consider using one of the implemented long-range electrostatic algorithms, such as particle-mesh Ewald [14, 15].

Boundary conditions are unnatural

Since system size is small (even 10,000 particles is small), a cluster of particles will have a lot of unwanted boundary with its environment (vacuum). We must avoid this condition if we wish to simulate a bulk system. As such, we use periodic boundary conditions to avoid real phase boundaries. Since liquids are not crystals, something unnatural remains. This item is mentioned last because it is the least of the evils. For large systems, the errors are small, but for small systems with a lot of internal spatial correlation, the periodic boundaries may enhance internal correlation. In that case, beware of, and test, the influence of system size. This is especially important when using lattice sums for long-range electrostatics, since these are known to sometimes introduce extra ordering.

1.3 Energy Minimization and Search Methods

As mentioned in sec. 1.1, in many cases energy minimization is required. GROMACS provides a number of methods for local energy minimization, as detailed in sec. 3.10.

The potential energy function of a (macro)molecular system is a very complex landscape (or hypersurface) in a large number of dimensions. It has one deepest point, the global minimum, and a very large number of local minima, where all derivatives of the potential energy function with respect to the coordinates are zero and all second derivatives are non-negative.
The matrix of second derivatives, which is called the Hessian matrix, has non-negative eigenvalues; only the collective coordinates that correspond to translation and rotation (for an isolated molecule) have zero eigenvalues. In between the local minima there are saddle points, where the Hessian matrix has only one negative eigenvalue. These points are the mountain passes through which the system can migrate from one local minimum to another.

Knowledge of all local minima, including the global one, and of all saddle points would enable us to describe the relevant structures and conformations and their free energies, as well as the dynamics of structural transitions. Unfortunately, the dimensionality of the configurational space and the number of local minima are so high that it is impossible to sample the space at a sufficient number of points to obtain a complete survey. In particular, no minimization method exists that guarantees the determination of the global minimum in any practical amount of time. Impractical methods exist, some much faster than others [16]. However, given a starting configuration, it is possible to find the nearest local minimum. “Nearest” in this context does not always imply “nearest” in a geometrical sense (i.e., the least sum of square coordinate differences), but means the minimum that can be reached by systematically moving down the steepest local gradient. Finding this nearest local minimum is all that GROMACS can do for you, sorry! If you want to find other minima and hope to discover the global minimum in the process, the best advice is to experiment with temperature-coupled MD: run your system at a high temperature for a while and then quench it slowly down to the required temperature; do this repeatedly!
If something like a melting or glass transition temperature exists, it is wise to stay for some time slightly below that temperature and cool down slowly according to some clever scheme, a process called simulated annealing. Since no physical truth is required, you can use your imagination to speed up this process. One trick that often works is to make hydrogen atoms heavier (mass 10 or so): although that will slow down the otherwise very rapid motions of hydrogen atoms, it will hardly influence the slower motions in the system, while enabling you to increase the time step by a factor of 3 or 4. You can also modify the potential energy function during the search procedure, e.g. by removing barriers (remove dihedral angle functions or replace repulsive potentials by soft-core potentials [17]), but always take care to restore the correct functions slowly. The best search method that allows rather drastic structural changes is to allow excursions into four-dimensional space [18], but this requires some extra programming beyond the standard capabilities of GROMACS.

Three possible energy minimization methods are:

• Those that require only function evaluations. Examples are the simplex method and its variants. A step is made on the basis of the results of previous evaluations. If derivative information is available, such methods are inferior to those that use this information.

• Those that use derivative information. Since the partial derivatives of the potential energy with respect to all coordinates are known in MD programs (these are equal to minus the forces), this class of methods is very suitable as a modification of MD programs.

• Those that use second derivative information as well. These methods are superior in their convergence properties near the minimum: a quadratic potential function is minimized in one step! The problem is that for N particles a 3N × 3N matrix must be computed, stored, and inverted.
Apart from the extra programming to obtain second derivatives, for most systems of interest this is beyond the available capacity. There are intermediate methods that build up the Hessian matrix on the fly, but they also suffer from excessive storage requirements. So GROMACS will shy away from this class of methods.

The steepest descent method, available in GROMACS, is of the second class. It simply takes a step in the direction of the negative gradient (hence in the direction of the force), without any consideration of the history built up in previous steps. The step size is adjusted such that the search is fast, but the motion is always downhill. This is a simple and sturdy, but somewhat stupid, method: its convergence can be quite slow, especially in the vicinity of the local minimum! The faster-converging conjugate gradient method (see e.g. [19]) uses gradient information from previous steps. In general, steepest descents will bring you close to the nearest local minimum very quickly, while conjugate gradients brings you very close to the local minimum, but performs worse far away from the minimum. GROMACS also supports the L-BFGS minimizer, which is mostly comparable to the conjugate gradient method, but in some cases converges faster.

Chapter 2 Definitions and Units

2.1 Notation

The following conventions for mathematical typesetting are used throughout this document:

Item            Notation       Example
Vector          Bold italic    \boldsymbol{r}_i
Vector length   Italic         r_i

We define the lowercase subscripts i, j, k and l to denote particles: \boldsymbol{r}_i is the position vector of particle i, and using this notation:

    \boldsymbol{r}_{ij} = \boldsymbol{r}_j - \boldsymbol{r}_i    (2.1)

    r_{ij} = |\boldsymbol{r}_{ij}|    (2.2)

The force on particle i is denoted by \boldsymbol{F}_i and

    \boldsymbol{F}_{ij} = \text{force on } i \text{ exerted by } j    (2.3)

Please note that we changed notation as of version 2.0 to \boldsymbol{r}_{ij} = \boldsymbol{r}_j - \boldsymbol{r}_i since this is the notation commonly used. If you encounter an error, let us know.
2.2 MD units

GROMACS uses a consistent set of units that produce values in the vicinity of unity for most relevant molecular quantities. Let us call them MD units. The basic units in this system are nm, ps, K, electron charge (e) and atomic mass unit (u), see Table 2.1. The values used in GROMACS are taken from the CODATA Internationally recommended 2010 values of fundamental physical constants (see http://nist.gov).

Quantity      Symbol   Unit
length        r        nm = 10⁻⁹ m
mass          m        u (unified atomic mass unit) = 1.660 538 921 × 10⁻²⁷ kg
time          t        ps = 10⁻¹² s
charge        q        e = elementary charge = 1.602 176 565 × 10⁻¹⁹ C
temperature   T        K

Table 2.1: Basic units used in GROMACS.

Consistent with these units are a set of derived units, given in Table 2.2.

Quantity             Symbol   Unit
energy               E, V     kJ mol⁻¹
force                F        kJ mol⁻¹ nm⁻¹
pressure             p        bar
velocity             v        nm ps⁻¹ = 1000 m s⁻¹
dipole moment        µ        e nm
electric potential   Φ        kJ mol⁻¹ e⁻¹ = 0.010 364 269 19 V
electric field       E        kJ mol⁻¹ nm⁻¹ e⁻¹ = 1.036 426 919 × 10⁷ V m⁻¹

Table 2.2: Derived units. Note that an additional conversion factor of 10²⁸ a.m.u. (≈ 16.6) is applied to get bar instead of internal MD units in the energy and log files.

The electric conversion factor f = 1/(4πε₀) = 138.935 458 kJ mol⁻¹ nm e⁻². It relates the mechanical quantities to the electrical quantities as in

    V = f \frac{q^{2}}{r} \quad \text{or} \quad F = f \frac{q^{2}}{r^{2}}    (2.4)

Electric potentials Φ and electric fields E are intermediate quantities in the calculation of energies and forces. They do not occur inside GROMACS. If they are used in evaluations, there is a choice of equations and related units. We strongly recommend following the usual practice of including the factor f in expressions that evaluate Φ and E:

    \Phi(\boldsymbol{r}) = f \sum_j \frac{q_j}{|\boldsymbol{r} - \boldsymbol{r}_j|}    (2.5)

    \boldsymbol{E}(\boldsymbol{r}) = f \sum_j q_j \frac{\boldsymbol{r} - \boldsymbol{r}_j}{|\boldsymbol{r} - \boldsymbol{r}_j|^{3}}    (2.6)

With these definitions, qΦ is an energy and qE is a force. The units are those given in Table 2.2: about 10 mV for potential.
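The numerical values of f and of the potential unit can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the vacuum permittivity ε₀ is not quoted in this chapter, so the CODATA 2010 value used for it here is an assumption.

```python
import math

# SI values. E_CHARGE and N_AVOGADRO are the CODATA 2010 numbers quoted in
# this chapter; EPSILON_0 is not given in the text and is an assumption here.
E_CHARGE = 1.602176565e-19    # C
N_AVOGADRO = 6.02214129e23    # mol^-1
EPSILON_0 = 8.854187817e-12   # F m^-1


def electric_conversion_factor():
    """f = 1/(4 pi eps0), expressed in kJ mol^-1 nm e^-2."""
    joule_metre = E_CHARGE ** 2 / (4.0 * math.pi * EPSILON_0)  # J m for two unit charges
    # Per mole, metres to nanometres (1e9), joules to kilojoules (1e-3).
    return joule_metre * N_AVOGADRO * 1.0e9 * 1.0e-3


def potential_unit_in_volt():
    """One MD unit of electric potential (kJ mol^-1 e^-1) in volts."""
    return 1.0e3 / (N_AVOGADRO * E_CHARGE)


f = electric_conversion_factor()
unit_volt = potential_unit_in_volt()
print(f"f = {f:.4f} kJ mol^-1 nm e^-2")
print(f"1 kJ mol^-1 e^-1 = {unit_volt:.10f} V")
print(f"potential of one charge e at 1 nm = {f * unit_volt:.5f} V")
```

The last printed line multiplies f (the potential at 1 nm in MD units) by the size of the potential unit in volts, which is the same conversion used in the text for a unit charge at a distance of 1 nm.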
Thus, the potential of an electronic charge at a distance of 1 nm equals f ≈ 140 units ≈ 1.4 V (exact value: 1.439 964 5 V).

Note that these units are mutually consistent; changing any of the units is likely to produce inconsistencies and is therefore strongly discouraged! In particular: if Å are used instead of nm, the unit of time changes to 0.1 ps. If kcal mol⁻¹ (= 4.184 kJ mol⁻¹) is used instead of kJ mol⁻¹ for energy, the unit of time becomes 0.488882 ps and the unit of temperature changes to 4.184 K. But in both cases all electrical energies go wrong, because they will still be computed in kJ mol⁻¹, expecting nm as the unit of length. Although careful rescaling of charges may still yield consistency, it is clear that such confusions must be rigidly avoided.

Symbol   Name                   Value
N_AV     Avogadro’s number      6.022 141 29 × 10²³ mol⁻¹
R        gas constant           8.314 462 1 × 10⁻³ kJ mol⁻¹ K⁻¹
k_B      Boltzmann’s constant   idem
h        Planck’s constant      0.399 031 271 kJ mol⁻¹ ps
ħ        Dirac’s constant       0.063 507 799 3 kJ mol⁻¹ ps
c        velocity of light      299 792.458 nm ps⁻¹

Table 2.3: Some Physical Constants

Quantity      Symbol   Relation to SI
Length        r*       r σ⁻¹
Mass          m*       m M⁻¹
Time          t*       t σ⁻¹ √(ε/M)
Temperature   T*       k_B T ε⁻¹
Energy        E*       E ε⁻¹
Force         F*       F σ ε⁻¹
Pressure      P*       P σ³ ε⁻¹
Velocity      v*       v √(M/ε)
Density       ρ*       N σ³ V⁻¹

Table 2.4: Reduced Lennard-Jones quantities

In terms of the MD units, the usual physical constants take on different values (see Table 2.3). All quantities are per mol rather than per molecule. There is no distinction between Boltzmann’s constant k and the gas constant R: their value is 0.008 314 462 1 kJ mol⁻¹ K⁻¹.

2.3 Reduced units

When simulating Lennard-Jones (LJ) systems, it might be advantageous to use reduced units (i.e., setting ε_ii = σ_ii = m_i = k_B = 1 for one type of atoms). This is possible. When specifying the input in reduced units, the output will also be in reduced units. The one exception is the temperature, which is expressed in 0.008 314 462 1 reduced units.
This is a consequence of using Boltzmann’s constant in the evaluation of temperature in the code. Thus not T, but k_B T, is the reduced temperature. A GROMACS temperature T = 1 means a reduced temperature of 0.008... units; if a reduced temperature of 1 is required, the GROMACS temperature should be 120.272 36.

In Table 2.4 quantities are given for LJ potentials:

    V_{LJ} = 4\epsilon \left[ \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^{6} \right]    (2.7)

2.4 Mixed or Double precision

GROMACS can be compiled in either mixed or double precision. Documentation of previous GROMACS versions referred to “single precision”, but the implementation has made selective use of double precision for many years. Using single precision for all variables would lead to a significant reduction in accuracy. Although in “mixed precision” all state vectors, i.e. particle coordinates, velocities and forces, are stored in single precision, critical variables are double precision. A typical example of the latter is the virial, which is a sum over all forces in the system, which have varying signs. In addition, in many parts of the code we managed to avoid double precision for arithmetic, by paying attention to summation order or reorganization of mathematical expressions. The default configuration uses mixed precision, but it is easy to turn on double precision by adding the option -DGMX_DOUBLE=on to cmake. Double precision will be 20 to 100% slower than mixed precision depending on the architecture you are running on. Double precision will use somewhat more memory, and run input, energy and full-precision trajectory files will be almost twice as large.

The energies in mixed precision are accurate up to the last decimal; the last one or two decimals of the forces are non-significant. The virial is less accurate than the forces, since the virial is only one order of magnitude larger than the size of each element in the sum over all atoms (sec. A.1).
In most cases this is not really a problem, since the fluctuations in the virial can be two orders of magnitude larger than the average. Using cut-offs for the Coulomb interactions causes large errors in the energies, forces, and virial. Even when using a reaction-field or lattice sum method, the errors are larger than, or comparable to, the errors due to the partial use of single precision. Since MD is chaotic, trajectories with very similar starting conditions will diverge rapidly; the divergence is faster in mixed precision than in double precision.

For most simulations, mixed precision is accurate enough. In some cases double precision is required to get reasonable results:

• normal mode analysis, for the conjugate gradient or L-BFGS minimization and the calculation and diagonalization of the Hessian

• long-term energy conservation, especially for large systems

Chapter 3 Algorithms

3.1 Introduction

In this chapter we first describe some general concepts used in GROMACS: periodic boundary conditions (sec. 3.2) and the group concept (sec. 3.3). The MD algorithm is described in sec. 3.4: first a global form of the algorithm is given, which is refined in subsequent subsections. The (simple) EM (Energy Minimization) algorithm is described in sec. 3.10. Some other algorithms for special purpose dynamics are described after this.

A few issues are of general interest. In all cases the system must be defined, consisting of molecules. Molecules again consist of particles with defined interaction functions. The detailed description of the topology of the molecules and of the force field and the calculation of forces is given in chapter 4. In the present chapter we describe other aspects of the algorithm, such as pair list generation, update of velocities and positions, coupling to external temperature and pressure, conservation of constraints. The analysis of the data generated by an MD simulation is treated in chapter 8.
3.2 Periodic boundary conditions

The classical way to minimize edge effects in a finite system is to apply periodic boundary conditions. The atoms of the system to be simulated are put into a space-filling box, which is surrounded by translated copies of itself (Fig. 3.1). Thus there are no boundaries of the system; the artifact caused by unwanted boundaries in an isolated cluster is now replaced by the artifact of periodic conditions. If the system is crystalline, such boundary conditions are desired (although motions are naturally restricted to periodic motions with wavelengths fitting into the box). If one wishes to simulate non-periodic systems, such as liquids or solutions, the periodicity by itself causes errors. The errors can be evaluated by comparing various system sizes; they are expected to be less severe than the errors resulting from an unnatural boundary with vacuum.

Figure 3.1: Periodic boundary conditions in two dimensions.

There are several possible shapes for space-filling unit cells. Some, like the rhombic dodecahedron and the truncated octahedron [20], are closer to being a sphere than a cube is, and are therefore better suited to the study of an approximately spherical macromolecule in solution, since fewer solvent molecules are required to fill the box given a minimum distance between macromolecular images. At the same time, rhombic dodecahedra and truncated octahedra are special cases of triclinic unit cells, the most general space-filling unit cells that comprise all possible space-filling shapes [21]. For this reason, GROMACS is based on the triclinic unit cell.

GROMACS uses periodic boundary conditions, combined with the minimum image convention: only one image of each particle, the nearest, is considered for short-range non-bonded interaction terms.
For long-range electrostatic interactions this is not always accurate enough, and GROMACS therefore also incorporates lattice sum methods such as Ewald Sum, PME and PPPM.

GROMACS supports triclinic boxes of any shape. The simulation box (unit cell) is defined by the 3 box vectors a, b and c. The box vectors must satisfy the following conditions:

    a_y = a_z = b_z = 0    (3.1)

    a_x > 0, \quad b_y > 0, \quad c_z > 0    (3.2)

    |b_x| \le \frac{1}{2} a_x, \quad |c_x| \le \frac{1}{2} a_x, \quad |c_y| \le \frac{1}{2} b_y    (3.3)

Equation 3.1 can always be satisfied by rotating the box. Inequalities (3.2) and (3.3) can always be satisfied by adding and subtracting box vectors.

Even when simulating using a triclinic box, GROMACS always keeps the particles in a brick-shaped volume for efficiency, as illustrated in Fig. 3.1 for a 2-dimensional system. Therefore, from the output trajectory it might seem that the simulation was done in a rectangular box. The program trjconv can be used to convert the trajectory to a different unit-cell representation.

Figure 3.2: A rhombic dodecahedron and truncated octahedron (arbitrary orientations).

box type                            image distance   box volume              a           b                      c                                  ∠bc       ∠ac       ∠ab
cubic                               d                d³                      (d, 0, 0)   (0, d, 0)              (0, 0, d)                          90°       90°       90°
rhombic dodecahedron (xy-square)    d                ½√2 d³ = 0.707 d³       (d, 0, 0)   (0, d, 0)              (½ d, ½ d, ½√2 d)                  60°       60°       90°
rhombic dodecahedron (xy-hexagon)   d                ½√2 d³ = 0.707 d³       (d, 0, 0)   (½ d, ½√3 d, 0)        (½ d, (1/6)√3 d, (1/3)√6 d)        60°       60°       60°
truncated octahedron                d                (4/9)√3 d³ = 0.770 d³   (d, 0, 0)   ((1/3) d, (2/3)√2 d, 0)   (−(1/3) d, (1/3)√2 d, (1/3)√6 d)   70.53°    109.47°   70.53°

Table 3.1: The cubic box, the rhombic dodecahedron and the truncated octahedron. The columns a, b and c give the (x, y, z) components of the box vectors; the last three columns are the box vector angles.

It is also possible to simulate without periodic boundary conditions, but it is usually more efficient to simulate an isolated cluster of molecules in a large periodic box, since fast grid searching can only be used in a periodic system.

3.2.1 Some useful box types

The three most useful box types for simulations of solvated systems are described in Table 3.1.
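The entries of Table 3.1 can be checked against conditions 3.1 to 3.3 with a short script. This is a minimal sketch, not GROMACS code: the function names and the box-type labels used as dictionary keys are hypothetical, and the box vectors are simply typed in from the table.

```python
import math


def box_vectors(box_type, d):
    """Box vectors (a, b, c) of Table 3.1 for image distance d.

    The string labels are illustrative only, not GROMACS option names.
    """
    s2, s3, s6 = math.sqrt(2.0), math.sqrt(3.0), math.sqrt(6.0)
    table = {
        "cubic": ((d, 0.0, 0.0), (0.0, d, 0.0), (0.0, 0.0, d)),
        "dodecahedron-xy-square": (
            (d, 0.0, 0.0), (0.0, d, 0.0), (d / 2, d / 2, s2 * d / 2)),
        "dodecahedron-xy-hexagon": (
            (d, 0.0, 0.0), (d / 2, s3 * d / 2, 0.0), (d / 2, s3 * d / 6, s6 * d / 3)),
        "truncated-octahedron": (
            (d, 0.0, 0.0), (d / 3, 2 * s2 * d / 3, 0.0), (-d / 3, s2 * d / 3, s6 * d / 3)),
    }
    return table[box_type]


def satisfies_conditions(a, b, c, tol=1e-12):
    """True if the box obeys eqns. 3.1-3.3 (tol absorbs rounding at equality)."""
    eq31 = a[1] == 0.0 and a[2] == 0.0 and b[2] == 0.0
    eq32 = a[0] > 0.0 and b[1] > 0.0 and c[2] > 0.0
    eq33 = (abs(b[0]) <= 0.5 * a[0] + tol
            and abs(c[0]) <= 0.5 * a[0] + tol
            and abs(c[1]) <= 0.5 * b[1] + tol)
    return eq31 and eq32 and eq33


def volume(a, b, c):
    """With a_y = a_z = b_z = 0 the box volume is the product a_x * b_y * c_z."""
    return a[0] * b[1] * c[2]


def angle(u, v):
    """Angle between two vectors in degrees."""
    dot = sum(p * q for p, q in zip(u, v))
    return math.degrees(math.acos(dot / (math.hypot(*u) * math.hypot(*v))))


for name in ("cubic", "dodecahedron-xy-square",
             "dodecahedron-xy-hexagon", "truncated-octahedron"):
    a, b, c = box_vectors(name, 1.0)
    print(f"{name:24s} ok={satisfies_conditions(a, b, c)} "
          f"V/d^3={volume(a, b, c):.3f} "
          f"bc={angle(b, c):.2f} ac={angle(a, c):.2f} ab={angle(a, b):.2f}")
```

All box vectors have length d, so the image distance is the same for every box type and the printed volumes can be compared directly.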
The rhombic dodecahedron (Fig. 3.2) is the smallest and most regular space-filling unit cell. Each of the 12 image cells is at the same distance. The volume is 71% of the volume of a cube having the same image distance. This saves about 29% of CPU-time when simulating a spherical or flexible molecule in solvent. There are two different orientations of a rhombic dodecahedron that satisfy equations 3.1, 3.2 and 3.3. The program editconf produces the orientation which has a square intersection with the xy-plane. This orientation was chosen because the first two box vectors coincide with the x and y-axis, which is easier to comprehend. The other orientation can be useful for simulations of membrane proteins. In this case the cross-section with the xy-plane is a hexagon, which has an area which is 14% smaller than the area of a square with the same image distance. The height of the box (c_z) should be changed to obtain an optimal spacing. This box shape not only saves CPU time, it also results in a more uniform arrangement of the proteins.

3.2.2 Cut-off restrictions

The minimum image convention implies that the cut-off radius used to truncate non-bonded interactions may not exceed half the shortest box vector:

    R_c < \frac{1}{2} \min(\|\boldsymbol{a}\|, \|\boldsymbol{b}\|, \|\boldsymbol{c}\|),    (3.4)

because otherwise more than one image would be within the cut-off distance of the force. When a macromolecule, such as a protein, is studied in solution, this restriction alone is not sufficient: in principle, a single solvent molecule should not be able to ‘see’ both sides of the macromolecule. This means that the length of each box vector must exceed the length of the macromolecule in the direction of that edge plus two times the cut-off radius R_c. It is, however, common to compromise in this respect, and make the solvent layer somewhat smaller in order to reduce the computational cost. For efficiency reasons the cut-off with triclinic boxes is more restricted.
For grid search the extra restriction is weak:

    R_c < \min(a_x, b_y, c_z)    (3.5)

For simple search the extra restriction is stronger:

    R_c < \frac{1}{2} \min(a_x, b_y, c_z)    (3.6)

Each unit cell (cubic, rectangular or triclinic) is surrounded by 26 translated images. A particular image can therefore always be identified by an index pointing to one of 27 translation vectors and constructed by applying a translation with the indexed vector (see 3.4.3). Restriction (3.5) ensures that only 26 images need to be considered.

3.3 The group concept

The GROMACS MD and analysis programs use user-defined groups of atoms to perform certain actions on. The maximum number of groups is 256, but each atom can only belong to six different groups, one each of the following:

temperature-coupling group
The temperature coupling parameters (reference temperature, time constant, number of degrees of freedom, see 3.4.4) can be defined for each T-coupling group separately. For example, in a solvated macromolecule the solvent (that tends to generate more heating by force and integration errors) can be coupled with a shorter time constant to a bath than is a macromolecule, or a surface can be kept cooler than an adsorbing molecule. Many different T-coupling groups may be defined. See also center of mass groups below.

freeze group
Atoms that belong to a freeze group are kept stationary in the dynamics. This is useful during equilibration, e.g. to avoid badly placed solvent molecules giving unreasonable kicks to protein atoms, although the same effect can also be obtained by putting a restraining potential on the atoms that must be protected. The freeze option can be used, if desired, on just one or two coordinates of an atom, thereby freezing the atoms in a plane or on a line. When an atom is partially frozen, constraints will still be able to move it, even in a frozen direction. A fully frozen atom cannot be moved by constraints. Many freeze groups can be defined.
Frozen coordinates are unaffected by pressure scaling; in some cases this can produce unwanted results, particularly when constraints are also used (in this case you will get very large pressures). Accordingly, it is recommended to avoid combining freeze groups with constraints and pressure coupling. For the sake of equilibration it could suffice to start with freezing in a constant volume simulation, and afterward use position restraints in conjunction with constant pressure.

accelerate group
On each atom in an “accelerate group” an acceleration a_g is imposed. This is equivalent to an external force. This feature makes it possible to drive the system into a non-equilibrium state, to perform non-equilibrium MD, and hence to obtain transport properties.

energy-monitor group
Mutual interactions between all energy-monitor groups are compiled during the simulation. This is done separately for Lennard-Jones and Coulomb terms. In principle up to 256 groups could be defined, but that would lead to 256×256 items! It is better to use this concept sparingly. All non-bonded interactions between pairs of energy-monitor groups can be excluded (see details in the User Guide). Pairs of particles from excluded pairs of energy-monitor groups are not put into the pair list. This can result in a significant speedup for simulations where interactions within or between parts of the system are not required.

center of mass group
In GROMACS the center of mass (COM) motion can be removed, for either the complete system or for groups of atoms. The latter is useful, e.g. for systems where there is limited friction (e.g. gas systems), to prevent center of mass motion from occurring. It makes sense to use the same groups for temperature coupling and center of mass motion removal.

compressed position output group
In order to further reduce the size of the compressed trajectory file (.xtc or .tng), it is possible to store only a subset of all particles.
All x-compression groups that are specified are saved, the rest are not. If no such groups are specified, then all atoms are saved to the compressed trajectory file.

The use of groups in GROMACS tools is described in sec. 8.1.

3.4 Molecular Dynamics

A global flow scheme for MD is given in Fig. 3.3. Each MD or EM run requires as input a set of initial coordinates and, optionally, initial velocities of all particles involved. This chapter does not describe how these are obtained; for the setup of an actual MD run check the online manual at www.gromacs.org.

THE GLOBAL MD ALGORITHM

1. Input initial conditions
   Potential interaction V as a function of atom positions
   Positions r of all atoms in the system
   Velocities v of all atoms in the system

   repeat 2, 3, 4 for the required number of steps:

2. Compute forces
   The force on any atom

       \boldsymbol{F}_i = -\frac{\partial V}{\partial \boldsymbol{r}_i}

   is computed by calculating the force between non-bonded atom pairs:

       \boldsymbol{F}_i = \sum_j \boldsymbol{F}_{ij}

   plus the forces due to bonded interactions (which may depend on 1, 2, 3, or 4 atoms), plus restraining and/or external forces. The potential and kinetic energies and the pressure tensor may be computed.

3. Update configuration
   The movement of the atoms is simulated by numerically solving Newton’s equations of motion

       \frac{d^{2} \boldsymbol{r}_i}{dt^{2}} = \frac{\boldsymbol{F}_i}{m_i}

   or

       \frac{d\boldsymbol{r}_i}{dt} = \boldsymbol{v}_i; \quad \frac{d\boldsymbol{v}_i}{dt} = \frac{\boldsymbol{F}_i}{m_i}

4. if required: Output step
   write positions, velocities, energies, temperature, pressure, etc.

Figure 3.3: The global MD algorithm

Figure 3.4: A Maxwell-Boltzmann velocity distribution, generated from random numbers.

3.4.1 Initial conditions

Topology and force field

The system topology, including a description of the force field, must be read in. Force fields and topologies are described in chapters 4 and 5, respectively. All this information is static; it is never modified during the run.

Coordinates and velocities

Then, before a run starts, the box size and the coordinates and velocities of all particles are required.
The box size and shape are determined by three vectors (nine numbers) b_1, b_2, b_3, which represent the three basis vectors of the periodic box.

If the run starts at t = t_0, the coordinates at t = t_0 must be known. The leap-frog algorithm, the default algorithm used to update the time step with Δt (see 3.4.4), also requires that the velocities at t = t_0 − ½Δt are known. If velocities are not available, the program can generate initial atomic velocities v_i, i = 1 ... 3N with a Maxwell-Boltzmann distribution (Fig. 3.4) at a given absolute temperature T:

    p(v_i) = \sqrt{\frac{m_i}{2 \pi k T}} \exp\left( -\frac{m_i v_i^{2}}{2 k T} \right)    (3.7)

where k is Boltzmann’s constant (see chapter 2). To accomplish this, normally distributed random numbers are generated by adding twelve random numbers R_k in the range 0 ≤ R_k < 1 and subtracting 6.0 from their sum. The result is then multiplied by the standard deviation of the velocity distribution, √(kT/m_i). Since the resulting total energy will not correspond exactly to the required temperature T, a correction is made: first the center-of-mass motion is removed and then all velocities are scaled so that the total energy corresponds exactly to T (see eqn. 3.18).

Center-of-mass motion

The center-of-mass velocity is normally set to zero at every step; there is (usually) no net external force acting on the system and the center-of-mass velocity should remain constant. In practice, however, the update algorithm introduces a very slow change in the center-of-mass velocity, and therefore in the total kinetic energy of the system, especially when temperature coupling is used. If such changes are not quenched, an appreciable center-of-mass motion can develop in long runs, and the temperature will be significantly misinterpreted. Something similar may happen due to overall rotational motion, but only when an isolated cluster is simulated. In periodic systems with filled boxes, the overall rotational motion is coupled to other degrees of freedom and does not cause such problems.
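The velocity-generation recipe described in this section (sum of twelve uniform random numbers, scaling by the standard deviation, removal of center-of-mass motion, rescaling to the target temperature) can be sketched as follows. This is an illustration under stated assumptions, not the GROMACS implementation: the function names are hypothetical, Python’s `random` module stands in for the random number generator, and the rescaling assumes 3N − 3 degrees of freedom, i.e. no constraints and the center-of-mass motion removed.

```python
import math
import random

KB = 0.0083144621  # Boltzmann's constant in MD units, kJ mol^-1 K^-1


def approx_normal(rng):
    """Approximately N(0, 1): twelve uniform numbers in [0, 1) summed, minus 6."""
    return sum(rng.random() for _ in range(12)) - 6.0


def generate_velocities(masses, temperature, seed=0):
    """Return a list of [vx, vy, vz] in nm/ps for masses given in u."""
    rng = random.Random(seed)
    velocities = []
    for m in masses:
        sigma = math.sqrt(KB * temperature / m)  # standard deviation per component
        velocities.append([sigma * approx_normal(rng) for _ in range(3)])

    # Remove the center-of-mass velocity.
    total_mass = sum(masses)
    for k in range(3):
        v_com = sum(m * v[k] for m, v in zip(masses, velocities)) / total_mass
        for v in velocities:
            v[k] -= v_com

    # Scale all velocities so that the kinetic energy corresponds exactly to T.
    # Assumption: N_df = 3N - 3 and E_kin = (1/2) N_df k T.
    n_df = 3 * len(masses) - 3
    e_kin = 0.5 * sum(m * sum(c * c for c in v) for m, v in zip(masses, velocities))
    scale = math.sqrt(0.5 * n_df * KB * temperature / e_kin)
    for v in velocities:
        for k in range(3):
            v[k] *= scale
    return velocities


masses = [15.999, 1.008, 1.008] * 100  # e.g. 100 water-like triplets, in u
vel = generate_velocities(masses, 300.0)
e_kin = 0.5 * sum(m * sum(c * c for c in v) for m, v in zip(masses, vel))
print("T =", 2.0 * e_kin / ((3 * len(masses) - 3) * KB), "K")
```

Because uniform scaling of all velocities leaves a zero center-of-mass velocity at zero, the two corrections can be applied in the order given in the text without interfering with each other.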
3.4.2 Neighbor searching

As mentioned in chapter 4, internal forces are either generated from fixed (static) lists, or from dynamic lists. The latter consist of non-bonded interactions between any pair of particles. When calculating the non-bonded forces, it is convenient to have all particles in a rectangular box. As shown in Fig. 3.1, it is possible to transform a triclinic box into a rectangular box. The output coordinates are always in a rectangular box, even when a dodecahedron or triclinic box was used for the simulation. Equation 3.1 ensures that we can reset particles in a rectangular box by first shifting them with box vector c, then with b and finally with a. Equations 3.3, 3.4 and 3.5 ensure that we can find the 14 nearest triclinic images within a linear combination that does not involve multiples of box vectors.

Pair lists generation

The non-bonded pair forces need to be calculated only for those pairs i, j for which the distance r_ij between i and the nearest image of j is less than a given cut-off radius R_c. Some of the particle pairs that fulfill this criterion are excluded, when their interaction is already fully accounted for by bonded interactions. GROMACS employs a pair list that contains those particle pairs for which non-bonded forces must be calculated. The pair list contains particles i, a displacement vector for particle i, and all particles j that are within rlist of this particular image of particle i. The list is updated every nstlist steps.

To make the neighbor list, all particles that are close (i.e. within the neighbor list cut-off) to a given particle must be found. This searching, usually called neighbor search (NS) or pair search, involves periodic boundary conditions and determining the image (see sec. 3.2). The search algorithm is O(N), although a simpler O(N²) algorithm is still available under some conditions.
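The idea of a pair list built with the minimum image convention can be sketched as below. This is deliberately the simple O(N²) variant for a rectangular box, not the grid-based O(N) search, and it omits the per-particle displacement vector that the real pair list stores; the function names and the `exclusions` argument are hypothetical.

```python
def minimum_image(dx, box_length):
    """Shift a coordinate difference to its nearest periodic image (rectangular box)."""
    return dx - box_length * round(dx / box_length)


def build_pair_list(positions, box, rlist, exclusions=frozenset()):
    """Return {i: [j, ...]} with j > i and nearest-image distance below rlist.

    positions  : list of (x, y, z) tuples in nm
    box        : (Lx, Ly, Lz) edge lengths of a rectangular box in nm
    exclusions : set of (i, j) pairs, i < j, already covered by bonded terms
    """
    # The minimum image convention requires rlist below half the shortest edge.
    if rlist >= 0.5 * min(box):
        raise ValueError("rlist must be smaller than half the shortest box edge")
    rlist2 = rlist * rlist
    pair_list = {}
    n = len(positions)
    for i in range(n):
        neighbours = []
        for j in range(i + 1, n):
            if (i, j) in exclusions:
                continue
            r2 = 0.0
            for k in range(3):
                d = minimum_image(positions[j][k] - positions[i][k], box[k])
                r2 += d * d
            if r2 < rlist2:
                neighbours.append(j)
        pair_list[i] = neighbours
    return pair_list


box = (3.0, 3.0, 3.0)
positions = [(0.1, 0.1, 0.1), (2.9, 0.1, 0.1), (1.5, 1.5, 1.5), (0.2, 0.1, 0.1)]
# Particles 0 and 1 are neighbours only through the periodic boundary;
# the pair (0, 3) is excluded as if it were bonded.
print(build_pair_list(positions, box, rlist=1.0, exclusions={(0, 3)}))
```

Storing each pair only once (j > i) is a design choice of this sketch; it halves the list and matches the use of Newton's third law, F_ji = −F_ij, when forces are accumulated.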
Cut-off schemes: group versus Verlet

From version 4.6, GROMACS supports two different cut-off scheme setups: the original one based on particle groups and one using a Verlet buffer. There are some important differences that affect results, performance and feature support. The group scheme can be made to work (almost) like the Verlet scheme, but this will lead to a decrease in performance. The group scheme is especially fast for water molecules, which are abundant in many simulations, but on the most recent x86 processors, this advantage is negated by the better instruction-level parallelism available in the Verlet-scheme implementation. The group scheme is deprecated in version 5.0, and will be removed in a future version. For practical details of choosing and setting up cut-off schemes, please see the User Guide.

In the group scheme, a neighbor list is generated consisting of pairs of groups of at least one particle. These groups were originally charge groups (see sec. 3.4.2), but with a proper treatment of long-range electrostatics, performance in unbuffered simulations is their only advantage. A pair of groups is put into the neighbor list when their centers of geometry are within the cut-off distance. Interactions between all particle pairs (one from each charge group) are calculated for a certain number of MD steps, until the neighbor list is updated. This setup is efficient, as the neighbor search only checks distances between charge-group pairs, not particle pairs (this saves a factor of 3 × 3 = 9 with a three-particle water model), and the non-bonded force kernels can be optimized for, say, a water molecule "group". Without explicit buffering, this setup leads to energy drift as some particle pairs which are within the cut-off don't interact and some outside the cut-off do interact.
This can be caused by:

• particles moving across the cut-off between neighbor search steps, and/or
• for charge groups consisting of more than one particle, particle pairs moving in/out of the cut-off when their charge group center of geometry distance is outside/inside of the cut-off.

Explicitly adding a buffer to the neighbor list will remove such artifacts, but this comes at a high computational cost. How severe the artifacts are depends on the system, the properties in which you are interested, and the cut-off setup.

The Verlet cut-off scheme uses a buffered pair list by default. It also uses clusters of particles, but these are not static as in the group scheme. Rather, the clusters are defined spatially and consist of 4 or 8 particles, which is convenient for stream computing, using e.g. SSE, AVX or CUDA on GPUs. At neighbor search steps, a pair list is created with a Verlet buffer, i.e. the pair-list cut-off is larger than the interaction cut-off. In the non-bonded kernels, interactions are only computed when a particle pair is within the cut-off distance at that particular time step. This ensures that as particles move between pair search steps, forces between nearly all particles within the cut-off distance are calculated. We say nearly all particles, because GROMACS uses a fixed pair list update frequency for efficiency. A particle pair whose distance was outside the cut-off could possibly move enough during this fixed number of steps that its distance is now within the cut-off. This small chance results in a small energy drift, and the size of the chance depends on the temperature. When temperature coupling is used, the buffer size can be determined automatically, given a certain tolerance on the energy drift.

The Verlet cut-off scheme is implemented in a very efficient fashion based on clusters of particles. The simplest example is a cluster size of 4 particles. The pair list is then constructed based on cluster pairs.
The cluster-pair search is much faster than searching based on particle pairs, because 4 × 4 = 16 particle pairs are put in the list at once. The non-bonded force calculation kernel can then calculate many particle-pair interactions at once, which maps nicely to SIMD or SIMT units on modern hardware, which can perform multiple floating-point operations at once. These non-bonded kernels are much faster than the kernels used in the group scheme for most types of systems, particularly on newer hardware.

Additionally, when the list buffer is determined automatically as described below, we also apply dynamic pair list pruning. The pair list can be constructed infrequently, but that can lead to a lot of pairs in the list that are outside the cut-off range for all or most of the life time of this pair list. Such pairs can be pruned out by applying a cluster-pair kernel that only determines which clusters are in range. Because of the way the non-bonded data is regularized in GROMACS, this kernel is an order of magnitude faster than the search and the interaction kernel. On the GPU this pruning is overlapped with the integration on the CPU, so it is free in most cases. Therefore we can prune every 4-10 integration steps with little overhead and significantly reduce the number of cluster pairs in the interaction kernel. This procedure is applied automatically, unless the user sets the pair-list buffer size manually.

Energy drift and pair-list buffering

For a canonical (NVT) ensemble, the average energy error caused by diffusion of j particles from outside the pair-list cut-off r_ℓ to inside the interaction cut-off r_c over the lifetime of the list can be determined from the atomic displacements and the shape of the potential at the cut-off. The displacement distribution along one dimension for a freely moving particle with mass m over time t at temperature T is a Gaussian G(x) of zero mean and variance σ² = t² kB T/m. For the distance between two particles, the variance changes to σ² = σ12² = t² kB T (1/m1 + 1/m2). Note that in practice particles usually interact with (bump into) other particles over time t and therefore the real displacement distribution is much narrower. Given a non-bonded interaction cut-off distance of r_c and a pair-list cut-off r_ℓ = r_c + r_b, with r_b the Verlet buffer size, we can then write the average energy error after time t for all missing pair interactions between a single i particle of type 1 surrounded by all j particles that are of type 2 with number density ρ2, when the inter-particle distance changes from r0 to rt, as:

    \langle \Delta V \rangle = \int_{0}^{r_c} \int_{r_\ell}^{\infty} 4 \pi r_0^2 \rho_2 V(r_t) \, G\left(\frac{r_t - r_0}{\sigma}\right) dr_0 \, dr_t    (3.8)

To evaluate this analytically, we need to make some approximations. First we replace V(rt) by a Taylor expansion around r_c, then we can move the lower bound of the integral over r0 to −∞ which will simplify the result:

    \langle \Delta V \rangle \approx \int_{-\infty}^{r_c} \int_{r_\ell}^{\infty} 4 \pi r_0^2 \rho_2 \Big[ V'(r_c)(r_t - r_c) + V''(r_c) \frac{1}{2}(r_t - r_c)^2 + V'''(r_c) \frac{1}{6}(r_t - r_c)^3 + O\big((r_t - r_c)^4\big) \Big] G\left(\frac{r_t - r_0}{\sigma}\right) dr_0 \, dr_t    (3.9)

Replacing the factor r0² by (r_ℓ + σ)², which results in a slight overestimate, allows us to calculate the integrals analytically:

    \langle \Delta V \rangle \approx 4 \pi (r_\ell + \sigma)^2 \rho_2 \int_{-\infty}^{r_c} \int_{r_\ell}^{\infty} \Big[ V'(r_c)(r_t - r_c) + V''(r_c) \frac{1}{2}(r_t - r_c)^2 + V'''(r_c) \frac{1}{6}(r_t - r_c)^3 \Big] G\left(\frac{r_t - r_0}{\sigma}\right) dr_0 \, dr_t    (3.10)

    = 4 \pi (r_\ell + \sigma)^2 \rho_2 \Big\{ \frac{1}{2} V'(r_c) \Big[ r_b \sigma G\left(\frac{r_b}{\sigma}\right) - (r_b^2 + \sigma^2) E\left(\frac{r_b}{\sigma}\right) \Big]
      + \frac{1}{6} V''(r_c) \Big[ \sigma (r_b^2 + 2\sigma^2) G\left(\frac{r_b}{\sigma}\right) - r_b (r_b^2 + 3\sigma^2) E\left(\frac{r_b}{\sigma}\right) \Big]
      + \frac{1}{24} V'''(r_c) \Big[ r_b \sigma (r_b^2 + 5\sigma^2) G\left(\frac{r_b}{\sigma}\right) - (r_b^4 + 6 r_b^2 \sigma^2 + 3\sigma^4) E\left(\frac{r_b}{\sigma}\right) \Big] \Big\}    (3.11)

where G(x) is a Gaussian distribution with 0 mean and unit variance and E(x) = ½ erfc(x/√2). We always want to achieve a small energy error, so σ will be small compared to both r_c and r_ℓ, thus the approximations in the equations above are good, since the Gaussian distribution decays rapidly. The energy error needs to be averaged over all particle pair types and weighted with the particle counts.
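The closed-form estimate of eqn. 3.11 is straightforward to evaluate numerically. The following sketch does so for a single pair type; the function and argument names are hypothetical, the derivatives of the potential at the cut-off are taken as plain inputs, and nothing here is claimed to mirror the actual GROMACS source.

```python
import math


def gauss(x):
    """G(x): Gaussian with zero mean and unit variance."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def e_func(x):
    """E(x) = erfc(x / sqrt(2)) / 2."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def displacement_sigma(t, temp, m1, m2, k_b=0.0083144626):
    """sigma for the distance between two free particles:
    sigma^2 = t^2 kB T (1/m1 + 1/m2); t in ps, masses in u, result in nm."""
    return t * math.sqrt(k_b * temp * (1.0 / m1 + 1.0 / m2))


def pair_energy_error(dv1, dv2, dv3, r_c, r_b, sigma, rho2):
    """Evaluate eqn. 3.11 for one i particle surrounded by j particles.

    dv1, dv2, dv3: V'(r_c), V''(r_c), V'''(r_c); rho2: number density of j.
    """
    r_l = r_c + r_b  # pair-list cut-off
    g, e = gauss(r_b / sigma), e_func(r_b / sigma)
    s2, b2 = sigma * sigma, r_b * r_b
    term1 = 0.5 * dv1 * (r_b * sigma * g - (b2 + s2) * e)
    term2 = dv2 / 6.0 * (sigma * (b2 + 2.0 * s2) * g - r_b * (b2 + 3.0 * s2) * e)
    term3 = dv3 / 24.0 * (r_b * sigma * (b2 + 5.0 * s2) * g
                          - (b2 * b2 + 6.0 * b2 * s2 + 3.0 * s2 * s2) * e)
    return 4.0 * math.pi * (r_l + sigma) ** 2 * rho2 * (term1 + term2 + term3)
```

Because the estimate falls off rapidly with increasing r_b for fixed σ, a routine like this can serve as the objective of a one-dimensional root search for the buffer size.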
In GROMACS we don't allow cancellation of error between pair types, so we average the absolute values. To obtain the average energy error per unit time, it needs to be divided by the neighbor-list life time t = (nstlist − 1) × dt. The function can not be inverted analytically, so we use bisection to obtain the buffer size r_b for a target drift. Again we note that in practice the error will usually be much smaller than this estimate, as in the condensed phase particle displacements will be much smaller than for freely moving particles, which is the assumption used here.

When (bond) constraints are present, some particles will have fewer degrees of freedom. This will reduce the energy errors. For simplicity, we only consider one constraint per particle, the heaviest particle in case a particle is involved in multiple constraints. This simplification overestimates the displacement. The motion of a constrained particle is a superposition of the 3D motion of the center of mass of both particles and a 2D rotation around the center of mass. The displacement in an arbitrary direction of a particle with 2 degrees of freedom is not Gaussian, but rather follows the complementary error function:

    \frac{\sqrt{\pi}}{2 \sqrt{2} \sigma} \, \mathrm{erfc}\left(\frac{|r|}{\sqrt{2} \sigma}\right)    (3.12)

where σ² is again t² kB T/m. This distribution can no longer be integrated analytically to obtain the energy error. But we can generate a tight upper bound using a scaled and shifted Gaussian distribution (not shown). This Gaussian distribution can then be used to calculate the energy error as described above. The rotation displacement around the center of mass can not be more than the length of the arm. To take this into account, we scale σ in eqn. 3.12 (details not presented here) to obtain an overestimate of the real displacement. This latter effect significantly reduces the buffer size for longer neighborlist lifetimes in e.g. water, as constrained hydrogens are by far the fastest particles, but they can not move further than 0.1 nm from the heavy atom they are connected to.

Figure 3.5: Energy drift per atom (kJ/mol/ps) as a function of the Verlet buffer size (nm) for an SPC/E water system at 300K with a time step of 2 fs and a pair-list update period of 10 steps (pair-list life time: 18 fs). PME was used with ewald-rtol set to 10⁻⁵; this parameter affects the shape of the potential at the cut-off. Error estimates due to finite Verlet buffer size are shown for a 1 × 1 atom pair list and 4 × 4 atom pair list without and with (dashed line) cancellation of positive and negative errors. Real energy drift is shown for simulations using double- and mixed-precision settings. Rounding errors in the SETTLE constraint algorithm from the use of single precision cause the drift to become negative at large buffer size. Note that at zero buffer size, the real drift is small because positive (H-H) and negative (O-H) energy errors cancel.

There is one important implementation detail that reduces the energy errors caused by the finite Verlet buffer list size. The derivation above assumes a particle pair-list. However, the GROMACS implementation uses a cluster pair-list for efficiency. The pair list consists of pairs of clusters of 4 particles in most cases, also called a 4 × 4 list, but the list can also be 4 × 8 (GPU CUDA kernels and AVX 256-bit single precision kernels) or 4 × 2 (SSE double-precision kernels). This means that the pair-list is effectively much larger than the corresponding 1 × 1 list. Thus slightly beyond the pair-list cut-off there will still be a large fraction of particle pairs present in the list. This fraction can be determined in a simulation and accurately estimated under some reasonable assumptions.
The fraction decreases with increasing pair-list range, meaning that a smaller buffer can be used. For typical all-atom simulations with a cut-off of 0.9 nm this fraction is around 0.9, which gives a reduction in the energy errors of a factor of 10. This reduction is taken into account during the automatic Verlet buffer calculation and results in a smaller buffer size.

In Fig. 3.5 one can see that for small buffer sizes the drift of the total energy is much smaller than the pair energy error tolerance, due to cancellation of errors. For larger buffer size, the error estimate is a factor of 6 higher than the drift of the total energy, or alternatively the buffer estimate is 0.024 nm too large. This is because the protons don't move freely over 18 fs, but rather vibrate.

Figure 3.6: Grid search in two dimensions. The arrows are the box vectors.

Cut-off artifacts and switched interactions

With the Verlet scheme, the pair potentials are shifted to be zero at the cut-off, which makes the potential the integral of the force. This is only possible in the group scheme if the shape of the potential is such that its value is zero at the cut-off distance. However, there can still be energy drift when the forces are non-zero at the cut-off. This effect is extremely small and often not noticeable, as other integration errors (e.g. from constraints) may dominate. To completely avoid cut-off artifacts, the non-bonded forces can be switched exactly to zero at some distance smaller than the neighbor list cut-off (there are several ways to do this in GROMACS, see sec. 4.1.5).
One then has a buffer with the size equal to the neighbor list cut-off less the longest interaction cut-off.

Simple search

Due to eqns. 3.1 and 3.6, the vector r_ij connecting images within the cut-off Rc can be found by constructing:

    r''' = r_j - r_i                                      (3.13)
    r''  = r''' - c \cdot \mathrm{round}(r'''_z / c_z)     (3.14)
    r'   = r''  - b \cdot \mathrm{round}(r''_y / b_y)      (3.15)
    r_{ij} = r' - a \cdot \mathrm{round}(r'_x / a_x)       (3.16)

When distances between two particles in a triclinic box are needed that do not obey eqn. 3.1, many shifts of combinations of box vectors need to be considered to find the nearest image.

Grid search

The grid search is schematically depicted in Fig. 3.6. All particles are put on the NS grid, with the smallest spacing ≥ Rc/2 in each of the directions. In the direction of each box vector, a particle i has three images. For each direction the image may be −1, 0 or 1, corresponding to a translation over −1, 0 or +1 box vector. We do not search the surrounding NS grid cells for neighbors of i and then calculate the image, but rather construct the images first and then search neighbors corresponding to that image of i. As Fig. 3.6 shows, some grid cells may be searched more than once for different images of i. This is not a problem, since, due to the minimum image convention, at most one image will "see" the j-particle. For every particle, fewer than 125 (5³) neighboring cells are searched. Therefore, the algorithm scales linearly with the number of particles. Although the prefactor is large, the scaling behavior makes the algorithm far superior over the standard O(N²) algorithm when there are more than a few hundred particles. The grid search is equally fast for rectangular and triclinic boxes. Thus for most protein and peptide simulations the rhombic dodecahedron will be the preferred box shape.

Charge groups

Charge groups were originally introduced to reduce cut-off artifacts of Coulomb interactions.
When a plain cut-off is used, significant jumps in the potential and forces arise when atoms with (partial) charges move in and out of the cut-off radius. When all chemical moieties have a net charge of zero, these jumps can be reduced by moving groups of atoms with net charge zero, called charge groups, in and out of the neighbor list. This reduces the cut-off effects from the charge-charge level to the dipole-dipole level, which decay much faster. With the advent of full range electrostatics methods, such as particle-mesh Ewald (sec. 4.8.2), the use of charge groups is no longer required for accuracy. It might even have a slight negative effect on the accuracy or efficiency, depending on how the neighbor list is made and the interactions are calculated.

But there is still an important reason for using "charge groups": efficiency with the group cut-off scheme. Where applicable, neighbor searching is carried out on the basis of charge groups which are defined in the molecular topology. If the nearest image distance between the geometrical centers of the atoms of two charge groups is less than the cut-off radius, all atom pairs between the charge groups are included in the pair list. The neighbor searching for a water system, for instance, is 3² = 9 times faster when each molecule is treated as a charge group. Also the highly optimized water force loops (see sec. A.2.1) only work when all atoms in a water molecule form a single charge group. Currently the name neighbor-search group would be more appropriate, but the name charge group is retained for historical reasons. When developing a new force field, the advice is to use charge groups of 3 to 4 atoms for optimal performance. For all-atom force fields this is relatively easy, as one can simply put hydrogen atoms, and in some cases oxygen atoms, in the same charge group as the heavy atom they are connected to; for example: CH3, CH2, CH, NH2, NH, OH, CO2, CO.
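The charge-group criterion above, combined with the box-vector shifts of eqns. 3.13-3.16, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the box is assumed to follow the usual conventions a = (ax, 0, 0), b = (bx, by, 0), c = (cx, cy, cz) and to satisfy eqn. 3.1, and every charge group is assumed to be whole (not split across a periodic boundary) so that its geometrical center is meaningful.

```python
import math


def image_vector(r_i, r_j, a, b, c):
    """Shift r_j - r_i by c, then b, then a, as in eqns. 3.13-3.16."""
    r = [r_j[k] - r_i[k] for k in range(3)]
    for vec, dim in ((c, 2), (b, 1), (a, 0)):
        n = round(r[dim] / vec[dim])
        r = [r[k] - n * vec[k] for k in range(3)]
    return r


def center_of_geometry(coords, group):
    """Unweighted mean position of the atoms in one charge group."""
    return [sum(coords[i][k] for i in group) / len(group) for k in range(3)]


def charge_group_pairs(coords, groups, box, r_c):
    """All atom pairs (i, j) from charge-group pairs whose centers are within r_c."""
    a, b, c = box
    centers = [center_of_geometry(coords, g) for g in groups]
    pairs = []
    for gi in range(len(groups)):
        for gj in range(gi + 1, len(groups)):
            d = image_vector(centers[gi], centers[gj], a, b, c)
            if math.sqrt(sum(x * x for x in d)) < r_c:
                # One distance check admits every atom pair between the two groups.
                pairs.extend((i, j) for i in groups[gi] for j in groups[gj])
    return pairs
```

For two three-atom groups a single center-to-center distance check replaces nine atom-atom checks, which is the factor of 9 quoted for water.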
With the Verlet cut-off scheme, charge groups are ignored.

3.4.3 Compute forces

Potential energy

When forces are computed, the potential energy of each interaction term is computed as well. The total potential energy is summed for various contributions, such as Lennard-Jones, Coulomb, and bonded terms. It is also possible to compute these contributions for energy-monitor groups of atoms that are separately defined (see sec. 3.3).

Kinetic energy and temperature

The temperature is given by the total kinetic energy of the N-particle system:

    E_{kin} = \frac{1}{2} \sum_{i=1}^{N} m_i v_i^2    (3.17)

From this the absolute temperature T can be computed using:

    \frac{1}{2} N_{df} k T = E_{kin}    (3.18)

where k is Boltzmann's constant and Ndf is the number of degrees of freedom which can be computed from:

    N_{df} = 3N - N_c - N_{com}    (3.19)

Here Nc is the number of constraints imposed on the system. When performing molecular dynamics Ncom = 3 additional degrees of freedom must be removed, because the three center-of-mass velocities are constants of the motion, which are usually set to zero. When simulating in vacuo, the rotation around the center of mass can also be removed, in this case Ncom = 6. When more than one temperature-coupling group is used, the number of degrees of freedom for group i is:

    N^i_{df} = (3N^i - N^i_c) \frac{3N - N_c - N_{com}}{3N - N_c}    (3.20)

The kinetic energy can also be written as a tensor, which is necessary for pressure calculation in a triclinic system, or systems where shear forces are imposed:

    \mathbf{E}_{kin} = \frac{1}{2} \sum_{i}^{N} m_i \mathbf{v}_i \otimes \mathbf{v}_i    (3.21)

Pressure and virial

The pressure tensor P is calculated from the difference between kinetic energy Ekin and the virial Ξ:

    \mathbf{P} = \frac{2}{V} (\mathbf{E}_{kin} - \mathbf{\Xi})    (3.22)

where V is the volume of the computational box. The scalar pressure P, which can be used for pressure coupling in the case of isotropic systems, is computed as:

    P = \mathrm{trace}(\mathbf{P})/3    (3.23)

The virial Ξ tensor is defined as:

    \mathbf{\Xi} = -\frac{1}{2} \sum_{i<j} \mathbf{r}_{ij} \otimes \mathbf{F}_{ij}

[...] r_c, mdrun employs a smart algorithm to reduce the communication.
Simply communicating all charge groups within rmb would increase the amount of communication enormously. Therefore only charge-groups that are connected by bonded interactions to charge groups which are not locally present are communicated. This leads to little extra communication, but also to a slightly increased cost for the domain decomposition setup. In some cases, e.g. coarse-grained simulations with a very short cut-off, one might want to set rmb by hand to reduce this cost.

3.17.5 Multiple-Program, Multiple-Data PME parallelization

Electrostatics interactions are long-range, therefore special algorithms are used to avoid summation over many atom pairs. In GROMACS this is usually PME (sec. 4.8.2). Since with PME all particles interact with each other, global communication is required. This will usually be the limiting factor for scaling with domain decomposition. To reduce the effect of this problem, we have come up with a Multiple-Program, Multiple-Data approach [5]. Here, some ranks are selected to do only the PME mesh calculation, while the other ranks, called particle-particle (PP) ranks, do all the rest of the work. For rectangular boxes the optimal PP to PME rank ratio is usually 3:1, for rhombic dodecahedra usually 2:1. When the number of PME ranks is reduced by a factor of 4, the number of communication calls is reduced by about a factor of 16. Or put differently, we can now scale to 4 times more ranks.

Figure 3.14: Example of 8 ranks without (left: 8 PP/PME ranks) and with (right: 6 PP ranks, 2 PME ranks) MPMD. The PME communication (red arrows) is much higher on the left than on the right. For MPMD additional PP - PME coordinate and force communication (blue arrows) is required, but the total communication complexity is lower.
In addition, for modern 4 or 8 core machines in a network, the effective network bandwidth for PME is quadrupled, since only a quarter of the cores will be using the network connection on each machine during the PME calculations.

mdrun will by default interleave the PP and PME ranks. If the ranks are not numbered consecutively inside the machines, one might want to use mdrun -ddorder pp_pme. For machines with a real 3-D torus and proper communication software that assigns the ranks accordingly one should use mdrun -ddorder cartesian.

To optimize the performance one should usually set up the cut-offs and the PME grid such that the PME load is 25 to 33% of the total calculation load. grompp will print an estimate for this load at the end and also mdrun calculates the same estimate to determine the optimal number of PME ranks to use. For high parallelization it might be worthwhile to optimize the PME load with the mdp settings and/or the number of PME ranks with the -npme option of mdrun. For changing the electrostatics settings it is useful to know that the accuracy of the electrostatics remains nearly constant when the Coulomb cut-off and the PME grid spacing are scaled by the same factor. Note that it is usually better to overestimate than to underestimate the number of PME ranks, since the number of PME ranks is smaller than the number of PP ranks, which leads to less total waiting time.

The PME domain decomposition can be 1-D or 2-D along the x and/or y axis. 2-D decomposition is also known as pencil decomposition because of the shape of the domains at high parallelization. 1-D decomposition along the y axis can only be used when the PP decomposition has only 1 domain along x. 2-D PME decomposition has to have the number of domains along x equal to the number of the PP decomposition. mdrun automatically chooses 1-D or 2-D PME decomposition (when possible with the total given number of ranks), based on the minimum amount of communication for the coordinate redistribution in PME plus the communication for the grid overlap and transposes. To avoid superfluous communication of coordinates and forces between the PP and PME ranks, the number of DD cells in the x direction should ideally be the same or a multiple of the number of PME ranks. By default, mdrun takes care of this issue.

3.17.6 Domain decomposition flow chart

In Fig. 3.15 a flow chart is shown for domain decomposition with all possible communication for different algorithms. For simpler simulations, the same flow chart applies, without the algorithms and communication for the algorithms that are not used.

3.18 Implicit solvation

Implicit solvent models provide an efficient way of representing the electrostatic effects of solvent molecules, while saving a large part of the computations involved in an accurate, aqueous description of the surrounding water in molecular dynamics simulations. Implicit solvation models offer several advantages compared with explicit solvation, including eliminating the need for the equilibration of water around the solute, and the absence of viscosity, which allows the protein to more quickly explore conformational space.

Implicit solvent calculations in GROMACS can be done using the generalized Born-formalism, and the Still [71], HCT [72], and OBC [73] models are available for calculating the Born radii.

Here, the free energy Gsolv of solvation is the sum of three terms, a solvent-solvent cavity term (Gcav), a solute-solvent van der Waals term (Gvdw), and finally a solvent-solute electrostatics polarization term (Gpol).

The sum of Gcav and Gvdw corresponds to the (non-polar) free energy of solvation for a molecule from which all charges have been removed, and is commonly called Gnp, calculated from the total solvent accessible surface area multiplied with a surface tension.
The total expression for the solvation free energy then becomes:

    G_{solv} = G_{np} + G_{pol}    (3.150)

Under the generalized Born model, Gpol is calculated from the generalized Born equation [71]:

    G_{pol} = \left(1 - \frac{1}{\epsilon}\right) \sum_{i=1}^{n} \sum_{j>i}^{n} \frac{q_i q_j}{\sqrt{r_{ij}^2 + b_i b_j \exp\left(\frac{-r_{ij}^2}{4 b_i b_j}\right)}}    (3.151)

In GROMACS, we have introduced the substitution [74]:

    c_i = \frac{1}{\sqrt{b_i}}    (3.152)

which makes it possible to introduce a cheap transformation to a new variable x when evaluating each interaction, such that:

    x = \frac{r_{ij}}{\sqrt{b_i b_j}} = r_{ij} c_i c_j    (3.153)

In the end, the full re-formulation of 3.151 becomes:

    G_{pol} = \left(1 - \frac{1}{\epsilon}\right) \sum_{i=1}^{n} \sum_{j>i}^{n} \frac{q_i q_j}{\sqrt{b_i b_j}} \xi(x) = \left(1 - \frac{1}{\epsilon}\right) \sum_{i=1}^{n} q_i c_i \sum_{j>i}^{n} q_j c_j \xi(x)    (3.154)

The non-polar part (Gnp) of Equation 3.150 is calculated directly from the Born radius of each atom using a simple ACE type approximation by Schaefer et al. [75], including a simple loop over all atoms. This requires only one extra solvation parameter, independent of atom type, but differing slightly between the three Born radii models.

Figure 3.15: Flow chart showing the algorithms and communication (arrows) for a standard MD simulation with virtual sites, constraints and separate PME-mesh ranks.

Chapter 4
Interaction function and force fields

To accommodate the potential functions used in some popular force fields (see 4.10), GROMACS offers a choice of functions, both for non-bonded interaction and for dihedral interactions. They are described in the appropriate subsections.

The potential functions can be subdivided into four parts:

1. Non-bonded: Lennard-Jones or Buckingham, and Coulomb or modified Coulomb. The non-bonded interactions are computed on the basis of a neighbor list (a list of non-bonded atoms within a certain radius), in which exclusions are already removed.
2. Bonded: covalent bond-stretching, angle-bending, improper dihedrals, and proper dihedrals. These are computed on the basis of fixed lists.
3. Restraints: position restraints, angle restraints, distance restraints, orientation restraints and dihedral restraints, all based on fixed lists.
4. Applied Forces: externally applied forces, see chapter 6.

4.1 Non-bonded interactions

Non-bonded interactions in GROMACS are pair-additive:

    V(r_1, \ldots, r_N) = \sum_{i<j} V_{ij}(r_{ij});    (4.1)

[...]

    [...] \Delta\phi; \qquad 0 \quad \text{for } \phi' \le \Delta\phi    (4.84)

where ∆φ is a user defined angle and kdihr is the force constant. Note that in the input in topology files, angles are given in degrees and force constants in kJ/mol/rad².

Figure 4.15: Distance Restraint potential (V_disre in kJ mol⁻¹ as a function of r in nm, with the bounds r0, r1 and r2 indicated).

4.3.5 Distance restraints

Distance restraints add a penalty to the potential when the distance between specified pairs of atoms exceeds a threshold value. They are normally used to impose experimental restraints from, for instance, experiments in nuclear magnetic resonance (NMR), on the motion of the system. Thus, MD can be used for structure refinement using NMR data. In GROMACS there are three ways to impose restraints on pairs of atoms:

• Simple harmonic restraints: use [ bonds ] type 6 (see sec. 5.4.4).
• Piecewise linear/harmonic restraints: [ bonds ] type 10.
• Complex NMR distance restraints, optionally with pair, time and/or ensemble averaging.

The last two options will be detailed now.

The potential form for distance restraints is quadratic below a specified lower bound and between two specified upper bounds, and linear beyond the largest bound (see Fig. 4.15).

    V_{dr}(r_{ij}) = \begin{cases}
      \frac{1}{2} k_{dr} (r_{ij} - r_0)^2 & \text{for } r_{ij} < r_0 \\
      0 & \text{for } r_0 \le r_{ij} < r_1 \\
      \frac{1}{2} k_{dr} (r_{ij} - r_1)^2 & \text{for } r_1 \le r_{ij} < r_2 \\
      \frac{1}{2} k_{dr} (r_2 - r_1)(2 r_{ij} - r_2 - r_1) & \text{for } r_2 \le r_{ij}
    \end{cases}    (4.85)

The forces are

    \mathbf{F}_i = \begin{cases}
      -k_{dr} (r_{ij} - r_0) \frac{\mathbf{r}_{ij}}{r_{ij}} & \text{for } r_{ij} < r_0 \\
      0 & \text{for } r_0 \le r_{ij} < r_1 \\
      -k_{dr} (r_{ij} - r_1) \frac{\mathbf{r}_{ij}}{r_{ij}} & \text{for } r_1 \le r_{ij} < r_2 \\
      -k_{dr} (r_2 - r_1) \frac{\mathbf{r}_{ij}}{r_{ij}} & \text{for } r_2 \le r_{ij}
    \end{cases}    (4.86)
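The piecewise potential of eqn. 4.85 and the corresponding force of eqn. 4.86 can be written down directly. The sketch below is illustrative only; the function names are hypothetical, and it assumes the convention r_ij = r_i − r_j, under which eqn. 4.86 is the negative gradient of eqn. 4.85 with respect to the position of atom i.

```python
import math


def disres_potential(r, r0, r1, r2, k_dr):
    """Restraint energy for a pair at distance r (eqn. 4.85)."""
    if r < r0:
        return 0.5 * k_dr * (r - r0) ** 2
    if r < r1:
        return 0.0
    if r < r2:
        return 0.5 * k_dr * (r - r1) ** 2
    # Linear beyond r2; value and slope match the quadratic branch at r2.
    return 0.5 * k_dr * (r2 - r1) * (2.0 * r - r2 - r1)


def disres_force_scalar(r, r0, r1, r2, k_dr):
    """Scalar factor multiplying r_ij / |r_ij| in eqn. 4.86 (equals -dV/dr)."""
    if r < r0:
        return -k_dr * (r - r0)
    if r < r1:
        return 0.0
    if r < r2:
        return -k_dr * (r - r1)
    return -k_dr * (r2 - r1)


def disres_force(x_i, x_j, r0, r1, r2, k_dr):
    """Force vector on atom i, assuming r_ij = x_i - x_j."""
    d = [x_i[k] - x_j[k] for k in range(3)]
    r = math.sqrt(sum(c * c for c in d))
    f = disres_force_scalar(r, r0, r1, r2, k_dr)
    return [f * c / r for c in d]
```

The linear branch caps the force magnitude at k_dr (r2 − r1), so a badly violated restraint cannot produce arbitrarily large forces.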
For restraints not derived from NMR data, this functionality will usually suffice and a section of [ bonds ] type 10 can be used to apply individual restraints between pairs of atoms, see 5.8.1. For applying restraints derived from NMR measurements, more complex functionality might be required, which is provided through the [ distance_restraints ] section and is described below.

Time averaging

Distance restraints based on instantaneous distances can potentially reduce the fluctuations in a molecule significantly. This problem can be overcome by restraining to a time averaged distance [96]. The forces with time averaging are:

    \mathbf{F}_i = \begin{cases}
      -k^a_{dr} (\bar{r}_{ij} - r_0) \frac{\mathbf{r}_{ij}}{r_{ij}} & \text{for } \bar{r}_{ij} < r_0 \\
      0 & \text{for } r_0 \le \bar{r}_{ij} < r_1 \\
      -k^a_{dr} (\bar{r}_{ij} - r_1) \frac{\mathbf{r}_{ij}}{r_{ij}} & \text{for } r_1 \le \bar{r}_{ij} < r_2 \\
      -k^a_{dr} (r_2 - r_1) \frac{\mathbf{r}_{ij}}{r_{ij}} & \text{for } r_2 \le \bar{r}_{ij}
    \end{cases}    (4.87)

where r̄ij is given by an exponential running average with decay time τ:

    \bar{r}_{ij} = \langle r_{ij}^{-3} \rangle^{-1/3}    (4.88)

The force constant k^a_dr is switched on slowly to compensate for the lack of history at the beginning of the simulation:

    k^a_{dr} = k_{dr} \left(1 - \exp\left(-\frac{t}{\tau}\right)\right)    (4.89)

Because of the time averaging, we can no longer speak of a distance restraint potential.

This way an atom can satisfy two incompatible distance restraints on average by moving between two positions. An example would be an amino acid side-chain that is rotating around its χ dihedral angle, thereby coming close to various other groups. Such a mobile side chain can give rise to multiple NOEs that can not be fulfilled by a single structure.

The computation of the time averaged distance in the mdrun program is done in the following fashion:

    \overline{r^{-3}}_{ij}(0) = r_{ij}(0)^{-3}
    \overline{r^{-3}}_{ij}(t) = \overline{r^{-3}}_{ij}(t - \Delta t) \exp\left(-\frac{\Delta t}{\tau}\right) + r_{ij}(t)^{-3} \left[1 - \exp\left(-\frac{\Delta t}{\tau}\right)\right]    (4.90)

When a pair is within the bounds, it can still feel a force because the time averaged distance can still be beyond a bound. To prevent the protons from being pulled too close together, a mixed approach can be used.
In this approach, the penalty is zero when the instantaneous distance is within the bounds, otherwise the violation is the square root of the product of the instantaneous violation and the time averaged violation:

    \mathbf{F}_i = \begin{cases}
      k^a_{dr} \sqrt{(r_{ij} - r_0)(\bar{r}_{ij} - r_0)} \, \frac{\mathbf{r}_{ij}}{r_{ij}} & \text{for } r_{ij} < r_0 \text{ and } \bar{r}_{ij} < r_0 \\
      -k^a_{dr} \min\left(\sqrt{(r_{ij} - r_1)(\bar{r}_{ij} - r_1)}, \, r_2 - r_1\right) \frac{\mathbf{r}_{ij}}{r_{ij}} & \text{for } r_{ij} > r_1 \text{ and } \bar{r}_{ij} > r_1 \\
      0 & \text{otherwise}
    \end{cases}    (4.91)

Averaging over multiple pairs

Sometimes it is unclear from experimental data which atom pair gives rise to a single NOE; on other occasions it can be obvious that more than one pair contributes due to the symmetry of the system, e.g. a methyl group with three protons. For such a group, it is not possible to distinguish between the protons, therefore they should all be taken into account when calculating the distance between this methyl group and another proton (or group of protons). Due to the physical nature of magnetic resonance, the intensity of the NOE signal is inversely proportional to the sixth power of the inter-atomic distance. Thus, when combining atom pairs, a fixed list of N restraints may be taken together, where the apparent "distance" is given by:

    r_N(t) = \left[ \sum_{n=1}^{N} \bar{r}_n(t)^{-6} \right]^{-1/6}    (4.92)

where we use rij or eqn. 4.88 for the r̄n. The rN of the instantaneous and time-averaged distances can be combined to do a mixed restraining, as indicated above. As more pairs of protons contribute to the same NOE signal, the intensity will increase, and the summed "distance" will be shorter than any of its components due to the reciprocal summation.

There are two options for distributing the forces over the atom pairs. In the conservative option, the force is defined as the derivative of the restraint potential with respect to the coordinates. This results in a conservative potential when time averaging is not used. The force distribution over the pairs is proportional to r⁻⁶.
This means that a close pair feels a much larger force than a distant pair, which might lead to a molecule that is “too rigid.” The other option is an equal force distribution. In this case each pair feels $1/N$ of the derivative of the restraint potential with respect to $r_N$. The advantage of this method is that more conformations might be sampled, but the non-conservative nature of the forces can lead to local heating of the protons.

It is also possible to use ensemble averaging using multiple (protein) molecules. In this case the bounds should be lowered as in:

$$
\begin{aligned}
r_1 &= r_1 \, M^{-1/6} \\
r_2 &= r_2 \, M^{-1/6}
\end{aligned} \tag{4.93}
$$

where $M$ is the number of molecules. The GROMACS preprocessor grompp can do this automatically when the appropriate option is given. The resulting “distance” is then used to calculate the scalar force according to:

$$
\mathbf{F}_i = \begin{cases}
0 & r_N < r_1 \\[1ex]
k_{dr}\,(r_N - r_1)\,\dfrac{\mathbf{r}_{ij}}{r_{ij}} & r_1 \le r_N < r_2 \\[2ex]
k_{dr}\,(r_2 - r_1)\,\dfrac{\mathbf{r}_{ij}}{r_{ij}} & r_N \ge r_2
\end{cases} \tag{4.94}
$$

where $i$ and $j$ denote the atoms of all the pairs that contribute to the NOE signal.

Using distance restraints

A list of distance restraints based on NOE data can be added to a molecule definition in your topology file, as in the following example:

    [ distance_restraints ]
    ; ai   aj   type   index   type'   low   up1   up2   fac
      10   16      1       0       1   0.0   0.3   0.4   1.0
      10   28      1       1       1   0.0   0.3   0.4   1.0
      10   46      1       1       1   0.0   0.3   0.4   1.0
      16   22      1       2       1   0.0   0.3   0.4   2.5
      16   34      1       3       1   0.0   0.5   0.6   1.0

In this example a number of features can be found. In columns ai and aj you find the atom numbers of the particles to be restrained. The type column should always be 1. As explained in 4.3.5, multiple distances can contribute to a single NOE signal. In the topology this can be set using the index column. In our example, the restraints 10-28 and 10-46 both have index 1, therefore they are treated simultaneously. An extra requirement for treating restraints together is that the restraints must be on successive lines, without any other intervening restraint.
The type’ column will usually be 1, but can be set to 2 to obtain a distance restraint that will never be time- and ensemble-averaged; this can be useful for restraining hydrogen bonds. The columns low, up1, and up2 hold the values of $r_0$, $r_1$, and $r_2$ from eqn. 4.85. In some cases it can be useful to have different force constants for some restraints; this is controlled by the column fac. The force constant in the parameter file is multiplied by the value in the column fac for each restraint. Information for each restraint is stored in the energy file and can be processed and plotted with gmx nmr.

4.3.6 Orientation restraints

This section describes how orientations between vectors, as measured in certain NMR experiments, can be calculated and restrained in MD simulations. The presented refinement methodology and a comparison of results with and without time and ensemble averaging have been published [97].

Theory

In an NMR experiment, orientations of vectors can be measured when a molecule does not tumble completely isotropically in the solvent. Two examples of such orientation measurements are residual dipolar couplings (between two nuclei) or chemical shift anisotropies. An observable for a vector $\mathbf{r}_i$ can be written as follows:

$$
\delta_i = \frac{2}{3}\,\mathrm{tr}(\mathbf{S}\mathbf{D}_i) \tag{4.95}
$$

where $\mathbf{S}$ is the dimensionless order tensor of the molecule. The tensor $\mathbf{D}_i$ is given by:

$$
\mathbf{D}_i = \frac{c_i}{\|\mathbf{r}_i\|^{\alpha}}
\begin{pmatrix}
3xx - 1 & 3xy & 3xz \\
3xy & 3yy - 1 & 3yz \\
3xz & 3yz & 3zz - 1
\end{pmatrix} \tag{4.96}
$$

with:

$$
x = \frac{r_{i,x}}{\|\mathbf{r}_i\|}, \quad
y = \frac{r_{i,y}}{\|\mathbf{r}_i\|}, \quad
z = \frac{r_{i,z}}{\|\mathbf{r}_i\|} \tag{4.97}
$$

For a dipolar coupling $\mathbf{r}_i$ is the vector connecting the two nuclei, $\alpha = 3$ and the constant $c_i$ is given by:

$$
c_i = \frac{\mu_0}{4\pi}\,\gamma_1^i \gamma_2^i\,\frac{\hbar}{4\pi} \tag{4.98}
$$

where $\gamma_1^i$ and $\gamma_2^i$ are the gyromagnetic ratios of the two nuclei.

The order tensor is symmetric and has trace zero. Using a rotation matrix $\mathbf{T}$ it can be transformed into the following form:

$$
\mathbf{T}^T \mathbf{S} \mathbf{T} = s
\begin{pmatrix}
-\frac{1}{2}(1 - \eta) & 0 & 0 \\
0 & -\frac{1}{2}(1 + \eta) & 0 \\
0 & 0 & 1
\end{pmatrix} \tag{4.99}
$$

where $-1 \le s \le 1$ and $0 \le \eta \le 1$.
$s$ is called the order parameter and $\eta$ the asymmetry of the order tensor $\mathbf{S}$. When the molecule tumbles isotropically in the solvent, $s$ is zero, and no orientational effects can be observed because all $\delta_i$ are zero.

Calculating orientations in a simulation

For reasons which are explained below, the $\mathbf{D}$ matrices are calculated with respect to a reference orientation of the molecule. The orientation is defined by a rotation matrix $\mathbf{R}$, which is needed to least-squares fit the current coordinates of a selected set of atoms onto a reference conformation. The reference conformation is the starting conformation of the simulation. In case of ensemble averaging, which will be treated later, the structure is taken from the first subsystem. The calculated $\mathbf{D}_i^c$ matrix is given by:

$$
\mathbf{D}_i^c(t) = \mathbf{R}(t)\,\mathbf{D}_i(t)\,\mathbf{R}^T(t) \tag{4.100}
$$

The calculated orientation for vector $i$ is given by:

$$
\delta_i^c(t) = \frac{2}{3}\,\mathrm{tr}(\mathbf{S}(t)\mathbf{D}_i^c(t)) \tag{4.101}
$$

The order tensor $\mathbf{S}(t)$ is usually unknown. A reasonable choice for the order tensor is the tensor which minimizes the (weighted) mean square difference between the calculated and the observed orientations:

$$
MSD(t) = \left( \sum_{i=1}^{N} w_i \right)^{-1} \sum_{i=1}^{N} w_i \left( \delta_i^c(t) - \delta_i^{exp} \right)^2 \tag{4.102}
$$

To properly combine different types of measurements, the unit of $w_i$ should be such that all terms are dimensionless. This means the unit of $w_i$ is the unit of $\delta_i$ to the power $-2$. Note that scaling all $w_i$ with a constant factor does not influence the order tensor.

Time averaging

Since the tensors $\mathbf{D}_i$ fluctuate rapidly in time, much faster than can be observed in an experiment, they should be averaged over time in the simulation. However, in a simulation the time and the number of copies of a molecule are limited. Usually one can not obtain a converged average of the $\mathbf{D}_i$ tensors over all orientations of the molecule. If one assumes that the average orientations of the $\mathbf{r}_i$ vectors within the molecule converge much faster than the tumbling time of the molecule, the tensor can be averaged in an axis system that rotates with the molecule, as expressed by equation (4.100). The time-averaged tensors are calculated using an exponentially decaying memory function:

$$
\mathbf{D}_i^a(t) = \frac{\displaystyle \int_{u=t_0}^{t} \mathbf{D}_i^c(u) \exp\left(-\frac{t-u}{\tau}\right) \mathrm{d}u}{\displaystyle \int_{u=t_0}^{t} \exp\left(-\frac{t-u}{\tau}\right) \mathrm{d}u} \tag{4.103}
$$

Assuming that the order tensor $\mathbf{S}$ fluctuates slower than the $\mathbf{D}_i$, the time-averaged orientation can be calculated as:

$$
\delta_i^a(t) = \frac{2}{3}\,\mathrm{tr}(\mathbf{S}(t)\mathbf{D}_i^a(t)) \tag{4.104}
$$

where the order tensor $\mathbf{S}(t)$ is calculated using expression (4.102) with $\delta_i^c(t)$ replaced by $\delta_i^a(t)$.

Restraining

The simulated structure can be restrained by applying a force proportional to the difference between the calculated and the experimental orientations. When no time averaging is applied, a proper potential can be defined as:

$$
V = \frac{1}{2} k \sum_{i=1}^{N} w_i \left( \delta_i^c(t) - \delta_i^{exp} \right)^2 \tag{4.105}
$$

where the unit of $k$ is the unit of energy. Thus the effective force constant for restraint $i$ is $k w_i$. The forces are given by minus the gradient of $V$. The force $\mathbf{F}_i$ working on vector $\mathbf{r}_i$ is:

$$
\begin{aligned}
\mathbf{F}_i(t) &= -\frac{\mathrm{d}V}{\mathrm{d}\mathbf{r}_i} \\
&= -k w_i \left( \delta_i^c(t) - \delta_i^{exp} \right) \frac{\mathrm{d}\delta_i(t)}{\mathrm{d}\mathbf{r}_i} \\
&= -k w_i \left( \delta_i^c(t) - \delta_i^{exp} \right) \frac{2 c_i}{\|\mathbf{r}_i\|^{2+\alpha}} \left( 2\,\mathbf{R}^T \mathbf{S} \mathbf{R}\,\mathbf{r}_i - \frac{2+\alpha}{\|\mathbf{r}_i\|^2}\,\mathrm{tr}(\mathbf{R}^T \mathbf{S} \mathbf{R}\,\mathbf{r}_i \mathbf{r}_i^T)\,\mathbf{r}_i \right)
\end{aligned}
$$

Ensemble averaging

Ensemble averaging can be applied by simulating a system of $M$ subsystems that each contain an identical set of orientation restraints. The systems only interact via the orientation restraint potential which is defined as:

$$
V = M\,\frac{1}{2} k \sum_{i=1}^{N} w_i \left\langle \delta_i^c(t) - \delta_i^{exp} \right\rangle^2 \tag{4.106}
$$

The force on vector $\mathbf{r}_{i,m}$ in subsystem $m$ is given by:

$$
\mathbf{F}_{i,m}(t) = -\frac{\mathrm{d}V}{\mathrm{d}\mathbf{r}_{i,m}} = -k w_i \left\langle \delta_i^c(t) - \delta_i^{exp} \right\rangle \frac{\mathrm{d}\delta_{i,m}^c(t)}{\mathrm{d}\mathbf{r}_{i,m}} \tag{4.107}
$$

Time averaging

When using time averaging it is not possible to define a potential.
We can still define a quantity that gives a rough idea of the energy stored in the restraints:

$$
V = M\,\frac{1}{2} k^a \sum_{i=1}^{N} w_i \left\langle \delta_i^a(t) - \delta_i^{exp} \right\rangle^2 \tag{4.108}
$$

The force constant $k^a$ is switched on slowly to compensate for the lack of history at times close to $t_0$. It is exactly proportional to the amount of average that has been accumulated:

$$
k^a = k\,\frac{1}{\tau} \int_{u=t_0}^{t} \exp\left(-\frac{t-u}{\tau}\right) \mathrm{d}u \tag{4.109}
$$

What really matters is the definition of the force. It is chosen to be proportional to the square root of the product of the time-averaged and the instantaneous deviation. Using only the time-averaged deviation induces large oscillations. The force is given by:

$$
\mathbf{F}_{i,m}(t) = \begin{cases}
0 & \text{for } a\,b \le 0 \\[1ex]
k^a w_i\,\dfrac{a}{|a|}\,\sqrt{a\,b}\;\dfrac{\mathrm{d}\delta_{i,m}^c(t)}{\mathrm{d}\mathbf{r}_{i,m}} & \text{for } a\,b > 0
\end{cases} \tag{4.110}
$$

$$
a = \left\langle \delta_i^a(t) - \delta_i^{exp} \right\rangle, \qquad
b = \left\langle \delta_i^c(t) - \delta_i^{exp} \right\rangle
$$

Using orientation restraints

Orientation restraints can be added to a molecule definition in the topology file in the section [ orientation_restraints ]. Here we give an example section containing five N-H residual dipolar coupling restraints:

    [ orientation_restraints ]
    ; ai   aj  type  exp.  label  alpha    const.     obs.   weight
    ;                                Hz      nm^3       Hz    Hz^-2
      31   32     1     1      3      3     6.083    -6.73      1.0
      43   44     1     1      4      3     6.083    -7.87      1.0
      55   56     1     1      5      3     6.083    -7.13      1.0
      65   66     1     1      6      3     6.083    -2.57      1.0
      73   74     1     1      7      3     6.083    -2.10      1.0

The unit of the observable is Hz, but one can choose any other unit. In columns ai and aj you find the atom numbers of the particles to be restrained. The type column should always be 1. The exp. column denotes the experiment number, starting at 1. For each experiment a separate order tensor $\mathbf{S}$ is optimized. The label should be a unique number larger than zero for each restraint. The alpha column contains the power $\alpha$ that is used in equation (4.96) to calculate the orientation. The const. column contains the constant $c_i$ used in the same equation. The constant should have the unit of the observable times nm$^{\alpha}$. The column obs. contains the observable, in any unit you like.
The last column contains the weights $w_i$; the unit should be the inverse of the square of the unit of the observable.

Some parameters for orientation restraints can be specified in the grompp.mdp file. For a study of the effect of different force constants, averaging times and ensemble averaging, see [97]. Information for each restraint is stored in the energy file and can be processed and plotted with gmx nmr.

4.4 Polarization

Polarization can be treated by GROMACS by attaching shell (Drude) particles to atoms and/or virtual sites. The energy of the shell particle is then minimized at each time step in order to remain on the Born-Oppenheimer surface.

4.4.1 Simple polarization

This is implemented as a harmonic potential with equilibrium distance 0. The input given in the topology file is the polarizability $\alpha$ (in GROMACS units) as follows:

    [ polarization ]
    ; Atom i  j  type  alpha
      1       2  1     0.001

In this case the polarizability volume is 0.001 nm$^3$ (or 1 Å$^3$). In order to compute the harmonic force constant $k_{cs}$ (where cs stands for core-shell), the following is used [45]:

$$
k_{cs} = \frac{q_s^2}{\alpha} \tag{4.111}
$$

where $q_s$ is the charge on the shell particle.

4.4.2 Anharmonic polarization

For the development of the Drude force field by Roux and MacKerell [98] it was found that some particles can overpolarize, and this was fixed by introducing a higher-order term in the polarization energy:

$$
V_{pol} = \begin{cases}
\dfrac{k_{cs}}{2}\,r_{cs}^2 & r_{cs} \le \delta \qquad (4.112) \\[2ex]
\dfrac{k_{cs}}{2}\,r_{cs}^2 + k_{hyp}\,(r_{cs} - \delta)^4 & r_{cs} > \delta \qquad (4.113)
\end{cases}
$$

where $\delta$ is a user-defined constant that is set to 0.02 nm for anions in the Drude force field [99]. Since this original introduction it has also been used in other atom types [98].

    [ polarization ]
    ; Atom i  j  type  alpha (nm^3)  delta  khyp
      1       2  2     0.001786      0.02   16.736e8

The above force constant khyp corresponds to $4 \cdot 10^8$ kcal/mol/nm$^4$, hence the strange number.

4.4.3 Water polarization

A special potential for water that allows anisotropic polarization of a single shell particle [45].
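As a rough illustration of the polarization terms in sections 4.4.1 and 4.4.2, the following Python sketch evaluates the anharmonic energy of eqns. 4.112 and 4.113 and reproduces the unit conversion behind the khyp value in the example topology. It is not GROMACS code: the function names are hypothetical, and the factor 4.184 kJ/kcal is an assumption made for the example.

```python
# Assumed conversion factor (thermochemical calorie), not taken from the manual.
KCAL_TO_KJ = 4.184


def khyp_from_kcal(khyp_kcal):
    """Convert a hyperpolarization constant from kcal/mol/nm^4 to kJ/mol/nm^4."""
    return khyp_kcal * KCAL_TO_KJ


def polarization_energy(r_cs, k_cs, k_hyp, delta):
    """Core-shell energy following eqns. 4.112-4.113 (hypothetical helper).

    r_cs  : core-shell distance (nm)
    k_cs  : harmonic force constant (kJ/mol/nm^2)
    k_hyp : quartic force constant (kJ/mol/nm^4)
    delta : distance beyond which the quartic term is active (nm)
    """
    energy = 0.5 * k_cs * r_cs**2
    if r_cs > delta:
        # The quartic penalty only acts once the shell is further than delta.
        energy += k_hyp * (r_cs - delta) ** 4
    return energy


if __name__ == "__main__":
    k_hyp = khyp_from_kcal(4e8)
    print(k_hyp)
    print(polarization_energy(0.01, 1000.0, k_hyp, 0.02))
    print(polarization_energy(0.03, 1000.0, k_hyp, 0.02))
```

With delta = 0.02 nm the energy at 0.01 nm is purely harmonic, while at 0.03 nm the quartic term contributes as well; the value of k_cs used here is arbitrary.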
4.4.4 Thole polarization

Based on early work by Thole [100], Roux and coworkers have implemented potentials for molecules like ethanol [101, 102, 103]. Within such molecules, there are intra-molecular interactions between shell particles; however, these must be screened because full Coulomb would be too strong. The potential between two shell particles $i$ and $j$ is:

$$
V_{thole} = \frac{q_i q_j}{r_{ij}} \left[ 1 - \left( 1 + \frac{\bar{r}_{ij}}{2} \right) \exp(-\bar{r}_{ij}) \right] \tag{4.114}
$$

Note that there is a sign error in Equation 1 of Noskov et al. [103]. The scaled distance is:

$$
\bar{r}_{ij} = a\,\frac{r_{ij}}{(\alpha_i \alpha_j)^{1/6}} \tag{4.115}
$$

where $a$ is a magic (dimensionless) constant, usually chosen to be 2.6 [103]; $\alpha_i$ and $\alpha_j$ are the polarizabilities of the respective shell particles.

4.5 Free energy interactions

This section describes the $\lambda$-dependence of the potentials used for free energy calculations (see sec. 3.12). All common types of potentials and constraints can be interpolated smoothly from state A ($\lambda = 0$) to state B ($\lambda = 1$) and vice versa. All bonded interactions are interpolated by linear interpolation of the interaction parameters. Non-bonded interactions can be interpolated linearly or via soft-core interactions.

Starting in GROMACS 4.6, $\lambda$ is a vector, allowing different components of the free energy transformation to be carried out at different rates. Coulomb, Lennard-Jones, bonded, and restraint terms can all be controlled independently, as described in the .mdp options.

Harmonic potentials

The example given here is for the bond potential, which is harmonic in GROMACS. However, these equations apply to the angle potential and the improper dihedral potential as well.

$$
V_b = \frac{1}{2} \left[ (1-\lambda) k_b^A + \lambda k_b^B \right] \left[ b - (1-\lambda) b_0^A - \lambda b_0^B \right]^2 \tag{4.116}
$$

$$
\frac{\partial V_b}{\partial \lambda} = \frac{1}{2} (k_b^B - k_b^A) \left[ b - (1-\lambda) b_0^A - \lambda b_0^B \right]^2 + (b_0^A - b_0^B) \left[ b - (1-\lambda) b_0^A - \lambda b_0^B \right] \left[ (1-\lambda) k_b^A + \lambda k_b^B \right] \tag{4.117}
$$

GROMOS-96 bonds and angles

Fourth-power bond stretching and cosine-based angle potentials are interpolated by linear interpolation of the force constant and the equilibrium position. Formulas are not given here.

Proper dihedrals

For the proper dihedrals, the equations are somewhat more complicated:

$$
V_d = \left[ (1-\lambda) k_d^A + \lambda k_d^B \right] \left( 1 + \cos\left[ n_\phi \phi - (1-\lambda) \phi_s^A - \lambda \phi_s^B \right] \right) \tag{4.118}
$$

$$
\frac{\partial V_d}{\partial \lambda} = (k_d^B - k_d^A) \left( 1 + \cos\left[ n_\phi \phi - (1-\lambda) \phi_s^A - \lambda \phi_s^B \right] \right) + (\phi_s^B - \phi_s^A) \left[ (1-\lambda) k_d^A + \lambda k_d^B \right] \sin\left[ n_\phi \phi - (1-\lambda) \phi_s^A - \lambda \phi_s^B \right] \tag{4.119}
$$

Note that the multiplicity $n_\phi$ can not be parameterized because the function should remain periodic on the interval $[0, 2\pi]$.

Tabulated bonded interactions

For tabulated bonded interactions only the force constant can be interpolated:

$$
V = \left( (1-\lambda) k^A + \lambda k^B \right) f \tag{4.120}
$$

$$
\frac{\partial V}{\partial \lambda} = (k^B - k^A)\, f \tag{4.121}
$$

Coulomb interaction

The Coulomb interaction between two particles of which the charge varies with $\lambda$ is:

$$
V_c = \frac{f}{\varepsilon_{rf}\, r_{ij}} \left[ (1-\lambda)\, q_i^A q_j^A + \lambda\, q_i^B q_j^B \right] \tag{4.122}
$$

$$
\frac{\partial V_c}{\partial \lambda} = \frac{f}{\varepsilon_{rf}\, r_{ij}} \left[ -q_i^A q_j^A + q_i^B q_j^B \right] \tag{4.123}
$$

where $f = \frac{1}{4\pi\varepsilon_0} = 138.935\,458$ (see chapter 2).

Coulomb interaction with reaction field

The Coulomb interaction including a reaction field, between two particles of which the charge varies with $\lambda$ is:

$$
V_c = f \left[ \frac{1}{r_{ij}} + k_{rf}\, r_{ij}^2 - c_{rf} \right] \left[ (1-\lambda)\, q_i^A q_j^A + \lambda\, q_i^B q_j^B \right] \tag{4.124}
$$

$$
\frac{\partial V_c}{\partial \lambda} = f \left[ \frac{1}{r_{ij}} + k_{rf}\, r_{ij}^2 - c_{rf} \right] \left[ -q_i^A q_j^A + q_i^B q_j^B \right] \tag{4.125}
$$

Note that the constants $k_{rf}$ and $c_{rf}$ are defined using the dielectric constant $\varepsilon_{rf}$ of the medium (see sec. 4.1.4).

Lennard-Jones interaction

For the Lennard-Jones interaction between two particles of which the atom type varies with $\lambda$ we can write:

$$
V_{LJ} = \frac{(1-\lambda) C_{12}^A + \lambda C_{12}^B}{r_{ij}^{12}} - \frac{(1-\lambda) C_6^A + \lambda C_6^B}{r_{ij}^6} \tag{4.126}
$$

$$
\frac{\partial V_{LJ}}{\partial \lambda} = \frac{C_{12}^B - C_{12}^A}{r_{ij}^{12}} - \frac{C_6^B - C_6^A}{r_{ij}^6} \tag{4.127}
$$

It should be noted that it is also possible to express a pathway from state A to state B using $\sigma$ and $\epsilon$ (see eqn. 4.5).
It may seem to make sense physically to vary the force field parameters $\sigma$ and $\epsilon$ rather than the derived parameters $C_{12}$ and $C_6$. However, the difference between the pathways in parameter space is not large, and the free energy itself does not depend on the pathway, so we use the simple formulation presented above.

Kinetic Energy

When the mass of a particle changes, there is also a contribution of the kinetic energy to the free energy (note that we can not write the momentum $\mathbf{p}$ as $m\mathbf{v}$, since that would result in the sign of $\partial E_k / \partial \lambda$ being incorrect [104]):

$$
E_k = \frac{1}{2}\,\frac{\mathbf{p}^2}{(1-\lambda) m^A + \lambda m^B} \tag{4.128}
$$

$$
\frac{\partial E_k}{\partial \lambda} = -\frac{1}{2}\,\frac{\mathbf{p}^2\,(m^B - m^A)}{\left( (1-\lambda) m^A + \lambda m^B \right)^2} \tag{4.129}
$$

After taking the derivative, we can insert $\mathbf{p} = m\mathbf{v}$, such that:

$$
\frac{\partial E_k}{\partial \lambda} = -\frac{1}{2}\,\mathbf{v}^2\,(m^B - m^A) \tag{4.130}
$$

Constraints

The constraints are formally part of the Hamiltonian, and therefore they give a contribution to the free energy. In GROMACS this can be calculated using the LINCS or the SHAKE algorithm. If we have $k = 1 \ldots K$ constraint equations $g_k$ for LINCS, then

$$
g_k = |\mathbf{r}_k| - d_k \tag{4.131}
$$

where $\mathbf{r}_k$ is the displacement vector between two particles and $d_k$ is the constraint distance between the two particles. We can express the fact that the constraint distance has a $\lambda$ dependency by

$$
d_k = (1-\lambda)\, d_k^A + \lambda\, d_k^B \tag{4.132}
$$

Thus the $\lambda$-dependent constraint equation is

$$
g_k = |\mathbf{r}_k| - \left( (1-\lambda)\, d_k^A + \lambda\, d_k^B \right). \tag{4.133}
$$

[Figure 4.16: plot of $V_{sc}$ versus $r$ for the LJ and $3/r$ potentials, each with $\alpha = 0$, $1.5$ and $2$.]

Figure 4.16: Soft-core interactions at $\lambda = 0.5$, with $p = 2$ and $C_6^A = C_{12}^A = C_6^B = C_{12}^B = 1$.

The (zero) contribution $G$ to the Hamiltonian from the constraints (using Lagrange multipliers $\lambda_k$, which are logically distinct from the free-energy $\lambda$) is

$$
G = \sum_k^K \lambda_k\, g_k \tag{4.134}
$$

$$
\frac{\partial G}{\partial \lambda} = \frac{\partial G}{\partial d_k}\,\frac{\partial d_k}{\partial \lambda} \tag{4.135}
$$

$$
\phantom{\frac{\partial G}{\partial \lambda}} = -\sum_k^K \lambda_k \left( d_k^B - d_k^A \right) \tag{4.136}
$$

For SHAKE, the constraint equations are

$$
g_k = \mathbf{r}_k^2 - d_k^2 \tag{4.137}
$$

with $d_k$ as before, so

$$
\frac{\partial G}{\partial \lambda} = -2 \sum_k^K \lambda_k \left( d_k^B - d_k^A \right) \tag{4.138}
$$

4.5.1 Soft-core interactions

In a free-energy calculation where particles grow out of nothing, or particles disappear, using the simple linear interpolation of the Lennard-Jones and Coulomb potentials as described in Equations 4.127 and 4.125 may lead to poor convergence. When the particles have nearly disappeared, or are close to appearing (at $\lambda$ close to 0 or 1), the interaction energy will be weak enough for particles to get very close to each other, leading to large fluctuations in the measured values of $\partial V / \partial \lambda$ (which, because of the simple linear interpolation, depends on the potentials at both the endpoints of $\lambda$).

To circumvent these problems, the singularities in the potentials need to be removed. This can be done by modifying the regular Lennard-Jones and Coulomb potentials with “soft-core” potentials that limit the energies and forces involved at $\lambda$ values between 0 and 1, but not at $\lambda = 0$ or 1.
In GROMACS the soft-core potentials $V_{sc}$ are shifted versions of the regular potentials, so that the singularity in the potential and its derivatives at $r = 0$ is never reached:

$$
V_{sc}(r) = (1-\lambda)\, V^A(r_A) + \lambda\, V^B(r_B) \tag{4.139}
$$

$$
r_A = \left( \alpha\, \sigma_A^6\, \lambda^p + r^6 \right)^{\frac{1}{6}} \tag{4.140}
$$

$$
r_B = \left( \alpha\, \sigma_B^6\, (1-\lambda)^p + r^6 \right)^{\frac{1}{6}} \tag{4.141}
$$

where $V^A$ and $V^B$ are the normal “hard core” Van der Waals or electrostatic potentials in state A ($\lambda = 0$) and state B ($\lambda = 1$) respectively, $\alpha$ is the soft-core parameter (set with sc_alpha in the .mdp file), $p$ is the soft-core $\lambda$ power (set with sc_power), and $\sigma$ is the radius of the interaction, which is $(C_{12}/C_6)^{1/6}$ or an input parameter (sc_sigma) when $C_6$ or $C_{12}$ is zero.

For intermediate $\lambda$, $r_A$ and $r_B$ alter the interactions very little for $r > \alpha^{1/6} \sigma$ and quickly switch the soft-core interaction to an almost constant value for smaller $r$ (Fig. 4.16). The force is:

$$
F_{sc}(r) = -\frac{\partial V_{sc}(r)}{\partial r} = (1-\lambda)\, F^A(r_A) \left( \frac{r}{r_A} \right)^5 + \lambda\, F^B(r_B) \left( \frac{r}{r_B} \right)^5 \tag{4.142}
$$

where $F^A$ and $F^B$ are the “hard core” forces. The contribution to the derivative of the free energy is:

$$
\begin{aligned}
\frac{\partial V_{sc}(r)}{\partial \lambda} &= V^B(r_B) - V^A(r_A) + (1-\lambda)\,\frac{\partial V^A(r_A)}{\partial r_A}\,\frac{\partial r_A}{\partial \lambda} + \lambda\,\frac{\partial V^B(r_B)}{\partial r_B}\,\frac{\partial r_B}{\partial \lambda} \\
&= V^B(r_B) - V^A(r_A) + \frac{p\,\alpha}{6} \left[ \lambda\, F^B(r_B)\, r_B^{-5}\, \sigma_B^6\, (1-\lambda)^{p-1} - (1-\lambda)\, F^A(r_A)\, r_A^{-5}\, \sigma_A^6\, \lambda^{p-1} \right]
\end{aligned} \tag{4.143}
$$

The original GROMOS Lennard-Jones soft-core function [105] uses $p = 2$, but $p = 1$ gives a smoother $\partial H / \partial \lambda$ curve. Another issue that should be considered is the soft-core effect of hydrogens without Lennard-Jones interaction. Their soft-core $\sigma$ is set with sc-sigma in the .mdp file. These hydrogens produce peaks in $\partial H / \partial \lambda$ at $\lambda$ equal to 0 and/or 1 for $p = 1$, and close to 0 and/or 1 for $p = 2$. Lowering sc-sigma will decrease this effect, but it will also increase the interactions with hydrogens relative to the other interactions in the soft-core state.
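The soft-core construction of eqns. 4.139 to 4.141 can be sketched in a few lines of Python for the Lennard-Jones case. This is only an illustration under stated assumptions, not the GROMACS kernel: the function names are hypothetical, $\sigma$ is always taken as $(C_{12}/C_6)^{1/6}$ (the sc_sigma fallback for vanishing $C_6$ or $C_{12}$ is not handled), and the defaults use the parameters of Figure 4.16, where all $C_6$ and $C_{12}$ equal 1.

```python
def lennard_jones(r, c6, c12):
    """Plain ("hard core") Lennard-Jones potential C12/r^12 - C6/r^6."""
    return c12 / r**12 - c6 / r**6


def soft_core_lj(r, lam, alpha, p, c6_a=1.0, c12_a=1.0, c6_b=1.0, c12_b=1.0):
    """Soft-core Lennard-Jones potential following eqns. 4.139-4.141.

    Assumes non-zero C6 and C12 in both states, so that sigma can be
    computed as (C12/C6)^(1/6).
    """
    sigma_a = (c12_a / c6_a) ** (1.0 / 6.0)
    sigma_b = (c12_b / c6_b) ** (1.0 / 6.0)
    # Shifted distances: for alpha > 0 they stay finite when r goes to 0.
    r_a = (alpha * sigma_a**6 * lam**p + r**6) ** (1.0 / 6.0)
    r_b = (alpha * sigma_b**6 * (1.0 - lam) ** p + r**6) ** (1.0 / 6.0)
    return ((1.0 - lam) * lennard_jones(r_a, c6_a, c12_a)
            + lam * lennard_jones(r_b, c6_b, c12_b))


if __name__ == "__main__":
    # Parameters of Figure 4.16: lambda = 0.5, p = 2, alpha = 1.5.
    for r in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(r, soft_core_lj(r, lam=0.5, alpha=1.5, p=2))
```

Setting alpha to 0 recovers the plain linear interpolation of the two hard-core potentials, which is why the soft-core path is switched off with sc-alpha = 0.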
When soft-core potentials are selected (by setting sc-alpha > 0), and the Coulomb and Lennard-Jones potentials are turned on or off sequentially, then the Coulombic interaction is turned off linearly, rather than using soft-core interactions, which should be less statistically noisy in most cases. This behavior can be overridden by setting the mdp option sc-coul to yes. Note that sc-coul is only taken into account when lambda states are used, not with couple-lambda0 / couple-lambda1, and you can still turn off soft-core interactions by setting sc-alpha=0. Additionally, the soft-core interaction potential is only applied when either the A or B state has zero interaction potential. If both A and B states have nonzero interaction potential, the default linear scaling described above is used. When both Coulombic and Lennard-Jones interactions are turned off simultaneously, a soft-core potential is used, and a hydrogen is being introduced or deleted, the sigma is set to sc-sigma-min, which itself defaults to sc-sigma-default.

Recently, a new formulation of the soft-core approach has been derived that in most cases gives lower and more even statistical variance than the standard soft-core path described above [106, 107]. Specifically, we have:

$$
V_{sc}(r) = (1-\lambda)\, V^A(r_A) + \lambda\, V^B(r_B) \tag{4.144}
$$

$$
r_A = \left( \alpha\, \sigma_A^{48}\, \lambda^p + r^{48} \right)^{\frac{1}{48}} \tag{4.145}
$$

$$
r_B = \left( \alpha\, \sigma_B^{48}\, (1-\lambda)^p + r^{48} \right)^{\frac{1}{48}} \tag{4.146}
$$

This “1-1-48” path is also implemented in GROMACS. Note that for this path the soft-core $\alpha$ should satisfy $0.001 < \alpha < 0.003$, rather than $\alpha \approx 0.5$.

4.6 Methods

4.6.1 Exclusions and 1-4 Interactions

[Figure 4.17: chain of atoms labeled i, i+1, i+2, i+3, i+4.]

Figure 4.17: Atoms along an alkane chain.

Atoms within a molecule that are close by in the chain, i.e. atoms that are covalently bonded, or linked by one or two atoms are called first neighbors, second neighbors and third neighbors, respectively (see Fig. 4.17).
Since the interactions of atom i with atoms i+1 and i+2 are mainly quantum mechanical, they can not be modeled by a Lennard-Jones potential. Instead it is assumed that these interactions are adequately modeled by a harmonic bond term or constraint (i, i+1) and a harmonic angle term (i, i+2). The first and second neighbors (atoms i+1 and i+2) are therefore excluded from the Lennard-Jones interaction list of atom i; atoms i+1 and i+2 are called exclusions of atom i.

For third neighbors, the normal Lennard-Jones repulsion is sometimes still too strong, which means that when applied to a molecule, the molecule would deform or break due to the internal strain. This is especially the case for carbon-carbon interactions in a cis-conformation (e.g. cis-butane). Therefore, for some of these interactions, the Lennard-Jones repulsion has been reduced in the GROMOS force field, which is implemented by keeping a separate list of 1-4 and normal Lennard-Jones parameters. In other force fields, such as OPLS [108], the standard Lennard-Jones parameters are reduced by a factor of two, but in that case also the dispersion ($r^{-6}$) and the Coulomb interaction are scaled. GROMACS can use either of these methods.

4.6.2 Charge Groups

In principle, the force calculation in MD is an $O(N^2)$ problem. Therefore, we apply a cut-off for non-bonded force (NBF) calculations; only the particles within a certain distance of each other are interacting. This reduces the cost to $O(N)$ (typically $100N$ to $200N$) of the NBF. It also introduces an error, which is, in most cases, acceptable, except when applying the cut-off implies the creation of charges, in which case you should consider using the lattice sum methods provided by GROMACS.

Consider a water molecule interacting with another atom.
If we applied a plain cut-off on an atom-atom basis we might include the atom-oxygen interaction (with a charge of −0.82) without the compensating charge of the protons, and as a result, induce a large dipole moment over the system. Therefore, we have to keep groups of atoms with total charge 0 together. These groups are called charge groups. Note that with a proper treatment of long-range electrostatics (e.g. particle-mesh Ewald, sec. 4.8.2), keeping charge groups together is not required.

4.6.3 Treatment of Cut-offs in the group scheme

GROMACS is quite flexible in treating cut-offs, which implies there can be quite a number of parameters to set. These parameters are set in the input file for grompp. There are two sorts of parameters that affect the cut-off interactions; you can select which type of interaction to use in each case, and which cut-offs should be used in the neighbor searching.

For both Coulomb and van der Waals interactions there are interaction type selectors (termed vdwtype and coulombtype) and two parameters, for a total of six non-bonded interaction parameters. See the User Guide for a complete description of these parameters.

In the group cut-off scheme, all of the interaction functions in Table 4.2 require that neighbor searching be done with a radius at least as large as the $r_c$ specified for the functional form, because of the use of charge groups. The extra radius is typically of the order of 0.25 nm (roughly the largest distance between two atoms in a charge group plus the distance a charge group can diffuse within neighbor list updates).

    Interaction   Type              Parameters
    Coulomb       Plain cut-off     r_c, ε_r
                  Reaction field    r_c, ε_rf
                  Shift function    r_1, r_c, ε_r
                  Switch function   r_1, r_c, ε_r
    VdW           Plain cut-off     r_c
                  Shift function    r_1, r_c
                  Switch function   r_1, r_c

Table 4.2: Parameters for the different functional forms of the non-bonded interactions.

[Figure 4.18: schematic of the virtual site constructions 2, 3, 3fd, 3fad, 3out and 4fdn, with parameters a, b, c, d and θ indicated.]

Figure 4.18: The six different types of virtual site construction in GROMACS. The constructing atoms are shown as black circles, the virtual sites in gray.

4.7 Virtual interaction sites

Virtual interaction sites (called dummy atoms in GROMACS versions before 3.3) can be used in GROMACS in a number of ways. We write the position of the virtual site $\mathbf{r}_s$ as a function of the positions of other particles $\mathbf{r}_i$: $\mathbf{r}_s = f(\mathbf{r}_1 .. \mathbf{r}_n)$. The virtual site, which may carry charge or be involved in other interactions, can now be used in the force calculation. The force acting on the virtual site must be redistributed over the particles with mass in a consistent way. A good way to do this can be found in ref. [109]. We can write the potential energy as:

$$
V = V(\mathbf{r}_s, \mathbf{r}_1, \ldots, \mathbf{r}_n) = V^*(\mathbf{r}_1, \ldots, \mathbf{r}_n) \tag{4.147}
$$

The force on the particle $i$ is then:

$$
\mathbf{F}_i = -\frac{\partial V^*}{\partial \mathbf{r}_i} = -\frac{\partial V}{\partial \mathbf{r}_i} - \frac{\partial V}{\partial \mathbf{r}_s}\,\frac{\partial \mathbf{r}_s}{\partial \mathbf{r}_i} = \mathbf{F}_i^{direct} + \mathbf{F}'_i \tag{4.148}
$$

The first term is the normal force. The second term is the force on particle $i$ due to the virtual site, which can be written in tensor notation:

$$
\mathbf{F}'_i = \begin{bmatrix}
\dfrac{\partial x_s}{\partial x_i} & \dfrac{\partial y_s}{\partial x_i} & \dfrac{\partial z_s}{\partial x_i} \\[2ex]
\dfrac{\partial x_s}{\partial y_i} & \dfrac{\partial y_s}{\partial y_i} & \dfrac{\partial z_s}{\partial y_i} \\[2ex]
\dfrac{\partial x_s}{\partial z_i} & \dfrac{\partial y_s}{\partial z_i} & \dfrac{\partial z_s}{\partial z_i}
\end{bmatrix} \mathbf{F}_s \tag{4.149}
$$

where $\mathbf{F}_s$ is the force on the virtual site and $x_s$, $y_s$ and $z_s$ are the coordinates of the virtual site. In this way, the total force and the total torque are conserved [109].

The computation of the virial (eqn. 3.24) is non-trivial when virtual sites are used. Since the virial involves a summation over all the atoms (rather than virtual sites), the forces must be redistributed from the virtual sites to the atoms (using eqn. 4.149) before computation of the virial. In some special cases where the forces on the atoms can be written as a linear combination of the forces on the virtual sites (types 2 and 3 below) there is no difference between computing the virial before and after the redistribution of forces.
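However, in the general case redistribution should be done first.

There are six ways to construct virtual sites from surrounding atoms in GROMACS, which we classify by the number of constructing atoms. Note that all site types mentioned can be constructed from types 3fd (normalized, in-plane) and 3out (non-normalized, out of plane). However, the amount of computation involved increases sharply along this list, so we strongly recommend using the first virtual site type that is sufficient for a certain purpose. Fig. 4.18 depicts 6 of the available virtual site constructions. The conceptually simplest construction types are linear combinations:

$$
\mathbf{r}_s = \sum_{i=1}^{N} w_i\, \mathbf{r}_i \tag{4.150}
$$

The force is then redistributed using the same weights:

$$
\mathbf{F}'_i = w_i\, \mathbf{F}_s \tag{4.151}
$$

The types of virtual sites supported in GROMACS are given in the list below. Constructing atoms in virtual sites can be virtual sites themselves, but only if they are higher in the list, i.e. virtual sites can be constructed from “particles” that are simpler virtual sites.

2. As a linear combination of two atoms (Fig. 4.18 2):

$$
w_i = 1 - a, \quad w_j = a \tag{4.152}
$$

In this case the virtual site is on the line through atoms $i$ and $j$.

3. As a linear combination of three atoms (Fig. 4.18 3):

$$
w_i = 1 - a - b, \quad w_j = a, \quad w_k = b \tag{4.153}
$$

In this case the virtual site is in the plane of the other three particles.

3fd. In the plane of three atoms, with a fixed distance (Fig. 4.18 3fd):

$$
\mathbf{r}_s = \mathbf{r}_i + b\,\frac{\mathbf{r}_{ij} + a\,\mathbf{r}_{jk}}{|\mathbf{r}_{ij} + a\,\mathbf{r}_{jk}|} \tag{4.154}
$$

In this case the virtual site is in the plane of the other three particles at a distance of $|b|$ from $i$. The force on particles $i$, $j$ and $k$ due to the force on the virtual site can be computed as:

$$
\begin{aligned}
\mathbf{F}'_i &= \mathbf{F}_s - \gamma\,(\mathbf{F}_s - \mathbf{p}) \\
\mathbf{F}'_j &= (1 - a)\,\gamma\,(\mathbf{F}_s - \mathbf{p}) \\
\mathbf{F}'_k &= a\,\gamma\,(\mathbf{F}_s - \mathbf{p})
\end{aligned}
\qquad \text{where} \qquad
\begin{aligned}
\gamma &= \frac{b}{|\mathbf{r}_{ij} + a\,\mathbf{r}_{jk}|} \\
\mathbf{p} &= \frac{\mathbf{r}_{is} \cdot \mathbf{F}_s}{\mathbf{r}_{is} \cdot \mathbf{r}_{is}}\,\mathbf{r}_{is}
\end{aligned} \tag{4.155}
$$

3fad. In the plane of three atoms, with a fixed angle and distance (Fig. 4.18 3fad):

$$
\mathbf{r}_s = \mathbf{r}_i + d \cos\theta\,\frac{\mathbf{r}_{ij}}{|\mathbf{r}_{ij}|} + d \sin\theta\,\frac{\mathbf{r}_\perp}{|\mathbf{r}_\perp|}
\qquad \text{where} \qquad
\mathbf{r}_\perp = \mathbf{r}_{jk} - \frac{\mathbf{r}_{ij} \cdot \mathbf{r}_{jk}}{\mathbf{r}_{ij} \cdot \mathbf{r}_{ij}}\,\mathbf{r}_{ij} \tag{4.156}
$$

In this case the virtual site is in the plane of the other three particles at a distance of $|d|$ from $i$ at an angle of $\theta$ with $\mathbf{r}_{ij}$. Atom $k$ defines the plane and the direction of the angle. Note that in this case the distance $d$ and the angle $\theta$ must be specified, instead of $a$ and $b$ (see also sec. 5.2.2). The force on particles $i$, $j$ and $k$ due to the force on the virtual site can be computed as (with $\mathbf{r}_\perp$ as defined in eqn. 4.156):

$$
\begin{aligned}
\mathbf{F}'_i &= \mathbf{F}_s - \frac{d \cos\theta}{|\mathbf{r}_{ij}|}\,\mathbf{F}_1 + \frac{d \sin\theta}{|\mathbf{r}_\perp|} \left( \frac{\mathbf{r}_{ij} \cdot \mathbf{r}_{jk}}{\mathbf{r}_{ij} \cdot \mathbf{r}_{ij}}\,\mathbf{F}_2 + \mathbf{F}_3 \right) \\
\mathbf{F}'_j &= \frac{d \cos\theta}{|\mathbf{r}_{ij}|}\,\mathbf{F}_1 - \frac{d \sin\theta}{|\mathbf{r}_\perp|} \left( \mathbf{F}_2 + \frac{\mathbf{r}_{ij} \cdot \mathbf{r}_{jk}}{\mathbf{r}_{ij} \cdot \mathbf{r}_{ij}}\,\mathbf{F}_2 + \mathbf{F}_3 \right) \\
\mathbf{F}'_k &= \frac{d \sin\theta}{|\mathbf{r}_\perp|}\,\mathbf{F}_2
\end{aligned} \tag{4.157}
$$

where

$$
\mathbf{F}_1 = \mathbf{F}_s - \frac{\mathbf{r}_{ij} \cdot \mathbf{F}_s}{\mathbf{r}_{ij} \cdot \mathbf{r}_{ij}}\,\mathbf{r}_{ij}, \qquad
\mathbf{F}_2 = \mathbf{F}_1 - \frac{\mathbf{r}_\perp \cdot \mathbf{F}_s}{\mathbf{r}_\perp \cdot \mathbf{r}_\perp}\,\mathbf{r}_\perp, \qquad
\mathbf{F}_3 = \frac{\mathbf{r}_{ij} \cdot \mathbf{F}_s}{\mathbf{r}_{ij} \cdot \mathbf{r}_{ij}}\,\mathbf{r}_\perp
$$

3out. As a non-linear combination of three atoms, out of plane (Fig. 4.18 3out):

$$
\mathbf{r}_s = \mathbf{r}_i + a\,\mathbf{r}_{ij} + b\,\mathbf{r}_{ik} + c\,(\mathbf{r}_{ij} \times \mathbf{r}_{ik}) \tag{4.158}
$$

This enables the construction of virtual sites out of the plane of the other atoms. The force on particles $i$, $j$ and $k$ due to the force on the virtual site can be computed as:

$$
\begin{aligned}
\mathbf{F}'_j &= \begin{bmatrix}
a & -c\,z_{ik} & c\,y_{ik} \\
c\,z_{ik} & a & -c\,x_{ik} \\
-c\,y_{ik} & c\,x_{ik} & a
\end{bmatrix} \mathbf{F}_s \\[1ex]
\mathbf{F}'_k &= \begin{bmatrix}
b & c\,z_{ij} & -c\,y_{ij} \\
-c\,z_{ij} & b & c\,x_{ij} \\
c\,y_{ij} & -c\,x_{ij} & b
\end{bmatrix} \mathbf{F}_s \\[1ex]
\mathbf{F}'_i &= \mathbf{F}_s - \mathbf{F}'_j - \mathbf{F}'_k
\end{aligned} \tag{4.159}
$$

4fdn. From four atoms, with a fixed distance, see separate Fig. 4.19. This construction is a bit complex, in particular since the previous type (4fd) could be unstable, which forced us to introduce a more elaborate construction:

$$
\begin{aligned}
\mathbf{r}_{ja} &= a\,\mathbf{r}_{ik} - \mathbf{r}_{ij} = a\,(\mathbf{x}_k - \mathbf{x}_i) - (\mathbf{x}_j - \mathbf{x}_i) \\
\mathbf{r}_{jb} &= b\,\mathbf{r}_{il} - \mathbf{r}_{ij} = b\,(\mathbf{x}_l - \mathbf{x}_i) - (\mathbf{x}_j - \mathbf{x}_i) \\
\mathbf{r}_m &= \mathbf{r}_{ja} \times \mathbf{r}_{jb} \\
\mathbf{x}_s &= \mathbf{x}_i + c\,\frac{\mathbf{r}_m}{|\mathbf{r}_m|}
\end{aligned} \tag{4.160}
$$

In this case the virtual site is at a distance of $|c|$ from $i$, while $a$ and $b$ are parameters. Note that the vectors $\mathbf{r}_{ik}$ and $\mathbf{r}_{ij}$ are not normalized to save floating-point operations.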
The force on particles $i$, $j$, $k$ and $l$ due to the force on the virtual site are computed through chain rule derivatives of the construction expression. This is exact and conserves energy, but it does lead to relatively lengthy expressions that we do not include here (over 200 floating-point operations). The interested reader can look at the source code in vsite.c. Fortunately, this vsite type is normally only used for chiral centers such as C$_\alpha$ atoms in proteins.

[Figure 4.19: schematic of the 4fdn construction showing x_s, x_i, x_j, x_k, x_l and the vectors r_ja and r_jb.]

Figure 4.19: The new 4fdn virtual site construction, which is stable even when all constructing atoms are in the same plane.

The new 4fdn construct is identified with a ‘type’ value of 2 in the topology. The earlier 4fd type is still supported internally (‘type’ value 1), but it should not be used for new simulations. All current GROMACS tools will automatically generate type 4fdn instead.

N. A linear combination of $N$ atoms with relative weights $a_i$. The weight for atom $i$ is:

$$
w_i = a_i \left( \sum_{j=1}^{N} a_j \right)^{-1} \tag{4.161}
$$

There are three options for setting the weights:

    COG  center of geometry: equal weights
    COM  center of mass: a_i is the mass of atom i; when in free-energy simulations the mass of the atom is changed, only the mass of the A-state is used for the weight
    COW  center of weights: a_i is defined by the user

4.8 Long Range Electrostatics

4.8.1 Ewald summation

The total electrostatic energy of $N$ particles and their periodic images is given by

$$
V = \frac{f}{2} \sum_{n_x} \sum_{n_y} \sum_{n_z *} \sum_{i}^{N} \sum_{j}^{N} \frac{q_i q_j}{r_{ij,\mathbf{n}}}. \tag{4.162}
$$

$(n_x, n_y, n_z) = \mathbf{n}$ is the box index vector, and the star indicates that terms with $i = j$ should be omitted when $(n_x, n_y, n_z) = (0, 0, 0)$. The distance $r_{ij,\mathbf{n}}$ is the real distance between the charges and not the minimum-image. This sum is conditionally convergent, but very slow.

Ewald summation was first introduced as a method to calculate long-range interactions of the periodic images in crystals [110].
The idea is to convert the single slowly-converging sum eqn. 4.162 into two quickly-converging terms and a constant term:

$$ V = V_{dir} + V_{rec} + V_0 \tag{4.163} $$

$$ V_{dir} = \frac{f}{2}\sum_{i,j}^{N}\sum_{n_x}\sum_{n_y}\sum_{n_z*} q_i q_j\,\frac{\mathrm{erfc}(\beta r_{ij,\mathbf{n}})}{r_{ij,\mathbf{n}}} \tag{4.164} $$

$$ V_{rec} = \frac{f}{2\pi V}\sum_{i,j}^{N} q_i q_j \sum_{m_x}\sum_{m_y}\sum_{m_z*}\frac{\exp\left(-(\pi\mathbf{m}/\beta)^2 + 2\pi i\,\mathbf{m}\cdot(\mathbf{r}_i - \mathbf{r}_j)\right)}{\mathbf{m}^2} \tag{4.165} $$

$$ V_0 = -\frac{f\beta}{\sqrt{\pi}}\sum_{i}^{N} q_i^2 \tag{4.166} $$

where $\beta$ is a parameter that determines the relative weight of the direct and reciprocal sums and $\mathbf{m} = (m_x, m_y, m_z)$. In this way we can use a short cut-off (of the order of 1 nm) in the direct space sum and a short cut-off in the reciprocal space sum (e.g. 10 wave vectors in each direction). Unfortunately, the computational cost of the reciprocal part of the sum increases as $N^2$ (or $N^{3/2}$ with a slightly better algorithm) and it is therefore not realistic for use in large systems.

Using Ewald

Don’t use Ewald unless you are absolutely sure this is what you want - for almost all cases the PME method below will perform much better. If you still want to employ classical Ewald summation enter this in your .mdp file, if the side of your box is about 3 nm:

    coulombtype     = Ewald
    rvdw            = 0.9
    rlist           = 0.9
    rcoulomb        = 0.9
    fourierspacing  = 0.6
    ewald-rtol      = 1e-5

The ratio of the box dimensions and the fourierspacing parameter determines the highest magnitude of wave vectors $m_x, m_y, m_z$ to use in each direction. With a 3-nm cubic box this example would use 11 wave vectors (from −5 to 5) in each direction. The ewald-rtol parameter is the relative strength of the electrostatic interaction at the cut-off. Decreasing this gives you a more accurate direct sum, but a less accurate reciprocal sum.

4.8.2 PME

Particle-mesh Ewald is a method proposed by Tom Darden [14] to improve the performance of the reciprocal sum. Instead of directly summing wave vectors, the charges are assigned to a grid using interpolation.
The implementation in GROMACS uses cardinal B-spline interpolation [15], which is referred to as smooth PME (SPME). The grid is then Fourier transformed with a 3D FFT algorithm and the reciprocal energy term obtained by a single sum over the grid in k-space.

The potential at the grid points is calculated by inverse transformation, and by using the interpolation factors we get the forces on each atom.

The PME algorithm scales as $N \log(N)$, and is substantially faster than ordinary Ewald summation on medium to large systems. On very small systems it might still be better to use Ewald to avoid the overhead in setting up grids and transforms. For the parallelization of PME see the section on MPMD PME (3.17.5).

With the Verlet cut-off scheme, the PME direct space potential is shifted by a constant such that the potential is zero at the cut-off. This shift is small and since the net system charge is close to zero, the total shift is very small, unlike in the case of the Lennard-Jones potential where all shifts add up. We apply the shift anyhow, such that the potential is the exact integral of the force.

Using PME

As an example for using Particle-mesh Ewald summation in GROMACS, specify the following lines in your .mdp file:

    coulombtype     = PME
    rvdw            = 0.9
    rlist           = 0.9
    rcoulomb        = 0.9
    fourierspacing  = 0.12
    pme-order       = 4
    ewald-rtol      = 1e-5

In this case the fourierspacing parameter determines the maximum spacing for the FFT grid (i.e. minimum number of grid points), and pme-order controls the interpolation order. Using fourth-order (cubic) interpolation and this spacing should give electrostatic energies accurate to about $5 \cdot 10^{-3}$. Since the Lennard-Jones energies are not this accurate it might even be possible to increase this spacing slightly.

Pressure scaling works with PME, but be aware of the fact that anisotropic scaling can introduce artificial ordering in some systems.
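To get a feeling for the numbers in the example above, the following sketch estimates the Ewald splitting parameter β and a lower bound on the grid size. It assumes that "relative strength of the electrostatic interaction at the cut-off" means erfc(β r_c) = ewald-rtol, and it ignores any rounding of the grid to FFT-friendly sizes that the program may apply; the function names `ewald_beta` and `min_grid_points` are hypothetical and not part of GROMACS.

```python
import math

def ewald_beta(rc, rtol):
    """Solve erfc(beta * rc) = rtol for beta by bisection (assumed definition)."""
    lo, hi = 0.0, 1.0
    # Grow the upper bracket until the direct-space term is below the tolerance.
    while math.erfc(hi * rc) > rtol:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if math.erfc(mid * rc) > rtol:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def min_grid_points(box_length, fourierspacing):
    """Smallest number of grid points whose spacing does not exceed fourierspacing."""
    # The small offset guards against floating-point noise in the division.
    return math.ceil(box_length / fourierspacing - 1e-9)

beta = ewald_beta(rc=0.9, rtol=1e-5)                          # rcoulomb and ewald-rtol from the example
print(f"beta = {beta:.4f} nm^-1")
print("grid points per 3 nm box edge:", min_grid_points(3.0, 0.12))
```

A smaller ewald-rtol pushes β up, so more of the interaction is moved into the reciprocal-space sum; this is why the tolerance, the cut-off and the grid spacing are best tuned together.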
4.8.3 P3M-AD

The Particle-Particle Particle-Mesh methods of Hockney & Eastwood can also be applied in GROMACS for the treatment of long range electrostatic interactions [111]. Although the P3M method was the first efficient long-range electrostatics method for molecular simulation, the smooth PME (SPME) method has largely replaced P3M as the method of choice in atomistic simulations. One performance disadvantage of the original P3M method was that it required three 3D-FFT back transforms to obtain the forces on the particles. This is not an inherent requirement of P3M, however: the forces can be derived through analytical differentiation of the potential, as done in PME. The resulting method is termed P3M-AD. The only remaining difference between P3M-AD and PME is the optimization of the lattice Green influence function for error minimization that P3M uses. However, it was shown in 2012 that the SPME influence function can be modified to obtain P3M [112]. This means that the advantage of error minimization in P3M-AD can be used at the same computational cost and with the same code as PME, just by adding a few lines to modify the influence function. However, at optimal parameter settings the effect of error minimization in P3M-AD is less than 10%. P3M-AD does show large accuracy gains with interlaced (also known as staggered) grids, but that is not supported in GROMACS (yet).

P3M is used in GROMACS with exactly the same options as used with PME by selecting the electrostatics type:

    coulombtype     = P3M-AD

4.8.4 Optimizing Fourier transforms and PME calculations

It is recommended to optimize the parameters for calculation of electrostatic interaction such as PME grid dimensions and cut-off radii. This is particularly relevant to do before launching long production runs.
gmx mdrun will automatically do a lot of PME optimization, and GROMACS also includes a special tool, gmx tune_pme, which automates the process of selecting the optimal number of PME-only ranks.

4.9 Long Range Van der Waals interactions

4.9.1 Dispersion correction

In this section, we derive long-range corrections due to the use of a cut-off for Lennard-Jones or Buckingham interactions. We assume that the cut-off is so long that the repulsion term can safely be neglected, and therefore only the dispersion term is taken into account. Due to the nature of the dispersion interaction (we are truncating a potential proportional to $-r^{-6}$), energy and pressure corrections are both negative. While the energy correction is usually small, it may be important for free energy calculations where differences between two different Hamiltonians are considered. In contrast, the pressure correction is very large and can not be neglected under any circumstances where a correct pressure is required, especially for any NPT simulations. Although it is, in principle, possible to parameterize a force field such that the pressure is close to the desired experimental value without correction, such a method makes the parameterization dependent on the cut-off and is therefore undesirable.

Energy

The long-range contribution of the dispersion interaction to the virial can be derived analytically, if we assume a homogeneous system beyond the cut-off distance $r_c$. The dispersion energy between two particles is written as:

$$ V(r_{ij}) = -C_6\, r_{ij}^{-6} \tag{4.167} $$

and the corresponding force is:

$$ \mathbf{F}_{ij} = -6\, C_6\, r_{ij}^{-8}\, \mathbf{r}_{ij} \tag{4.168} $$

In a periodic system it is not easy to calculate the full potentials, so usually a cut-off is applied, which can be abrupt or smooth. We will call the potential and force with cut-off $V_c$ and $\mathbf{F}_c$.
The long-range contribution to the dispersion energy in a system with N particles and particle density $\rho = N/V$ is:

$$ V_{lr} = \frac{1}{2} N \rho \int_0^{\infty} 4\pi r^2 g(r)\left(V(r) - V_c(r)\right) dr \tag{4.169} $$

We will integrate this for the shift function, which is the most general form of van der Waals interaction available in GROMACS. The shift function has a constant difference S from 0 to $r_1$ and is 0 beyond the cut-off distance $r_c$. We can integrate eqn. 4.169, assuming that the density in the sphere within $r_1$ is equal to the global density and the radial distribution function g(r) is 1 beyond $r_1$:

$$ \begin{aligned}
V_{lr} &= \frac{1}{2} N \left( \rho \int_0^{r_1} 4\pi r^2 g(r)\, C_6\, S\, dr + \rho \int_{r_1}^{r_c} 4\pi r^2 \left(V(r) - V_c(r)\right) dr + \rho \int_{r_c}^{\infty} 4\pi r^2\, V(r)\, dr \right) \\
&= \frac{1}{2} N \left( \left(\frac{4}{3}\pi\rho r_1^3 - 1\right) C_6\, S + \rho \int_{r_1}^{r_c} 4\pi r^2 \left(V(r) - V_c(r)\right) dr - \frac{4}{3}\pi\rho\, C_6\, r_c^{-3} \right)
\end{aligned} \tag{4.170} $$

where the term −1 corrects for the self-interaction. For a plain cut-off we only need to assume that g(r) is 1 beyond $r_c$ and the correction reduces to [113]:

$$ V_{lr} = -\frac{2}{3}\pi N \rho\, C_6\, r_c^{-3} \tag{4.171} $$

If we consider, for example, a box of pure water, simulated with a cut-off of 0.9 nm and a density of 1 g cm$^{-3}$ this correction is −0.75 kJ mol$^{-1}$ per molecule.

For a homogeneous mixture we need to define an average dispersion constant:

$$ \langle C_6 \rangle = \frac{2}{N(N-1)} \sum_{i}^{N} \sum_{j>i}^{N} C_6(i,j) \tag{4.172} $$

In GROMACS, excluded pairs of atoms do not contribute to the average.

In the case of inhomogeneous simulation systems, e.g. a system with a lipid interface, the energy correction can be applied if $\langle C_6 \rangle$ for both components is comparable.

Virial and pressure

The scalar virial of the system due to the dispersion interaction between two particles i and j is given by:

$$ \Xi = -\frac{1}{2}\, \mathbf{r}_{ij} \cdot \mathbf{F}_{ij} = 3\, C_6\, r_{ij}^{-6} \tag{4.173} $$

The pressure is given by:

$$ P = \frac{2}{3V}\left(E_{kin} - \Xi\right) \tag{4.174} $$
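The plain cut-off energy correction (eqn. 4.171) and the mixture average (eqn. 4.172) are simple enough to evaluate by hand or in a few lines of code. The sketch below is only an illustration of those two formulas: the function names and the numerical inputs are hypothetical, and it does not handle excluded pairs, which GROMACS leaves out of the average.

```python
import math

def average_c6(c6):
    """Average dispersion constant over all distinct pairs, eqn. 4.172.

    c6 is a symmetric N x N list of lists with the pair constants C6(i, j).
    """
    n = len(c6)
    total = sum(c6[i][j] for i in range(n) for j in range(i + 1, n))
    return 2.0 * total / (n * (n - 1))

def lr_energy_plain_cutoff(n_particles, density, c6, rc):
    """Long-range dispersion energy for a plain cut-off, eqn. 4.171.

    density in nm^-3, c6 in kJ mol^-1 nm^6, rc in nm; result in kJ mol^-1.
    """
    return -2.0 / 3.0 * math.pi * n_particles * density * c6 * rc ** -3

# Hypothetical three-particle "mixture" with made-up pair constants.
pairs = [[0.0020, 0.0025, 0.0030],
         [0.0025, 0.0020, 0.0035],
         [0.0030, 0.0035, 0.0020]]
c6_mean = average_c6(pairs)
print(c6_mean, lr_energy_plain_cutoff(1000, 33.4, c6_mean, 0.9))
```

Because the correction scales as $r_c^{-3}$, doubling the cut-off reduces it by a factor of eight, which is a quick sanity check on any implementation.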
The long-range correction to the virial is given by:

$$ \Xi_{lr} = \frac{1}{2} N \rho \int_0^{\infty} 4\pi r^2 g(r)\left(\Xi - \Xi_c\right) dr \tag{4.175} $$

We can again integrate the long-range contribution to the virial assuming g(r) is 1 beyond $r_1$:

$$ \begin{aligned}
\Xi_{lr} &= \frac{1}{2} N \rho \left( \int_{r_1}^{r_c} 4\pi r^2 \left(\Xi - \Xi_c\right) dr + \int_{r_c}^{\infty} 4\pi r^2\, 3\, C_6\, r_{ij}^{-6}\, dr \right) \\
&= \frac{1}{2} N \rho \left( \int_{r_1}^{r_c} 4\pi r^2 \left(\Xi - \Xi_c\right) dr + 4\pi C_6\, r_c^{-3} \right)
\end{aligned} \tag{4.176} $$

For a plain cut-off the correction to the pressure is [113]:

$$ P_{lr} = -\frac{4}{3}\pi\, C_6\, \rho^2\, r_c^{-3} \tag{4.177} $$

Using the same example of a water box, the correction to the virial is 0.75 kJ mol$^{-1}$ per molecule, the corresponding correction to the pressure for SPC water is approximately −280 bar.

For homogeneous mixtures, we can again use the average dispersion constant $\langle C_6 \rangle$ (eqn. 4.172):

$$ P_{lr} = -\frac{4}{3}\pi \langle C_6 \rangle \rho^2\, r_c^{-3} \tag{4.178} $$

For inhomogeneous systems, eqn. 4.178 can be applied under the same restriction as holds for the energy (see sec. 4.9.1).

4.9.2 Lennard-Jones PME

In order to treat systems, using Lennard-Jones potentials, that are non-homogeneous outside of the cut-off distance, we can instead use the Particle-mesh Ewald method as discussed for electrostatics above. In this case the modified Ewald equations become

$$ V = V_{dir} + V_{rec} + V_0 \tag{4.179} $$

$$ V_{dir} = -\frac{1}{2}\sum_{i,j}^{N}\sum_{n_x}\sum_{n_y}\sum_{n_z*} \frac{C_6^{ij}\, g(\beta r_{ij,\mathbf{n}})}{r_{ij,\mathbf{n}}^{6}} \tag{4.180} $$

$$ V_{rec} = \frac{\pi^{3/2}\beta^{3}}{2V}\sum_{m_x}\sum_{m_y}\sum_{m_z*} f(\pi|\mathbf{m}|/\beta) \times \sum_{i,j}^{N} C_6^{ij} \exp\left[-2\pi i\,\mathbf{m}\cdot(\mathbf{r}_i - \mathbf{r}_j)\right] \tag{4.181} $$

$$ V_0 = -\frac{\beta^{6}}{12}\sum_{i}^{N} C_6^{ii} \tag{4.182} $$

where $\mathbf{m} = (m_x, m_y, m_z)$, $\beta$ is the parameter determining the weight between direct and reciprocal space, and $C_6^{ij}$ is the combined dispersion parameter for particle i and j. The star indicates that terms with i = j should be omitted when $(n_x, n_y, n_z) = (0, 0, 0)$, and $r_{ij,\mathbf{n}}$ is the real distance between the particles. Following the derivation by Essmann [15], the functions f and g introduced above are defined as

$$ f(x) = \frac{1}{3}\left[(1 - 2x^2)\exp(-x^2) + 2x^3\sqrt{\pi}\,\mathrm{erfc}(x)\right] \tag{4.183} $$

$$ g(x) = \exp(-x^2)\left(1 + x^2 + \frac{x^4}{2}\right) \tag{4.184} $$

The above methodology works fine as long as the dispersion parameters can be combined geometrically (eqn. 4.6) in the same way as the charges for electrostatics

$$ C_{6,\mathrm{geom}}^{ij} = \left(C_6^{ii}\, C_6^{jj}\right)^{1/2} \tag{4.185} $$

For Lorentz-Berthelot combination rules (eqn. 4.7), the reciprocal part of this sum has to be calculated seven times due to the splitting of the dispersion parameter according to

$$ C_{6,\mathrm{L-B}}^{ij} = (\sigma_i + \sigma_j)^6 = \sum_{n=0}^{6} P_n\, \sigma_i^{n}\, \sigma_j^{(6-n)} \tag{4.186} $$

for $P_n$ the Pascal triangle coefficients. This introduces a non-negligible cost to the reciprocal part, requiring seven separate FFTs, and therefore this has been the limiting factor in previous attempts to implement LJ-PME. A solution to this problem is to use geometrical combination rules in order to calculate an approximate interaction parameter for the reciprocal part of the potential, yielding a total interaction of

$$ \begin{aligned}
V(r < r_c) &= \underbrace{C_6^{\mathrm{dir}}\, g(\beta r)\, r^{-6}}_{\text{Direct space}} + \underbrace{C_{6,\mathrm{geom}}^{\mathrm{recip}}\, [1 - g(\beta r)]\, r^{-6}}_{\text{Reciprocal space}} \\
&= C_{6,\mathrm{geom}}^{\mathrm{recip}}\, r^{-6} + \left(C_6^{\mathrm{dir}} - C_{6,\mathrm{geom}}^{\mathrm{recip}}\right) g(\beta r)\, r^{-6}
\end{aligned} \tag{4.187} $$

$$ V(r > r_c) = \underbrace{C_{6,\mathrm{geom}}^{\mathrm{recip}}\, [1 - g(\beta r)]\, r^{-6}}_{\text{Reciprocal space}} \tag{4.188} $$

This will preserve a well-defined Hamiltonian and significantly increase the performance of the simulations. The approximation does introduce some errors, but since the difference is located in the interactions calculated in reciprocal space, the effect will be very small compared to the total interaction energy. In a simulation of a lipid bilayer, using a cut-off of 1.0 nm, the relative error in total dispersion energy was below 0.5%. A more thorough discussion of this can be found in [114].

In GROMACS we now perform the proper calculation of this interaction by subtracting, from the direct-space interactions, the contribution made by the approximate potential that is used in the reciprocal part

$$ V_{dir} = C_6^{\mathrm{dir}}\, r^{-6} - C_6^{\mathrm{recip}}\, [1 - g(\beta r)]\, r^{-6} \tag{4.189} $$

This potential will reduce to the expression in eqn. 4.180 when $C_6^{\mathrm{dir}} = C_6^{\mathrm{recip}}$, and the total interaction is given by

$$ \begin{aligned}
V(r < r_c) &= \underbrace{C_6^{\mathrm{dir}}\, r^{-6} - C_6^{\mathrm{recip}}\, [1 - g(\beta r)]\, r^{-6}}_{\text{Direct space}} + \underbrace{C_6^{\mathrm{recip}}\, [1 - g(\beta r)]\, r^{-6}}_{\text{Reciprocal space}} \\
&= C_6^{\mathrm{dir}}\, r^{-6}
\end{aligned} \tag{4.190} $$

$$ V(r > r_c) = C_6^{\mathrm{recip}}\, [1 - g(\beta r)]\, r^{-6} \tag{4.191} $$

For the case when $C_6^{\mathrm{dir}} \neq C_6^{\mathrm{recip}}$ this will retain an unmodified LJ force up to the cut-off, and the error is an order of magnitude smaller than in simulations where the direct-space interactions do not account for the approximation used in reciprocal space. When using a VdW interaction modifier of potential-shift, the constant

$$ \left(-C_6^{\mathrm{dir}} + C_6^{\mathrm{recip}}\, [1 - g(\beta r_c)]\right) r_c^{-6} \tag{4.192} $$

is added to eqn. 4.190 in order to ensure that the potential is continuous at the cutoff. Note that, in the same way as eqn. 4.189, this degenerates into the expected $-C_6\, g(\beta r_c)\, r_c^{-6}$ when $C_6^{\mathrm{dir}} = C_6^{\mathrm{recip}}$. In addition to this, a long-range dispersion correction can be applied to correct for the approximation using a combination rule in reciprocal space. This correction assumes, as for the cut-off LJ potential, a uniform particle distribution. But since the error of the combination rule approximation is very small this long-range correction is not necessary in most cases. Also note that this homogeneous correction does not correct the surface tension, which is an inhomogeneous property.

Using LJ-PME

As an example for using Particle-mesh Ewald summation for Lennard-Jones interactions in GROMACS, specify the following lines in your .mdp file:

    vdwtype           = PME
    rvdw              = 0.9
    vdw-modifier      = Potential-Shift
    rlist             = 0.9
    rcoulomb          = 0.9
    fourierspacing    = 0.12
    pme-order         = 4
    ewald-rtol-lj     = 0.001
    lj-pme-comb-rule  = geometric

The same Fourier grid and interpolation order are used if both LJ-PME and electrostatic PME are active, so the settings for fourierspacing and pme-order are common to both. ewald-rtol-lj controls the splitting between direct and reciprocal space in the same way as ewald-rtol.
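The functions f and g of eqns. 4.183 and 4.184 and the potential-shift constant of eqn. 4.192 are easy to evaluate numerically. The sketch below does so and, purely as an assumption made by analogy with the electrostatic case, picks β such that g(β r_c) equals ewald-rtol-lj; the function names are hypothetical, the example values of the dispersion constants are made up, and the actual rule used by the program may differ.

```python
import math

def g(x):
    """Direct-space weight of the dispersion interaction, eqn. 4.184."""
    return math.exp(-x * x) * (1.0 + x * x + 0.5 * x ** 4)

def f(x):
    """Reciprocal-space function, eqn. 4.183."""
    return ((1.0 - 2.0 * x * x) * math.exp(-x * x)
            + 2.0 * x ** 3 * math.sqrt(math.pi) * math.erfc(x)) / 3.0

def lj_beta(rc, rtol_lj):
    """Solve g(beta * rc) = rtol_lj by bisection (assumed tolerance definition)."""
    lo, hi = 0.0, 1.0
    while g(hi * rc) > rtol_lj:       # g decreases monotonically for x > 0
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid * rc) > rtol_lj:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def potential_shift(c6_dir, c6_recip, beta, rc):
    """Constant added for vdw-modifier = Potential-Shift, eqn. 4.192."""
    return (-c6_dir + c6_recip * (1.0 - g(beta * rc))) * rc ** -6

beta_lj = lj_beta(rc=0.9, rtol_lj=0.001)      # rvdw and ewald-rtol-lj from the example
print(beta_lj, potential_shift(0.0026, 0.0025, beta_lj, 0.9))
```

When the direct-space and reciprocal-space dispersion constants coincide, `potential_shift` reduces to $-C_6\, g(\beta r_c)\, r_c^{-6}$, as stated in the text.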
In addition to this, the combination rule to be used in reciprocal space is determined by lj-pme-comb-rule. If the current force field uses Lorentz-Berthelot combination rules, it is possible to set lj-pme-comb-rule = geometric in order to gain a significant increase in performance for a small loss in accuracy. The details of this approximation can be found in the section above.

Note that the use of a complete long-range dispersion correction means that as with Coulomb PME, rvdw is now a free parameter in the method, rather than being necessarily restricted by the force-field parameterization scheme. Thus it is now possible to optimize the cutoff, spacing, order and tolerance terms for accuracy and best performance.

Naturally, the use of LJ-PME rather than LJ cut-off adds computation and communication done for the reciprocal-space part, so for best performance in balancing the load of parallel simulations using PME-only ranks, more such ranks should be used. It may be possible to improve upon the automatic load-balancing used by mdrun.

4.10 Force field

A force field is built up from two distinct components:

• The set of equations (called the potential functions) used to generate the potential energies and their derivatives, the forces. These are described in detail in the previous chapter.

• The parameters used in this set of equations. These are not given in this manual, but in the data files corresponding to your GROMACS distribution.

Within one set of equations various sets of parameters can be used. Care must be taken that the combination of equations and parameters forms a consistent set. It is in general dangerous to make ad hoc changes in a subset of parameters, because the various contributions to the total force are usually interdependent. This means in principle that every change should be documented, verified by comparison to experimental data and published in a peer-reviewed journal before it can be used.
GROMACS 2018.3 includes several force fields, and additional ones are available on the website. If you do not know which one to select we recommend GROMOS-96 for united-atom setups and OPLS-AA/L for all-atom parameters. That said, we describe the available options in some detail.

All-hydrogen force field

The GROMOS-87-based all-hydrogen force field is almost identical to the normal GROMOS-87 force field, since the extra hydrogens have no Lennard-Jones interaction and zero charge. The only differences are in the bond angle and improper dihedral angle terms. This force field is only useful when you need the exact hydrogen positions, for instance for distance restraints derived from NMR measurements. When citing this force field please read the previous paragraph.

4.10.1 GROMOS-96

GROMACS supports the GROMOS-96 force fields [82]. All parameters for the 43A1, 43A2 (development, improved alkane dihedrals), 45A3, 53A5, and 53A6 parameter sets are included. All standard building blocks are included and topologies can be built automatically by pdb2gmx.

The GROMOS-96 force field is a further development of the GROMOS-87 force field. It has improvements over the GROMOS-87 force field for proteins and small molecules. Note that the sugar parameters present in 53A6 do correspond to those published in 2004 [115], which are different from those present in 45A4, which is not included in GROMACS at this time. The 45A4 parameter set corresponds to a later revision of these parameters. The GROMOS-96 force field is not, however, recommended for use with long alkanes and lipids. The GROMOS-96 force field differs from the GROMOS-87 force field in a few respects:

• the force field parameters

• the parameters for the bonded interactions are not linked to atom types

• a fourth power bond stretching potential (4.2.1)
• an angle potential based on the cosine of the angle (4.2.6)

There are two differences in implementation between GROMACS and GROMOS-96 which can lead to slightly different results when simulating the same system with both packages:

• in GROMOS-96 neighbor searching for solvents is performed on the first atom of the solvent molecule. This is not implemented in GROMACS, but the difference with searching by centers of charge groups is very small

• the virial in GROMOS-96 is molecule-based. This is not implemented in GROMACS, which uses atomic virials

The GROMOS-96 force field was parameterized with a Lennard-Jones cut-off of 1.4 nm, so be sure to use a Lennard-Jones cut-off (rvdw) of at least 1.4 nm. A larger cut-off is possible because the Lennard-Jones potential and forces are almost zero beyond 1.4 nm.

GROMOS-96 files

GROMACS can read and write GROMOS-96 coordinate and trajectory files. These files should have the extension .g96. Such a file can be a GROMOS-96 initial/final configuration file, a coordinate trajectory file, or a combination of both. The file is fixed format; all floats are written in 15.9 format (field width 15, 9 decimal places), and as such, files can get huge. GROMACS supports the following data blocks in the given order:

• Header block:
    TITLE (mandatory)

• Frame blocks:
    TIMESTEP (optional)
    POSITION/POSITIONRED (mandatory)
    VELOCITY/VELOCITYRED (optional)
    BOX (optional)

See the GROMOS-96 manual [82] for a complete description of the blocks. Note that all GROMACS programs can read compressed (.Z) or gzipped (.gz) files.

4.10.2 OPLS/AA

4.10.3 AMBER

GROMACS provides native support for the following AMBER force fields:

• AMBER94 [116]

• AMBER96 [117]

• AMBER99 [118]

• AMBER99SB [119]

• AMBER99SB-ILDN [120]

• AMBER03 [121]

• AMBERGS [122]

4.10.4 CHARMM

GROMACS supports the CHARMM force field for proteins [123, 124], lipids [125] and nucleic acids [126, 127].
The protein parameters (and to some extent the lipid and nucleic acid parameters) were thoroughly tested, both by comparing potential energies between the port and the standard parameter set in the CHARMM molecular simulation package, and by checking how the protein force field behaves together with GROMACS-specific techniques such as virtual sites (enabling long time steps) and a fast implicit solvent recently implemented [74]. The details and results are presented in the paper by Bjelkmar et al. [128]. The nucleic acid parameters, as well as the ones for HEME, were converted and tested by Michel Cuendet.

When selecting the CHARMM force field in pdb2gmx the default option is to use CMAP (for torsional correction map). To exclude CMAP, use -nocmap. The basic form of the CMAP term implemented in GROMACS is a function of the φ and ψ backbone torsion angles. This term is defined in the .rtp file by a [ cmap ] statement at the end of each residue supporting CMAP. The following five atom names define the two torsional angles. Atoms 1-4 define φ, and atoms 2-5 define ψ. The corresponding atom types are then matched to the correct CMAP type in the cmap.itp file that contains the correction maps.

A port of the CHARMM36 force field for use with GROMACS is also available at http://mackerell.umaryland.edu/charmm_ff.shtml#gromacs.

For branched polymers or other topologies not supported by pdb2gmx, it is possible to use TopoTools [129] to generate a GROMACS top file.

4.10.5 Coarse-grained force fields

Coarse-graining is a systematic way of reducing the number of degrees of freedom representing a system of interest. To achieve this, typically whole groups of atoms are represented by single beads and the coarse-grained force field describes their effective interactions. Depending on the choice of parameterization, the functional form of such an interaction can be complicated and often tabulated potentials are used.
Coarse-grained models are designed to reproduce certain properties of a reference system. This can be either a full atomistic model or even experimental data. Depending on the properties to reproduce there are different methods to derive such force fields. An incomplete list of methods is given below:

• Conserving free energies
    – Simplex method
    – MARTINI force field (see next section)

• Conserving distributions (like the radial distribution function), so-called structure-based coarse-graining
    – (iterative) Boltzmann inversion
    – Inverse Monte Carlo

• Conserving forces
    – Force matching

Note that coarse-grained potentials are state dependent (e.g. temperature, density, ...) and should be re-parametrized depending on the system of interest and the simulation conditions. This can for example be done using the Versatile Object-oriented Toolkit for Coarse-Graining Applications (VOTCA) [130]. The package was designed to assist in systematic coarse-graining, provides implementations for most of the algorithms mentioned above and has a well tested interface to GROMACS. It is available as open source and further information can be found at www.votca.org.

4.10.6 MARTINI

The MARTINI force field is a coarse-grain parameter set that allows for the construction of many systems, including proteins and membranes.

4.10.7 PLUM

The PLUM force field [131] is an example of a solvent-free protein-membrane model for which the membrane was derived from structure-based coarse-graining [132]. A GROMACS implementation can be found at code.google.com/p/plumx.

Chapter 5

Topologies

5.1 Introduction

GROMACS must know on which atoms and combinations of atoms the various contributions to the potential functions (see chapter 4) must act. It must also know what parameters must be applied to the various functions. All this is described in the topology file *.top, which lists the constant attributes of each atom.
There are many more atom types than elements, but only atom types present in biological systems are parameterized in the force field, plus some metals, ions and silicon. The bonded and special interactions are determined by fixed lists that are included in the topology file. Certain non-bonded interactions must be excluded (first and second neighbors), as these are already treated in bonded interactions. In addition, there are dynamic attributes of atoms - their positions, velocities and forces. These do not strictly belong to the molecular topology, and are stored in the coordinate file *.gro (positions and velocities), or trajectory file *.trr (positions, velocities, forces).

This chapter describes the setup of the topology file, the *.top file and the database files: what the parameters stand for and how/where to change them if needed. First, all file formats are explained. Section 5.9.1 describes the organization of the files in each force field.

Note: if you construct your own topologies, we encourage you to upload them to our topology archive at www.gromacs.org! Just imagine how thankful you’d have been if your topology had been available there before you started. The same goes for new force fields or modified versions of the standard force fields - contribute them to the force field archive!

5.2 Particle type

In GROMACS, there are three types of particles, see Table 5.1. Only regular atoms and virtual interaction sites are used in GROMACS; shells are necessary for polarizable models like the Shell-Water models [45].

    Particle                     Symbol
    atoms                        A
    shells                       S
    virtual interaction sites    V (or D)

Table 5.1: Particle types in GROMACS

5.2.1 Atom types

Each force field defines a set of atom types, which have a characteristic name or number, and mass (in a.m.u.). These listings are found in the atomtypes.atp file (.atp = atom type parameter file). Therefore, it is in this file that you can begin to change and/or add an atom type.
A sample from the gromos43a1.ff force field is listed below.

    O    15.99940 ;  carbonyl oxygen (C=O)
    OM   15.99940 ;  carboxyl oxygen (CO-)
    OA   15.99940 ;  hydroxyl, sugar or ester oxygen
    OW   15.99940 ;  water oxygen
    N    14.00670 ;  peptide nitrogen (N or NH)
    NT   14.00670 ;  terminal nitrogen (NH2)
    NL   14.00670 ;  terminal nitrogen (NH3)
    NR   14.00670 ;  aromatic nitrogen
    NZ   14.00670 ;  Arg NH (NH2)
    NE   14.00670 ;  Arg NE (NH)
    C    12.01100 ;  bare carbon
    CH1  13.01900 ;  aliphatic or sugar CH-group
    CH2  14.02700 ;  aliphatic or sugar CH2-group
    CH3  15.03500 ;  aliphatic CH3-group

Note: GROMACS makes use of the atom types as a name, not as a number (as e.g. in GROMOS).

5.2.2 Virtual sites

Some force fields use virtual interaction sites (interaction sites that are constructed from other particle positions) on which certain interactions are located (e.g. on benzene rings, to reproduce the correct quadrupole). This is described in sec. 4.7.

To make virtual sites in your system, you should include a section [ virtual_sites? ] (for backward compatibility the old name [ dummies? ] can also be used) in your topology file, where the ‘?’ stands for the number of constructing particles for the virtual site. This will be ‘2’ for type 2, ‘3’ for types 3, 3fd, 3fad and 3out and ‘4’ for type 4fdn. The last of these replaces an older 4fd type (with the ‘type’ value 1) that could occasionally be unstable; while it is still supported internally in the code, the old 4fd type should not be used in new input files. The different types are explained in sec. 4.7.

Parameters for type 2 should look like this:

    [ virtual_sites2 ]
    ; Site  from        funct  a
    5       1     2     1      0.7439756

for type 3 like this:

    [ virtual_sites3 ]
    ; Site  from            funct  a          b
    5       1     2     3   1      0.7439756  0.128012

for type 3fd like this:

    [ virtual_sites3 ]
    ; Site  from            funct  a          d
    5       1     2     3   2      0.5        -0.105

for type 3fad like this:

    [ virtual_sites3 ]
    ; Site  from            funct  theta      d
    5       1     2     3   3      120        0.5

for type 3out like this:

    [ virtual_sites3 ]
    ; Site  from            funct  a          b          c
    5       1     2     3   4      -0.4       -0.4       6.9281

for type 4fdn like this:

    [ virtual_sites4 ]
    ; Site  from                funct  a      b      c
    5       1     2     3   4   2      1.0    0.9    0.105

This will result in the construction of a virtual site, number 5 (first column ‘Site’), based on the positions of the atoms whose indices are 1 and 2 or 1, 2 and 3 or 1, 2, 3 and 4 (next two, three or four columns ‘from’) following the rules determined by the function number (next column ‘funct’) with the parameters specified (last one, two or three columns ‘a b . .’). Obviously, the atom numbers (including virtual site number) depend on the molecule. It may be instructive to study the topologies for TIP4P or TIP5P water models that are included with the GROMACS distribution.

Note that if any constant bonded interactions are defined between virtual sites and/or normal atoms, they will be removed by grompp (unless the option -normvsbds is used). This removal of bonded interactions is done after generating exclusions, as the generation of exclusions is based on “chemically” bonded interactions.

Virtual sites can be constructed in a more generic way using basic geometric parameters. The directive that can be used is [ virtual_sitesn ]. Required parameters are listed in Table 5.5. An example entry for defining a virtual site at the center of geometry of a given set of atoms might be:

    [ virtual_sitesn ]
    ; Site  funct  from
    5       1      1     2     3     4

    Property   Symbol   Unit
    Type       -        -
    Mass       m        a.m.u.
    Charge     q        electron
    epsilon    ε        kJ/mol
    sigma      σ        nm

Table 5.2: Static atom type properties in GROMACS

5.3 Parameter files

5.3.1 Atoms

The static properties (see Table 5.2) assigned to the atom types are assigned based on data in several places. The mass is listed in atomtypes.atp (see 5.2.1), whereas the charge is listed in *.rtp (.rtp = residue topology parameter file, see 5.7.1). This implies that the charges are only defined in the building blocks of amino acids, nucleic acids or otherwise, as defined by the user. When generating a topology (*.top) using the pdb2gmx program, the information from these files is combined.

5.3.2 Non-bonded parameters

The non-bonded parameters consist of the van der Waals parameters V (c6 or σ, depending on the combination rule) and W (c12 or ε), as listed in the file ffnonbonded.itp, where ptype is the particle type (see Table 5.1). As with the bonded parameters, entries in [ *type ] directives are applied to their counterparts in the topology file. Missing parameters generate warnings, except as noted below in section 5.4.3.

    [ atomtypes ]
    ;name  at.num  mass      charge  ptype  V(c6)        W(c12)
    O      8       15.99940  0.000   A      0.22617E-02  0.74158E-06
    OM     8       15.99940  0.000   A      0.22617E-02  0.74158E-06
    .....

    [ nonbond_params ]
    ; i   j    func  V(c6)        W(c12)
    O     O    1     0.22617E-02  0.74158E-06
    O     OA   1     0.22617E-02  0.13807E-05
    .....

Note that most of the included force fields also include the at.num. column, but this same information is implied in the OPLS-AA bond_type column. The interpretation of the parameters V and W depends on the combination rule that was chosen in the [ defaults ] section of the topology file (see 5.8.1):

$$ \text{for combination rule 1:}\quad \begin{aligned} V_{ii} &= C_i^{(6)} = 4\,\epsilon_i \sigma_i^{6} \quad [\text{kJ mol}^{-1}\,\text{nm}^{6}] \\ W_{ii} &= C_i^{(12)} = 4\,\epsilon_i \sigma_i^{12} \quad [\text{kJ mol}^{-1}\,\text{nm}^{12}] \end{aligned} \tag{5.1} $$

$$ \text{for combination rules 2 and 3:}\quad \begin{aligned} V_{ii} &= \sigma_i \quad [\text{nm}] \\ W_{ii} &= \epsilon_i \quad [\text{kJ mol}^{-1}] \end{aligned} \tag{5.2} $$

Some or all combinations for different atom types can be given in the [ nonbond_params ] section, again with parameters V and W as defined above.
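Since the meaning of V and W changes with the combination rule, it can be handy to convert between the two representations of eqns. 5.1 and 5.2. The following sketch inverts eqn. 5.1 algebraically; the function names are hypothetical, and the input values are the O–O entry of the example listing above.

```python
def c6_c12_from_sigma_epsilon(sigma, epsilon):
    """Eqn. 5.1: C6 = 4 eps sigma^6 and C12 = 4 eps sigma^12."""
    return 4.0 * epsilon * sigma ** 6, 4.0 * epsilon * sigma ** 12

def sigma_epsilon_from_c6_c12(c6, c12):
    """Inverse of eqn. 5.1: sigma = (C12/C6)^(1/6), eps = C6^2 / (4 C12)."""
    if c6 <= 0.0 or c12 <= 0.0:
        raise ValueError("both C6 and C12 must be positive to invert eqn. 5.1")
    sigma = (c12 / c6) ** (1.0 / 6.0)
    epsilon = c6 * c6 / (4.0 * c12)
    return sigma, epsilon

# O-O parameters from the [ nonbond_params ] example (kJ mol^-1 nm^6 and kJ mol^-1 nm^12).
sigma, epsilon = sigma_epsilon_from_c6_c12(0.22617e-02, 0.74158e-06)
print(f"sigma = {sigma:.4f} nm, epsilon = {epsilon:.4f} kJ/mol")
```

Converting back with `c6_c12_from_sigma_epsilon(sigma, epsilon)` should recover the original C6 and C12 to within rounding.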
Any combination that is not given will be computed from the parameters for the corresponding atom types, according to the combination rule:

for combination rules 1 and 3:

$$
\begin{aligned}
C_{ij}^{(6)} &= \left( C_i^{(6)}\, C_j^{(6)} \right)^{1/2} \\
C_{ij}^{(12)} &= \left( C_i^{(12)}\, C_j^{(12)} \right)^{1/2}
\end{aligned}
\tag{5.3}
$$

for combination rule 2:

$$
\begin{aligned}
\sigma_{ij} &= \tfrac{1}{2} \left( \sigma_i + \sigma_j \right) \\
\epsilon_{ij} &= \sqrt{\epsilon_i\, \epsilon_j}
\end{aligned}
\tag{5.4}
$$

When σ and ε need to be supplied (rules 2 and 3), it would seem it is impossible to have a non-zero $C^{(12)}$ combined with a zero $C^{(6)}$ parameter. However, providing a negative σ will do exactly that, such that $C^{(6)}$ is set to zero and $C^{(12)}$ is calculated normally. This situation represents a special case in reading the value of σ, and nothing more.

There is only one set of combination rules for Buckingham potentials:

$$
\begin{aligned}
A_{ij} &= \left( A_{ii}\, A_{jj} \right)^{1/2} \\
B_{ij} &= 2 \Big/ \left( \frac{1}{B_{ii}} + \frac{1}{B_{jj}} \right) \\
C_{ij} &= \left( C_{ii}\, C_{jj} \right)^{1/2}
\end{aligned}
\tag{5.5}
$$

5.3.3 Bonded parameters

The bonded parameters (i.e. bonds, bond angles, improper and proper dihedrals) are listed in ffbonded.itp. The entries in this database describe, respectively, the atom types in the interactions, the type of the interaction, and the parameters associated with that interaction. These parameters are then read by grompp when processing a topology and applied to the relevant bonded parameters, i.e. bondtypes are applied to entries in the [ bonds ] directive, etc. Any bonded parameter that is missing from the relevant [ *type ] directive generates a fatal error. The types of interactions are listed in Table 5.5. Example excerpts from such files follow:

    [ bondtypes ]
    ; i  j   func  b0       kb
    C    O   1     0.12300  502080.
    C    OM  1     0.12500  418400.
    ......

    [ angletypes ]
    ; i  j   k    func  th0      cth
    HO   OA  C    1     109.500  397.480
    HO   OA  CH1  1     109.500  397.480
    ......

    [ dihedraltypes ]
    ; i   l     func  q0     cq
    NR5*  NR5   2     0.000  167.360
    NR5*  NR5*  2     0.000  167.360
    ......

    [ dihedraltypes ]
    ; j  k   func  phi0     cp      mult
    C    OA  1     180.000  16.736  2
    C    N   1     180.000  33.472  2
    ......

    [ dihedraltypes ]
    ;
    ; Ryckaert-Bellemans Dihedrals
    ;
    ; aj  ak   funct
    CP2   CP2  3      9.2789  12.156  -13.120  -3.0597  26.240  -31.495

In the ffbonded.itp file, you can add bonded parameters. If you want to include parameters for new atom types, make sure you define them in atomtypes.atp as well.

For most interaction types, bonded parameters are searched and assigned using an exact match for all type names and allowing only a single set of parameters. Dihedral parameters are the exception to this rule. For [ dihedraltypes ], wildcard atom type names can be specified with the letter X in one or more of the four positions. Thus one can for example assign proper dihedral parameters based on the types of the middle two atoms. The parameters for the entry with the most exact matches, i.e. the fewest wildcard matches, will be used. Note that GROMACS versions older than 5.1.3 used the first match, which means that a full match would be ignored if it is preceded by an entry that matches on wildcards. Thus it is suggested to put wildcard entries at the end, in case someone might use a force field with older versions of GROMACS. In addition there is a dihedral type 9 which adds the possibility of assigning multiple dihedral potentials, useful for combining terms with different multiplicities. The different dihedral potential parameter sets should be on directly adjacent lines in the [ dihedraltypes ] section.

5.4 Molecule definition

5.4.1 Moleculetype entries

An organizational structure that usually corresponds to molecules is the [ moleculetype ] entry. This entry serves two main purposes. One is to give structure to the topology file(s), usually corresponding to real molecules. This makes the topology easier to read and writing it less labor intensive. A second purpose is computational efficiency. The system definition that is kept in memory is proportional in size to the moleculetype definitions.
If a molecule is present in 100000 copies, this saves a factor of 100000 in memory, which means the system usually fits in cache, which can improve performance tremendously. Interactions that correspond to chemical bonds, that generate exclusions, can only be defined between atoms within a moleculetype. It is allowed to have multiple molecules which are not covalently bonded in one moleculetype definition. Molecules can be made infinitely long by connecting to themselves over periodic boundaries. When such periodic molecules are present, an option in the mdp file needs to be set to tell GROMACS not to attempt to make molecules that are broken over periodic boundaries whole again.

5.4.2 Intermolecular interactions

In some cases, one would like atoms in different molecules to interact through interactions other than the usual non-bonded interactions. This is often the case in binding studies. When the molecules are covalently bound, e.g. a ligand binding covalently to a protein, they are effectively one molecule and they should be defined in one [ moleculetype ] entry. Note that pdb2gmx has an option to put two or more molecules in one [ moleculetype ] entry. When molecules are not covalently bound, it is much more convenient to use separate moleculetype definitions and specify the intermolecular interactions in the [ intermolecular_interactions ] section. In this section, which is placed at the end of the topology (see Table 5.4), normal bonded interactions can be specified using global atom indices. The only restrictions are that no interactions that generate exclusions can be used, and that no constraints can be used.

5.4.3 Intramolecular pair interactions

Extra Lennard-Jones and electrostatic interactions between pairs of atoms in a molecule can be added in the [ pairs ] section of a molecule definition. The parameters for these interactions can be set independently from the non-bonded interaction parameters.
In the GROMOS force fields, pairs are only used to modify the 1-4 interactions (interactions of atoms separated by three bonds). In these force fields the 1-4 interactions are excluded from the non-bonded interactions (see sec. 5.4.4).

    [ pairtypes ]
    ; i  j   func  cs6          cs12  ; THESE ARE 1-4 INTERACTIONS
    O    O   1     0.22617E-02  0.74158E-06
    O    OM  1     0.22617E-02  0.74158E-06
    .....

The pair interaction parameters for the atom types in ffnonbonded.itp are listed in the [ pairtypes ] section. The GROMOS force fields list all these interaction parameters explicitly, but this section might be empty for force fields like OPLS that calculate the 1-4 interactions by uniformly scaling the parameters. Pair parameters that are not present in the [ pairtypes ] section are only generated when gen-pairs is set to “yes” in the [ defaults ] directive of forcefield.itp (see 5.8.1). When gen-pairs is set to “no,” grompp will give a warning for each pair type for which no parameters are given.

The normal pair interactions, intended for 1-4 interactions, have function type 1. Function type 2 and the [ pairs_nb ] section are intended for free-energy simulations. When determining hydration free energies, the solute needs to be decoupled from the solvent. This can be done by adding a B-state topology (see sec. 3.12) that uses zero for all solute non-bonded parameters, i.e. charges and LJ parameters. However, the free energy difference between the A and B states is not the total hydration free energy. One has to add the free energy for reintroducing the internal Coulomb and LJ interactions in the solute when in vacuum. This second step can be combined with the first step when the Coulomb and LJ interactions within the solute are not modified. For this purpose, there is a pairs function type 2, which is identical to function type 1, except that the B-state parameters are always identical to the A-state parameters.
For searching the parameters in the [ pairtypes ] section, no distinction is made between function type 1 and 2. The pairs section [ pairs_nb ] is intended to replace the non-bonded interaction. It uses the unscaled charges and the non-bonded LJ parameters; it also only uses the A-state parameters. Note that one should add exclusions for all atom pairs listed in [ pairs_nb ], otherwise such pairs will also end up in the normal neighbor lists. Alternatively, this same behavior can be achieved without ever touching the topology, by using the couple-moltype, couple-lambda0, couple-lambda1, and couple-intramol keywords. See sections sec. 3.12 and sec. 6.1 for more information. All three pair types always use plain Coulomb interactions, even when Reaction-field, PME, Ewald or shifted Coulomb interactions are selected for the non-bonded interactions. Energies for types 1 and 2 are written to the energy and log file in separate “LJ-14” and “Coulomb-14” entries per energy group pair. Energies for [ pairs_nb ] are added to the “LJ-(SR)” and “Coulomb(SR)” terms. 5.4.4 Exclusions The exclusions for non-bonded interactions are generated by grompp for neighboring atoms up to a certain number of bonds away, as defined in the [ moleculetype ] section in the topology file (see 5.8.1). Particles are considered bonded when they are connected by “chemical” bonds ([ bonds ] types 1 to 5, 7 or 8) or constraints ([ constraints ] type 1). Type 5 [ bonds ] can be used to create a connection between two atoms without creating an interaction. There is a harmonic interaction ([ bonds ] type 6) that does not connect the atoms by a chemical bond. There is also a second constraint type ([ constraints ] type 2) that fixes the distance, but does not connect the atoms by a chemical bond. For a complete list of all these interactions, see Table 5.5. Extra exclusions within a molecule can be added manually in a [ exclusions ] section. 
Each line should start with one atom index, followed by one or more atom indices. All non-bonded interactions between the first atom and the other atoms will be excluded.

When all non-bonded interactions within or between groups of atoms need to be excluded, it is more convenient and much more efficient to use energy monitor group exclusions (see sec. 3.3).

5.5 Implicit solvation parameters

Starting with GROMACS 4.5, implicit solvent is supported. A section in the topology has been introduced to list those parameters:

    [ implicit_genborn_params ]
    ; Atomtype  sar    st  pi     gbr      hct
    NH1         0.155  1   1.028  0.17063  0.79  ; N
    N           0.155  1   1      0.155    0.79  ; Proline backbone N
    H           0.1    1   1      0.115    0.85  ; H
    CT1         0.180  1   1.276  0.190    0.72  ; C

In this example the atom type is listed first, followed by five numbers, and a comment (following a semicolon).

Values in columns 1-3 are not currently used. They pertain to more elaborate surface area algorithms, the one from Qiu et al. [71] in particular. Column 4 contains the atomic van der Waals radii, which are used in computing the Born radii. The dielectric offset is specified in the *.mdp file, and gets subtracted from the input van der Waals radii for the different Born radii methods, as described by Onufriev et al. [73]. Column 5 is the scale factor for the HCT and OBC models. The values are taken from the Tinker implementation of the HCT pairwise scaling method [72]. This method has been modified such that the scaling factors have been adjusted to minimize differences between analytical surface areas and GB using the HCT algorithm. The scaling is further modified in that it is not applied pairwise as proposed by Hawkins et al. [72], but on a per-atom (rather than a per-pair) basis.

5.6 Constraint algorithms

Constraints are defined in the [ constraints ] section. The format is two atom numbers followed by the function type, which can be 1 or 2, and the constraint distance.
The only difference between the two types is that type 1 is used for generating exclusions and type 2 is not (see sec. 5.4.4). The distances are constrained using the LINCS or the SHAKE algorithm, which can be selected in the *.mdp file. Both types of constraints can be perturbed in free-energy calculations by adding a second constraint distance (see 5.8.5). Several types of bonds and angles (see Table 5.5) can be converted automatically to constraints by grompp. There are several options for this in the *.mdp file.

We have also implemented the SETTLE algorithm [47], which is an analytical solution of SHAKE, specifically for water. SETTLE can be selected in the topology file. See, for instance, the SPC molecule definition:

    [ moleculetype ]
    ; molname  nrexcl
    SOL        1

    [ atoms ]
    ; nr  at type  res nr  ren nm  at nm  cg nr  charge
    1     OW       1       SOL     OW1    1      -0.82
    2     HW       1       SOL     HW2    1       0.41
    3     HW       1       SOL     HW3    1       0.41

    [ settles ]
    ; OW  funct  doh  dhh
    1     1      0.1  0.16333

    [ exclusions ]
    1  2  3
    2  1  3
    3  1  2

The [ settles ] directive defines the first atom of the water molecule. The settle funct is always 1, and the O-H and H-H distances must be given. Note that the algorithm can also be used for TIP3P and TIP4P [133]. TIP3P just has another geometry. TIP4P has a virtual site, but since that is generated it does not need to be shaken (nor stirred).

5.7 pdb2gmx input files

The GROMACS program pdb2gmx generates a topology for the input coordinate file. Several formats are supported for that coordinate file, but *.pdb is the most commonly-used format (hence the name pdb2gmx). pdb2gmx searches for force fields in sub-directories of the GROMACS share/top directory and your working directory. Force fields are recognized from the file forcefield.itp in a directory with the extension .ff. The file forcefield.doc may be present, and if so, its first line will be used by pdb2gmx to present a short description to the user to help in choosing a force field.
Otherwise, the user can choose a force field with the -ff xxx command-line argument to pdb2gmx, which indicates that a force field in a xxx.ff directory is desired. pdb2gmx will search first in the working directory, then in the GROMACS share/top directory, and use the first matching xxx.ff directory found.

Two general files are read by pdb2gmx: an atom type file (extension .atp, see 5.2.1) from the force-field directory, and a file called residuetypes.dat from either the working directory, or the GROMACS share/top directory. residuetypes.dat determines which residue names are considered protein, DNA, RNA, water, and ions.

pdb2gmx can read one or multiple databases with topological information for different types of molecules. A set of files belonging to one database should have the same basename, preferably telling something about the type of molecules (e.g. aminoacids, rna, dna). The possible files are:

• .rtp
• .r2b (optional)
• .arn (optional)
• .hdb (optional)
• .n.tdb (optional)
• .c.tdb (optional)

Only the .rtp file, which contains the topologies of the building blocks, is mandatory. Information from other files will only be used for building blocks that come from an .rtp file with the same base name. The user can add building blocks to a force field by having additional files with the same base name in their working directory. By default, only extra building blocks can be defined, but calling pdb2gmx with the -rtpo option will allow building blocks in a local file to replace the default ones in the force field.

5.7.1 Residue database

The files holding the residue databases have the extension .rtp. Originally this file contained building blocks (amino acids) for proteins, and is the GROMACS interpretation of the rt37c4.dat file of GROMOS. So the residue database file contains information (bonds, charges, charge groups, and improper dihedrals) for a frequently-used building block.
It is better not to change this file because it is standard input for pdb2gmx, but if changes are needed make them in the *.top file (see 5.8.1), or in a .rtp file in the working directory as explained in sec. 5.7. Defining topologies of new small molecules is probably easier by writing an include topology file *.itp directly. This will be discussed in section 5.8.2. When adding a new protein residue to the database, don’t forget to add the residue name to the residuetypes.dat file, so that grompp, make_ndx and analysis tools can recognize the residue as a protein residue (see 8.1.1).

The .rtp files are only used by pdb2gmx. As mentioned before, the only extra information this program needs from the .rtp database is bonds, charges of atoms, charge groups, and improper dihedrals, because the rest is read from the coordinate input file. Some proteins contain residues that are not standard, but are listed in the coordinate file. You have to construct a building block for this “strange” residue, otherwise you will not obtain a *.top file. This also holds for molecules in the coordinate file such as ligands, polyatomic ions, crystallization co-solvents, etc. The residue database is constructed in the following way:

    [ bondedtypes ]  ; mandatory
    ; bonds  angles  dihedrals  impropers
    1        1       1          2          ; mandatory

    [ GLY ]  ; mandatory

    [ atoms ]  ; mandatory
    ; name  type  charge  chargegroup
    N       N     -0.280  0
    H       H      0.280  0
    CA      CH2    0.000  1
    C       C      0.380  2
    O       O     -0.380  2

    [ bonds ]  ; optional
    ;atom1  atom2  b0  kb
    N       H
    N       CA
    CA      C
    C       O
    -C      N

    [ exclusions ]  ; optional
    ;atom1  atom2

    [ angles ]  ; optional
    ;atom1  atom2  atom3  th0  cth

    [ dihedrals ]  ; optional
    ;atom1  atom2  atom3  atom4  phi0  cp  mult

    [ impropers ]  ; optional
    ;atom1  atom2  atom3  atom4  q0  cq
    N       -C     CA     H
    -C      -CA    N      -O

    [ ZN ]

    [ atoms ]
    ZN      ZN     2.000  0

The file is free format; the only restriction is that there can be at most one entry on a line.
The first field in the file is the [ bondedtypes ] field, which is followed by four numbers, indicating the interaction type for bonds, angles, dihedrals, and improper dihedrals. The file contains residue entries, which consist of atoms and (optionally) bonds, angles, dihedrals, and impropers. The charge group codes denote the charge group numbers. Atoms in the same charge group should always be ordered consecutively. When using the hydrogen database with pdb2gmx for adding missing hydrogens (see 5.7.4), the atom names defined in the .rtp entry should correspond exactly to the naming convention used in the hydrogen database. The atom names in the bonded interaction can be preceded by a minus or a plus, indicating that the atom is in the preceding or following residue respectively. Explicit parameters added to bonds, angles, dihedrals, and impropers override the standard parameters in the .itp files. This should only be used in special cases. Instead of parameters, a string can be added for each bonded interaction. This is used in GROMOS-96 .rtp files. These strings are copied to the topology file and can be replaced by force-field parameters by the C-preprocessor in grompp using #define statements. pdb2gmx automatically generates all angles. This means that for most force fields the [ angles ] field is only useful for overriding .itp parameters. For the GROMOS-96 force field the interaction number of all angles needs to be specified. pdb2gmx automatically generates one proper dihedral for every rotatable bond, preferably on heavy atoms. When the [ dihedrals ] field is used, no other dihedrals will be generated for the bonds corresponding to the specified dihedrals. It is possible to put more than one dihedral function on a rotatable bond. In the case of CHARMM27 FF pdb2gmx can add correction maps to the dihedrals using the default -cmap option. Please refer to 4.10.4 for more information. 
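The statement above that pdb2gmx automatically generates all angles can be pictured as enumerating every pair of bonds that share a central atom. The Python sketch below illustrates only that idea, under the assumption that bonds are given as pairs of atom names within one residue; it is not the pdb2gmx implementation, and the function name generate_angles is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def generate_angles(bonds):
    """Return all (left, center, right) triples implied by a list of bonds."""
    neighbors = defaultdict(list)
    for atom_a, atom_b in bonds:
        neighbors[atom_a].append(atom_b)
        neighbors[atom_b].append(atom_a)

    angles = []
    for center in sorted(neighbors):
        # Every unordered pair of neighbors of the same atom defines one angle.
        for left, right in combinations(sorted(neighbors[center]), 2):
            angles.append((left, center, right))
    return angles


# Intra-residue bonds of the GLY building block shown in the .rtp example
# (the inter-residue bond -C N is left out to keep the sketch self-contained).
gly_bonds = [("N", "H"), ("N", "CA"), ("CA", "C"), ("C", "O")]
gly_angles = generate_angles(gly_bonds)
```

For the four GLY bonds this yields three angles, centered on N, CA and C. In a real topology the bond to the preceding residue contributes further angles, which is why the sketch should be read as an illustration rather than a replacement for pdb2gmx.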
pdb2gmx sets the number of exclusions to 3, which means that interactions between atoms connected by at most 3 bonds are excluded. Pair interactions are generated for all pairs of atoms that are separated by 3 bonds (except pairs of hydrogens). When more interactions need to be excluded, or some pair interactions should not be generated, an [ exclusions ] field can be added, followed by pairs of atom names on separate lines. All non-bonded and pair interactions between these atoms will be excluded.

    ARG   protonated arginine
    ARGN  neutral arginine
    ASP   negatively charged aspartic acid
    ASPH  neutral aspartic acid
    CYS   neutral cysteine
    CYS2  cysteine with sulfur bound to another cysteine or a heme
    GLU   negatively charged glutamic acid
    GLUH  neutral glutamic acid
    HISD  neutral histidine with Nδ protonated
    HISE  neutral histidine with Nε protonated
    HISH  positive histidine with both Nδ and Nε protonated
    HIS1  histidine bound to a heme
    LYSN  neutral lysine
    LYS   protonated lysine
    HEME  heme

Table 5.3: Internal GROMACS residue naming convention.

5.7.2 Residue to building block database

Each force field has its own naming convention for residues. Most residues have consistent naming, but some, especially those with different protonation states, can have many different names. The .r2b files are used to convert standard residue names to the force-field building block names. If no .r2b is present in the force-field directory or a residue is not listed, the building block name is assumed to be identical to the residue name. The .r2b can contain 2 or 5 columns. The 2-column format has the residue name in the first column and the building block name in the second. The 5-column format has 3 additional columns with the building block for the residue occurring in the N-terminus, C-terminus and both termini at the same time (single residue molecule). This is useful for, for instance, the AMBER force fields.
If one or more of the terminal versions are not present, a dash should be entered in the corresponding column.

There is a GROMACS naming convention for residues which is only apparent (except for the pdb2gmx code) through the .r2b file and specbond.dat files. This convention is only of importance when you are adding residue types to an .rtp file. The convention is listed in Table 5.3. For special bonds with, for instance, a heme group, the GROMACS naming convention is introduced through specbond.dat (see 5.7.7), which can subsequently be translated by the .r2b file, if required.

5.7.3 Atom renaming database

Force fields often use atom names that do not follow IUPAC or PDB convention. The .arn database is used to translate the atom names in the coordinate file to the force-field names. Atoms that are not listed keep their names. The file has three columns: the building block name, the old atom name, and the new atom name, respectively. The residue name supports question-mark wildcards that match a single character.

An additional general atom renaming file called xlateat.dat is present in the share/top directory, which translates common non-standard atom names in the coordinate file to IUPAC/PDB convention. Thus, when writing force-field files, you can assume standard atom names and no further atom name translation is required, except for translating from standard atom names to the force-field ones.

5.7.4 Hydrogen database

The hydrogen database is stored in .hdb files. It contains information for the pdb2gmx program on how to connect hydrogen atoms to existing atoms. In versions of the database before GROMACS 3.3, hydrogen atoms were named after the atom they are connected to: the first letter of the atom name was replaced by an ‘H.’ In the versions from 3.3 onwards, the H atom has to be listed explicitly, because the old behavior was protein-specific and hence could not be generalized to other molecules.
If more than one hydrogen atom is connected to the same atom, a number will be added to the end of the hydrogen atom name. For example, adding two hydrogen atoms to ND2 (in asparagine), the hydrogen atoms will be named HD21 and HD22. This is important since atom naming in the .rtp file (see 5.7.1) must be the same. The format of the hydrogen database is as follows:

    ; res  # additions
           # H add  type  H    i    j   k
    ALA    1
           1        1     H    N    -C  CA
    ARG    4
           1        2     H    N    CA  C
           1        1     HE   NE   CD  CZ
           2        3     HH1  NH1  CZ  NE
           2        3     HH2  NH2  CZ  NE

On the first line we see the residue name (ALA or ARG) and the number of kinds of hydrogen atoms that may be added to this residue by the hydrogen database. After that follows one line for each addition, on which we see:

• The number of H atoms added

• The method for adding H atoms, which can be any of:

    1  one planar hydrogen, e.g. rings or peptide bond
       One hydrogen atom (n) is generated, lying in the plane of atoms (i,j,k) on the plane bisecting angle (j-i-k) at a distance of 0.1 nm from atom i, such that the angles (n-i-j) and (n-i-k) are > 90°.

    2  one single hydrogen, e.g. hydroxyl
       One hydrogen atom (n) is generated at a distance of 0.1 nm from atom i, such that angle (n-i-j)=109.5 degrees and dihedral (n-i-j-k)=trans.

    3  two planar hydrogens, e.g. ethylene -C=CH2, or amide -C(=O)NH2
       Two hydrogens (n1,n2) are generated at a distance of 0.1 nm from atom i, such that angle (n1-i-j)=(n2-i-j)=120 degrees and dihedral (n1-i-j-k)=cis and (n2-i-j-k)=trans, such that names are according to IUPAC standards [134].

    4  two or three tetrahedral hydrogens, e.g. -CH3
       Three (n1,n2,n3) or two (n1,n2) hydrogens are generated at a distance of 0.1 nm from atom i, such that angle (n1-i-j)=(n2-i-j)=(n3-i-j)=109.47°, dihedral (n1-i-j-k)=trans, (n2-i-j-k)=trans+120 and (n3-i-j-k)=trans+240°.

    5  one tetrahedral hydrogen, e.g. C3CH
       One hydrogen atom (n′) is generated at a distance of 0.1 nm from atom i in tetrahedral conformation such that angle (n′-i-j)=(n′-i-k)=(n′-i-l)=109.47°.

    6  two tetrahedral hydrogens, e.g. C-CH2-C
       Two hydrogen atoms (n1,n2) are generated at a distance of 0.1 nm from atom i in tetrahedral conformation on the plane bisecting angle j-i-k with angle (n1-i-n2)=(n1-i-j)=(n1-i-k)=109.47°.

    7  two water hydrogens
       Two hydrogens are generated around atom i according to SPC [85] water geometry. The symmetry axis will alternate between three coordinate axes in both directions.

    10 three water “hydrogens”
       Two hydrogens are generated around atom i according to SPC [85] water geometry. The symmetry axis will alternate between three coordinate axes in both directions. In addition, an extra particle is generated on the position of the oxygen with the first letter of the name replaced by ‘M’. This is for use with four-atom water models such as TIP4P [133].

    11 four water “hydrogens”
       Same as above, except that two additional particles are generated on the position of the oxygen, with names ‘LP1’ and ‘LP2.’ This is for use with five-atom water models such as TIP5P [135].

• The name of the new H atom (or its prefix, e.g. HD2 for the asparagine example given earlier).

• Three or four control atoms (i,j,k,l), where the first always is the atom to which the H atoms are connected. The other two or three depend on the code selected. For water, there is only one control atom.

Some more exotic cases can be approximately constructed from the above tools, and with suitable use of energy minimization are good enough for beginning MD simulations. For example secondary amine hydrogen, nitrenyl hydrogen (C=NH) and even ethynyl hydrogen could be approximately constructed using method 2 above for hydroxyl hydrogen.

5.7.5 Termini database

The termini databases are stored in aminoacids.n.tdb and aminoacids.c.tdb for the N- and C-termini respectively.
They contain information for the pdb2gmx program on how to connect new atoms to existing ones, which atoms should be removed or changed, and which bonded interactions should be added. Their format is as follows (from gromos43a1.ff/aminoacids.c.tdb):

    [ None ]

    [ COO- ]
    [ replace ]
    C    C   C   12.011   0.27
    O    O1  OM  15.9994  -0.635
    OXT  O2  OM  15.9994  -0.635
    [ add ]
    2    8   O   C  CA  N
         OM  15.9994  -0.635
    [ bonds ]
    C    O1  gb_5
    C    O2  gb_5
    [ angles ]
    O1   C   O2  ga_37
    CA   C   O1  ga_21
    CA   C   O2  ga_21
    [ dihedrals ]
    N    CA  C   O2  gd_20
    [ impropers ]
    C    CA  O2  O1  gi_1

The file is organized in blocks, each with a header specifying the name of the block. These blocks correspond to different types of termini that can be added to a molecule. In this example [ None ] is the first block, corresponding to a terminus that leaves the molecule as it is. [ COO- ] is the second terminus type, corresponding to changing the terminal carbon atom into a deprotonated carboxyl group. Block names cannot be any of the following: replace, add, delete, bonds, angles, dihedrals, impropers. Using one of these names would interfere with the parameters of the block, and would probably also be very confusing to human readers.

For each block the following options are present:

• [ replace ]
Replace an existing atom by one with a different atom type, atom name, charge, and/or mass. This entry can be used to replace an atom that is present both in the input coordinates and in the .rtp database, but also to only rename an atom in the input coordinates such that it matches the name in the force field. In the latter case, there should also be a corresponding [ add ] section present that gives instructions to add the same atom, such that the position in the sequence and the bonding is known. Such an atom can be present in the input coordinates and kept, or not present and constructed by pdb2gmx.
For each atom to be replaced, one line should be entered with the following fields:
    – name of the atom to be replaced
    – new atom name (optional)
    – new atom type
    – new mass
    – new charge

• [ add ]
Add new atoms. For each (group of) added atom(s), a two-line entry is necessary. The first line contains the same fields as an entry in the hydrogen database (name of the new atom, number of atoms, type of addition, control atoms, see 5.7.4), but the possible types of addition are extended by two more, specifically for C-terminal additions:

    8  two carboxyl oxygens, -COO−
       Two oxygens (n1,n2) are generated according to rule 3, at a distance of 0.136 nm from atom i and an angle (n1-i-j)=(n2-i-j)=117 degrees

    9  carboxyl oxygens and hydrogen, -COOH
       Two oxygens (n1,n2) are generated according to rule 3, at distances of 0.123 nm and 0.125 nm from atom i for n1 and n2, respectively, and angles (n1-i-j)=121 and (n2-i-j)=115 degrees. One hydrogen (n′) is generated around n2 according to rule 2, where n-i-j and n-i-j-k should be read as n′-n2-i and n′-n2-i-j, respectively.

After this line, another line follows that specifies the details of the added atom(s), in the same way as for replacing atoms, i.e.:
    – atom type
    – mass
    – charge
    – charge group (optional)

Like in the hydrogen database (see 5.7.1), when more than one atom is connected to an existing one, a number will be appended to the end of the atom name. Note that, like in the hydrogen database, the atom name is now on the same line as the control atoms, whereas it was at the beginning of the second line prior to GROMACS version 3.3. When the charge group field is left out, the added atom will have the same charge group number as the atom that it is bonded to.

• [ delete ]
Delete existing atoms. One atom name per line.

• [ bonds ], [ angles ], [ dihedrals ] and [ impropers ]
Add additional bonded parameters. The format is identical to that used in the *.rtp file, see 5.7.1.
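The naming rule shared by the hydrogen and termini databases (a number is appended when more than one atom is attached to the same existing atom) can be sketched as follows. The helper name added_atom_names is hypothetical and not part of GROMACS, and the sketch assumes that numbering starts at 1, as in the HD21/HD22 and O1/O2 examples in this chapter.

```python
def added_atom_names(prefix, count):
    """Return the names of `count` atoms added with the given name prefix.

    A single added atom keeps the prefix as its name; several atoms get
    a running number appended (assumed to start at 1).
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        return [prefix]
    return [f"{prefix}{index}" for index in range(1, count + 1)]


# Two hydrogens on ND2 of asparagine, listed with prefix HD2 in the .hdb file.
asn_hydrogens = added_atom_names("HD2", 2)

# The [ add ] entry "2 8 O C CA N" of the [ COO- ] block adds two oxygens.
carboxyl_oxygens = added_atom_names("O", 2)
```

The second example is consistent with the [ bonds ] and [ angles ] entries of the [ COO- ] block, which refer to the added atoms as O1 and O2.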
5.7.6 Virtual site database

Since we cannot rely on the positions of hydrogens in input files, we need a special input file to decide the geometries and parameters with which to add virtual site hydrogens. For more complex virtual site constructs (e.g. when entire aromatic side chains are made rigid) we also need information about the equilibrium bond lengths and angles for all atoms in the side chain. This information is specified in the .vsd file for each force field. Just as for the termini, there is one such file for each class of residues in the .rtp file.

The virtual site database is not really a very simple list of information. The first couple of sections specify which mass centers (typically called MCH3/MNH3) to use for CH3, NH3, and NH2 groups. Depending on the equilibrium bond lengths and angles between the hydrogens and heavy atoms we need to apply slightly different constraint distances between these mass centers. Note that we do not have to specify the actual parameters (that is automatic), just the type of mass center to use. To accomplish this, there are three sections named [ CH3 ], [ NH3 ], and [ NH2 ]. For each of these we expect three columns. The first column is the atom type bound to the 2/3 hydrogens, the second column is the next heavy atom type to which this is bound, and the third column the type of mass center to use. As a special case, in the [ NH2 ] section it is also possible to specify planar in the second column, which will use a different construction without mass center. There are currently different opinions in some force fields whether an NH2 group should be planar or not, but we try hard to stick to the default equilibrium parameters of the force field.

The second part of the virtual site database contains explicit equilibrium bond lengths and angles for pairs/triplets of atoms in aromatic side chains.
These entries are currently read by specific routines in the virtual site generation code, so if you would like to extend it, e.g. to nucleic acids, you would also need to write new code there. These sections are named after the short amino acid names ([ PHE ], [ TYR ], [ TRP ], [ HID ], [ HIE ], [ HIP ]), and simply contain 2 or 3 columns with atom names, followed by a number specifying the bond length (in nm) or angle (in degrees). Note that these are approximations of the equilibrated geometry for the entire molecule, which might not be identical to the equilibrium value for a single bond/angle if the molecule is strained.

5.7.7 Special bonds

The primary mechanism used by pdb2gmx to generate inter-residue bonds relies on head-to-tail linking of backbone atoms in different residues to build a macromolecule. In some cases (e.g. disulfide bonds, a heme group, branched polymers), it is necessary to create inter-residue bonds that do not lie on the backbone. The file specbond.dat takes care of this function. It is necessary that the residues belong to the same [ moleculetype ]. The -merge and -chainsep functions of pdb2gmx can be useful when managing special inter-residue bonds between different chains.

The first line of specbond.dat indicates the number of entries that are in the file. If you add a new entry, be sure to increment this number. The remaining lines in the file provide the specifications for creating bonds. The format of the lines is as follows:

resA atomA nbondsA resB atomB nbondsB length newresA newresB

The columns indicate:
1. resA: The name of residue A that participates in the bond.
2. atomA: The name of the atom in residue A that forms the bond.
3. nbondsA: The total number of bonds atomA can form.
4. resB: The name of residue B that participates in the bond.
5. atomB: The name of the atom in residue B that forms the bond.
6. nbondsB: The total number of bonds atomB can form.
7. length: The reference length for the bond.
If atomA and atomB are not within length ± 10% in the coordinate file supplied to pdb2gmx, no bond will be formed.
8. newresA: The new name of residue A, if necessary. Some force fields use e.g. CYS2 for a cysteine in a disulfide or heme linkage.
9. newresB: The new name of residue B, likewise.

5.8 File formats

5.8.1 Topology file

The topology file is built following the GROMACS specification for a molecular topology. A *.top file can be generated by pdb2gmx. All possible entries in the topology file are listed in Tables 5.4 and 5.5. Also tabulated are: all the units of the parameters, which interactions can be perturbed for free energy calculations, which bonded interactions are used by grompp for generating exclusions, and which bonded interactions can be converted to constraints by grompp.

Table 5.4: The topology (*.top) file.

Parameters
interaction type | directive | # at. | f. tp | parameters | F. E.
mandatory | defaults | | | non-bonded function type; combination rule(cr); generate pairs (no/yes); fudge LJ (); fudge QQ () |
mandatory | atomtypes | | | atom type; m (u); q (e); particle type; V(cr); W(cr) |
 | bondtypes | (see Table 5.5, directive bonds)
 | pairtypes | (see Table 5.5, directive pairs)
 | angletypes | (see Table 5.5, directive angles)
 | dihedraltypes(∗) | (see Table 5.5, directive dihedrals)
 | constrainttypes | (see Table 5.5, directive constraints)
LJ | nonbond_params | 2 | 1 | V(cr); W(cr) |
Buckingham | nonbond_params | 2 | 2 | a (kJ mol^-1); b (nm^-1); c6 (kJ mol^-1 nm^6) |

Molecule definition(s)
mandatory | moleculetype | | | molecule name; n_ex(nrexcl) |
mandatory | atoms | 1 | | atom type; residue number; residue name; atom name; charge group number; q (e); m (u) | type, q, m
intra-molecular interaction and geometry definitions as described in Table 5.5

System
mandatory | system | | | system name |
mandatory | molecules | | | molecule name; number of molecules |

Inter-molecular interactions
optional | intermolecular_interactions | one or more bonded interactions as described in Table 5.5, with two or more atoms, no interactions that generate exclusions, no constraints, use global atom numbers

‘# at’ is the required number of atom type indices for this directive
‘f. tp’ is the value used to select this function type
‘F. E.’ indicates which of the parameters can be interpolated in free energy calculations
(cr) the combination rule determines the type of LJ parameters, see 5.3.2
(∗) for dihedraltypes one can specify 4 atoms or the inner (outer for improper) 2 atoms
(nrexcl) exclude neighbors n_ex bonds away for non-bonded interactions
For free energy calculations, type, q and m or no parameters should be added for topology ‘B’ (λ = 1) on the same line, after the normal parameters.

Table 5.5: Details of [ moleculetype ] directives

Name of interaction | Topology file directive | num. atoms∗ | func. type† | Order of parameters and their units | use in F.E.?‡ | Cross-references
bond | bonds§,¶ | 2 | 1 | b_0 (nm); k_b (kJ mol^-1 nm^-2) | all | 4.2.1
G96 bond | bonds§,¶ | 2 | 2 | b_0 (nm); k_b (kJ mol^-1 nm^-4) | all | 4.2.1
Morse | bonds§,¶ | 2 | 3 | b_0 (nm); D (kJ mol^-1); β (nm^-1) | all | 4.2.2
cubic bond | bonds§,¶ | 2 | 4 | b_0 (nm); C_{i=2,3} (kJ mol^-1 nm^-i) | | 4.2.3
connection | bonds§ | 2 | 5 | | | 5.4.4
harmonic potential | bonds | 2 | 6 | b_0 (nm); k_b (kJ mol^-1 nm^-2) | all | 4.2.1, 5.4.4
FENE bond | bonds§ | 2 | 7 | b_m (nm); k_b (kJ mol^-1 nm^-2) | | 4.2.4
tabulated bond | bonds§ | 2 | 8 | table number (≥ 0); k (kJ mol^-1) | k | 4.2.14
tabulated bond‖ | bonds | 2 | 9 | table number (≥ 0); k (kJ mol^-1) | k | 4.2.14, 5.4.4
restraint potential | bonds | 2 | 10 | low, up_1, up_2 (nm); k_dr (kJ mol^-1 nm^-2) | all | 4.3.5
extra LJ or Coulomb | pairs | 2 | 1 | V∗∗; W∗∗ | all | 5.4.3
extra LJ or Coulomb | pairs | 2 | 2 | fudge QQ (); q_i, q_j (e); V∗∗; W∗∗ | | 5.4.3
extra LJ or Coulomb | pairs_nb | 2 | 1 | q_i, q_j (e); V∗∗; W∗∗ | | 5.4.3
angle | angles¶ | 3 | 1 | θ_0 (deg); k_θ (kJ mol^-1 rad^-2) | all | 4.2.5
G96 angle | angles¶ | 3 | 2 | θ_0 (deg); k_θ (kJ mol^-1) | all | 4.2.6
cross bond-bond | angles | 3 | 3 | r_1e, r_2e (nm); k_rr' (kJ mol^-1 nm^-2) | | 4.2.9
cross bond-angle | angles | 3 | 4 | r_1e, r_2e, r_3e (nm); k_rθ (kJ mol^-1 nm^-2) | | 4.2.10
Urey-Bradley | angles¶ | 3 | 5 | θ_0 (deg); k_θ (kJ mol^-1 rad^-2); r_13 (nm); k_UB (kJ mol^-1 nm^-2) | all | 4.2.8
quartic angle | angles¶ | 3 | 6 | θ_0 (deg); C_{i=0,1,2,3,4} (kJ mol^-1 rad^-i) | | 4.2.11
tabulated angle | angles | 3 | 8 | table number (≥ 0); k (kJ mol^-1) | k | 4.2.14
restricted bending potential | angles | 3 | 10 | θ_0 (deg); k_θ (kJ mol^-1) | | 4.2.7
proper dihedral | dihedrals | 4 | 1 | φ_s (deg); k_φ (kJ mol^-1); multiplicity | φ, k | 4.2.13
improper dihedral | dihedrals | 4 | 2 | ξ_0 (deg); k_ξ (kJ mol^-1 rad^-2) | all | 4.2.12
Ryckaert-Bellemans dihedral | dihedrals | 4 | 3 | C_0, C_1, C_2, C_3, C_4, C_5 (kJ mol^-1) | all | 4.2.13
periodic improper dihedral | dihedrals | 4 | 4 | φ_s (deg); k_φ (kJ mol^-1); multiplicity | φ, k | 4.2.12
Fourier dihedral | dihedrals | 4 | 5 | C_1, C_2, C_3, C_4 (kJ mol^-1) | all | 4.2.13
tabulated dihedral | dihedrals | 4 | 8 | table number (≥ 0); k (kJ mol^-1) | k | 4.2.14
proper dihedral (multiple) | dihedrals | 4 | 9 | φ_s (deg); k_φ (kJ mol^-1); multiplicity | φ, k | 4.2.13
restricted dihedral | dihedrals | 4 | 10 | φ_0 (deg); k_φ (kJ mol^-1) | | 4.2.13
combined bending-torsion potential | dihedrals | 4 | 11 | a_0, a_1, a_2, a_3, a_4 (kJ mol^-1) | | 4.2.13
exclusions | exclusions | 1 | | one or more atom indices | | 5.4.4
constraint | constraints§ | 2 | 1 | b_0 (nm) | all | 4.5, 5.6
constraint‖ | constraints | 2 | 2 | b_0 (nm) | all | 4.5, 5.6, 5.4.4
SETTLE | settles | 1 | 1 | d_OH, d_HH (nm) | | 3.6.1, 5.6
2-body virtual site | virtual_sites2 | 3 | 1 | a () | | 4.7
3-body virtual site | virtual_sites3 | 4 | 1 | a, b () | | 4.7
3-body virtual site (fd) | virtual_sites3 | 4 | 2 | a (); d (nm) | | 4.7
3-body virtual site (fad) | virtual_sites3 | 4 | 3 | θ (deg); d (nm) | | 4.7
3-body virtual site (out) | virtual_sites3 | 4 | 4 | a, b (); c (nm^-1) | | 4.7
4-body virtual site (fdn) | virtual_sites4 | 5 | 2 | a, b (); c (nm) | | 4.7
N-body virtual site (COG) | virtual_sitesn | 1 | 1 | one or more constructing atom indices | | 4.7
N-body virtual site (COM) | virtual_sitesn | 1 | 2 | one or more constructing atom indices | | 4.7
N-body virtual site (COW) | virtual_sitesn | 1 | 3 | one or more pairs consisting of constructing atom index and weight | | 4.7
position restraint | position_restraints | 1 | 1 | k_x, k_y, k_z (kJ mol^-1 nm^-2) | all | 4.3.1
flat-bottomed position restraint | position_restraints | 1 | 2 | g, r (nm), k (kJ mol^-1 nm^-2) | | 4.3.2
distance restraint | distance_restraints | 2 | 1 | type; label; low, up_1, up_2 (nm); weight () | | 4.3.5
dihedral restraint | dihedral_restraints | 4 | 1 | φ_0 (deg); ∆φ (deg); k_dihr (kJ mol^-1 rad^-2) | all | 4.3.4
orientation restraint | orientation_restraints | 2 | 1 | exp.; label; α; c (U nm^α); obs. (U); weight (U^-1) | | 4.3.6
angle restraint | angle_restraints | 4 | 1 | θ_0 (deg); k_c (kJ mol^-1); multiplicity | θ, k | 4.3.3
angle restraint (z) | angle_restraints_z | 2 | 1 | θ_0 (deg); k_c (kJ mol^-1); multiplicity | θ, k | 4.3.3

∗ The required number of atom indices for this directive
† The index to use to select this function type
‡ Indicates which of the parameters can be interpolated in free energy calculations
§ This interaction type will be used by grompp for generating exclusions
¶ This interaction type can be converted to constraints by grompp
‖ No connection, and so no exclusions, are generated for this interaction
∗∗ The combination rule determines the type of LJ parameters, see 5.3.2

Description of the file layout:
• Semicolon (;) and newline characters surround comments
• On a line ending with \ the newline character is ignored.
• Directives are surrounded by [ and ]
• The topology hierarchy (which must be followed) consists of three levels:
  – the parameter level, which defines certain force-field specifications (see Table 5.4)
  – the molecule level, which should contain one or more molecule definitions (see Table 5.5)
  – the system level, containing only system-specific information ([ system ] and [ molecules ])
• Items should be separated by spaces or tabs, not commas
• Atoms in molecules should be numbered consecutively starting at 1
• Atoms in the same charge group must be listed consecutively
• The file is parsed only once, which implies that no forward references can be treated: items must be defined before they can be used
• Exclusions can be generated from the bonds or overridden manually
• The bonded force types can be generated from the atom types or overridden per bond
• It is possible to apply multiple bonded interactions of the same type on the same atoms
• Descriptive comment lines and empty lines are highly recommended
• Starting with GROMACS version 3.1.3, all directives at the
parameter level can be used multiple times and there are no restrictions on the order, except that an atom type needs to be defined before it can be used in other parameter definitions
• If parameters for a certain interaction are defined multiple times for the same combination of atom types the last definition is used; starting with GROMACS version 3.1.3 grompp generates a warning for parameter redefinitions with different values
• Using one of the [ atoms ], [ bonds ], [ pairs ], [ angles ], etc. without having used [ moleculetype ] before is meaningless and generates a warning
• Using [ molecules ] without having used [ system ] before is meaningless and generates a warning
• After [ system ] the only allowed directive is [ molecules ]
• Using an unknown string in [ ] causes all the data until the next directive to be ignored and generates a warning

Here is an example of a topology file, urea.top:

;
;       Example topology file
;
; The force-field files to be included
#include "amber99.ff/forcefield.itp"

[ moleculetype ]
; name  nrexcl
Urea         3

[ atoms ]
   1  C  1  URE      C      1     0.880229  12.01000   ; amber C  type
   2  O  1  URE      O      2    -0.613359  16.00000   ; amber O  type
   3  N  1  URE     N1      3    -0.923545  14.01000   ; amber N  type
   4  H  1  URE    H11      4     0.395055   1.00800   ; amber H  type
   5  H  1  URE    H12      5     0.395055   1.00800   ; amber H  type
   6  N  1  URE     N2      6    -0.923545  14.01000   ; amber N  type
   7  H  1  URE    H21      7     0.395055   1.00800   ; amber H  type
   8  H  1  URE    H22      8     0.395055   1.00800   ; amber H  type

[ bonds ]
    1     2
    1     3
    1     6
    3     4
    3     5
    6     7
    6     8

[ dihedrals ]
;   ai    aj    ak    al funct  definition
     2     1     3     4     9
     2     1     3     5     9
     2     1     6     7     9
     2     1     6     8     9
     3     1     6     7     9
     3     1     6     8     9
     6     1     3     4     9
     6     1     3     5     9

[ dihedrals ]
     3     6     1     2     4
     1     4     3     5     4
     1     7     6     8     4

[ position_restraints ]
; you wouldn't normally use this for a molecule like Urea,
; but we include it here for didactic purposes
; ai   funct    fc
   1     1     1000    1000    1000 ; Restrain to a point
   2     1     1000       0    1000 ; Restrain to a line (Y-axis)
   3     1     1000       0       0 ; Restrain to a plane (Y-Z-plane)

[ dihedral_restraints ]
; ai   aj    ak    al  type  phi  dphi  fc
    3    6     1    2     1  180     0  10
    1    4     3    5     1  180     0  10

; Include TIP3P water topology
#include "amber99/tip3p.itp"

[ system ]
Urea in Water

[ molecules ]
;molecule name   nr.
Urea             1
SOL              1000

Here follows the explanatory text.

#include "amber99.ff/forcefield.itp" : this includes the information for the force field you are using, including bonded and non-bonded parameters. This example uses the AMBER99 force field, but your simulation may use a different force field. grompp will automatically find this file and insert its content at this point. That content can be seen in share/top/amber99.ff/forcefield.itp, and it is

#define _FF_AMBER
#define _FF_AMBER99

[ defaults ]
; nbfunc        comb-rule       gen-pairs       fudgeLJ fudgeQQ
1               2               yes             0.5     0.8333

#include "ffnonbonded.itp"
#include "ffbonded.itp"
#include "gbsa.itp"

The two #define statements set up the conditions so that future parts of the topology can know that the AMBER 99 force field is in use.

[ defaults ] :
• nbfunc is the non-bonded function type. Use 1 (Lennard-Jones) or 2 (Buckingham)
• comb-rule is the number of the combination rule (see 5.3.2).
• gen-pairs is for pair generation. The default is ‘no’, i.e. get 1-4 parameters from the pairtypes list. When parameters are not present in the list, stop with a fatal error. Setting ‘yes’ generates 1-4 parameters that are not present in the pair list from normal Lennard-Jones parameters using fudgeLJ
• fudgeLJ is the factor by which to multiply Lennard-Jones 1-4 interactions, default 1
• fudgeQQ is the factor by which to multiply electrostatic 1-4 interactions, default 1
• N is the power for the repulsion term in a 6-N potential (with nonbonded-type Lennard-Jones only). Starting with GROMACS version 4.5, mdrun also reads and applies N; for values not equal to 12, tabulated interaction functions are used (in older versions you would have to use user tabulated interactions).

Note that gen-pairs, fudgeLJ, fudgeQQ, and N are optional. fudgeLJ is only used when generate pairs is set to ‘yes’, and fudgeQQ is always used. However, if you want to specify N you need to give a value for the other parameters as well.

Then some other #include statements add in the large amount of data needed to describe the rest of the force field. We will skip these and return to urea.top. There we will see

[ moleculetype ] : defines the name of your molecule in this *.top and nrexcl = 3 stands for excluding non-bonded interactions between atoms that are no further than 3 bonds away.

[ atoms ] : defines the molecule, where nr and type are fixed, the rest is user defined. So atoms can be named as you like, cgnr made larger or smaller (if possible, the total charge of a charge group should be zero), and charges can be changed here too.

[ bonds ] : no comment.

[ pairs ] : LJ and Coulomb 1-4 interactions

[ angles ] : no comment

[ dihedrals ] : in this case there are 8 proper dihedrals (funct = 9), 3 improper (funct = 4) and no Ryckaert-Bellemans type dihedrals. If you want to include Ryckaert-Bellemans type dihedrals in a topology, do the following (in case of e.g. decane):

[ dihedrals ]
;  ai    aj    ak    al funct       c0       c1       c2
    1     2     3     4     3
    2     3     4     5     3

In the original implementation of the potential for alkanes [136] no 1-4 interactions were used, which means that in order to implement that particular force field you need to remove the 1-4 interactions from the [ pairs ] section of your topology. In most modern force fields, like OPLS/AA or Amber, the rules are different, and the Ryckaert-Bellemans potential is used as a cosine series in combination with 1-4 interactions.

[ position_restraints ] : harmonically restrain the selected particles to reference positions (4.3.1). The reference positions are read from a separate coordinate file by grompp.

[ dihedral_restraints ] : restrain selected dihedrals to a reference value. The implementation of dihedral restraints is described in section 4.3.4 of the manual. The parameters specified in the [ dihedral_restraints ] directive are as follows:
• type has only one possible value, which is 1
• phi is the value of φ0 in eqn. 4.83 and eqn. 4.84 of the manual.
• dphi is the value of ∆φ in eqn. 4.84 of the manual.
• fc is the force constant kdihr in eqn. 4.84 of the manual.

#include "amber99/tip3p.itp" : includes a topology file that was already constructed (see section 5.8.2).

[ system ] : title of your system, user-defined

[ molecules ] : this defines the total number of (sub)molecules in your system that are defined in this *.top. In this example file, it stands for 1 urea molecule dissolved in 1000 water molecules. The molecule type SOL is defined in the tip3p.itp file. Each name here must correspond to a name given with [ moleculetype ] earlier in the topology. The order of the blocks of molecule types and the numbers of such molecules must match the coordinate file that accompanies the topology when supplied to grompp. The blocks of molecules do not need to be contiguous, but some tools (e.g. genion) may act only on the first or last such block of a particular molecule type.
Also, these blocks have nothing to do with the definition of groups (see sec. 3.3 and sec. 8.1).

5.8.2 Molecule.itp file

If you construct a topology file you will use frequently (like the water molecule, tip3p.itp, which is already constructed for you) it is good to make a molecule.itp file. This only lists the information of one particular molecule and allows you to re-use the [ moleculetype ] in multiple systems without re-invoking pdb2gmx or manually copying and pasting. An example urea.itp follows:

[ moleculetype ]
; molname       nrexcl
Urea            3

[ atoms ]
   1  C  1  URE      C      1     0.880229  12.01000   ; amber C  type
...
   8  H  1  URE    H22      8     0.395055   1.00800   ; amber H  type

[ bonds ]
    1     2
...
    6     8

[ dihedrals ]
;   ai    aj    ak    al funct  definition
     2     1     3     4     9
...
     6     1     3     5     9

[ dihedrals ]
     3     6     1     2     4
     1     4     3     5     4
     1     7     6     8     4

Using *.itp files results in a very short *.top file:

;
;       Example topology file
;
; The force field files to be included
#include "amber99.ff/forcefield.itp"

#include "urea.itp"

; Include TIP3P water topology
#include "amber99/tip3p.itp"

[ system ]
Urea in Water

[ molecules ]
;molecule name   nr.
Urea             1
SOL              1000

5.8.3 Ifdef statements

A very powerful feature in GROMACS is the use of #ifdef statements in your *.top file. By making use of this statement, and associated #define statements like those seen in amber99.ff/forcefield.itp earlier, different parameters for one molecule can be used in the same *.top file. An example is given for TFE, where there is an option to use different charges on the atoms: charges derived by De Loof et al. [137] or by Van Buuren and Berendsen [138]. In fact, you can use much of the functionality of the C preprocessor, cpp, because grompp contains similar pre-processing functions to scan the file.
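How #define, #ifdef, #else and #endif decide which lines of a topology are kept can be sketched in a few lines of Python. This is only an illustration of the idea and not the pre-processor used by grompp: the function name preprocess is made up, and #include, macro substitution and error checking are left out. The two atom lines are taken from the TFE example in this section.

```python
def preprocess(lines, defines=()):
    """Return the lines that survive #define/#ifdef/#ifndef/#else/#endif.

    Only these five directives are handled; everything else that a real
    pre-processor does is deliberately left out of this sketch.
    """
    defined = set(defines)
    branches = []          # one flag per open conditional: is its branch taken?
    kept = []
    for line in lines:
        words = line.split()
        keyword = words[0] if words else ""
        active = all(branches)      # True when no enclosing branch is switched off
        if keyword == "#ifdef":
            branches.append(words[1] in defined)
        elif keyword == "#ifndef":
            branches.append(words[1] not in defined)
        elif keyword == "#else":
            branches[-1] = not branches[-1]
        elif keyword == "#endif":
            branches.pop()
        elif keyword == "#define":
            if active:
                defined.add(words[1])
        elif active:
            kept.append(line)
    return kept

topology = [
    "[ atoms ]",
    "#ifdef DeLoof",
    "1 C 1 TFE C 1 0.74",
    "#else",
    "1 C 1 TFE C 1 0.59",
    "#endif",
]
print(preprocess(topology, defines=["DeLoof"]))  # keeps the 0.74 line
print(preprocess(topology))                      # keeps the 0.59 line
```

Passing defines=["DeLoof"] plays the role of a definition supplied from outside the file, while a #define line inside the list has the same effect from that point onwards.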
The way to make use of the #ifdef option is as follows:
• either use the option define = -DDeLoof in the *.mdp file (containing grompp input parameters), or use the line #define DeLoof early in your *.top or *.itp file; and
• put the #ifdef statements in your *.top, as shown below:

...

[ atoms ]
; nr     type     resnr    residu     atom      cgnr      charge        mass
#ifdef DeLoof
; Use Charges from DeLoof
   1        C        1        TFE        C         1        0.74
   2        F        1        TFE        F         1       -0.25
   3        F        1        TFE        F         1       -0.25
   4        F        1        TFE        F         1       -0.25
   5      CH2        1        TFE      CH2         1        0.25
   6       OA        1        TFE       OA         1       -0.65
   7       HO        1        TFE       HO         1        0.41
#else
; Use Charges from VanBuuren
   1        C        1        TFE        C         1        0.59
   2        F        1        TFE        F         1       -0.2
   3        F        1        TFE        F         1       -0.2
   4        F        1        TFE        F         1       -0.2
   5      CH2        1        TFE      CH2         1        0.26
   6       OA        1        TFE       OA         1       -0.55
   7       HO        1        TFE       HO         1        0.3
#endif

[ bonds ]
;  ai    aj funct           c0           c1
    6     7     1 1.000000e-01 3.138000e+05
    1     2     1 1.360000e-01 4.184000e+05
    1     3     1 1.360000e-01 4.184000e+05
    1     4     1 1.360000e-01 4.184000e+05
    1     5     1 1.530000e-01 3.347000e+05
    5     6     1 1.430000e-01 3.347000e+05
...

This mechanism is used by pdb2gmx to implement optional position restraints (4.3.1) by #include-ing an .itp file whose contents will be meaningful only if a particular #define is set (and spelled correctly!).

5.8.4 Topologies for free energy calculations

Free energy differences between two systems, A and B, can be calculated as described in sec. 3.12. Systems A and B are described by topologies consisting of the same number of molecules with the same number of atoms. Masses and non-bonded interactions can be perturbed by adding B parameters under the [ atoms ] directive. Bonded interactions can be perturbed by adding B parameters to the bonded types or the bonded interactions. The parameters that can be perturbed are listed in Tables 5.4 and 5.5. The λ-dependence of the interactions is described in sec. 4.5. Which bonded parameters are used (those on the line of the bonded interaction definition, or the ones looked up on atom types in the bonded type lists) is explained in Table 5.6.
In most cases, things should work intuitively. When the A and B atom types in a bonded interaction are not all identical and parameters are not present for the B-state, either on the line or in the bonded types, grompp uses the A-state parameters and issues a warning. For free energy calculations, all or no parameters for topology B (λ = 1) should be added on the same line, after the normal parameters, in the same order as the normal parameters. From GROMACS 4.6 onward, if λ is treated as a vector, then the bonded-lambdas component controls all bonded terms that are not explicitly labeled as restraints. Restraint terms are controlled by the restraint-lambdas component.

B-state atom types all identical to A-state atom types | parameters on line: A | B | bonded types, A atom types: A | B | bonded types, B atom types: A | B | message
yes | +AB | − | x | x | | |
yes | +A | +B | x | x | | |
yes | − | − | − | − | | | error
yes | − | − | +AB | − | | |
yes | − | − | +A | +B | | |
no | +AB | − | x | x | x | x | warning
no | +A | +B | x | x | x | x |
no | − | − | − | − | x | x | error
no | − | − | +AB | − | − | − | warning
no | − | − | +A | +B | − | − | warning
no | − | − | +A | x | +B | − |
no | − | − | +A | x | + | +B |

Table 5.6: The bonded parameters that are used for free energy topologies, on the line of the bonded interaction definition or looked up in the bond types section based on atom types. A and B indicate the parameters used for state A and B respectively, + and − indicate the (non-)presence of parameters in the topology, x indicates that the presence has no influence.

Below is an example of a topology which changes from 200 propanols to 200 pentanes using the GROMOS-96 force field.
; Include force field parameters
#include "gromos43a1.ff/forcefield.itp"

[ moleculetype ]
; Name            nrexcl
PropPent          3

[ atoms ]
; nr type resnr residue atom cgnr  charge    mass  typeB chargeB  massB
  1     H    1    PROP    PH    1   0.398   1.008    CH3    0.0  15.035
  2    OA    1    PROP    PO    1  -0.548 15.9994    CH2    0.0  14.027
  3   CH2    1    PROP   PC1    1   0.150  14.027    CH2    0.0  14.027
  4   CH2    1    PROP   PC2    2   0.000  14.027
  5   CH3    1    PROP   PC3    2   0.000  15.035

[ bonds ]
;  ai    aj funct    par_A  par_B
    1     2     2    gb_1   gb_26
    2     3     2    gb_17  gb_26
    3     4     2    gb_26  gb_26
    4     5     2    gb_26

[ pairs ]
;  ai    aj funct
    1     4     1
    2     5     1

[ angles ]
;  ai    aj    ak funct    par_A   par_B
    1     2     3     2    ga_11   ga_14
    2     3     4     2    ga_14   ga_14
    3     4     5     2    ga_14   ga_14

[ dihedrals ]
;  ai    aj    ak    al funct    par_A   par_B
    1     2     3     4     1    gd_12   gd_17
    2     3     4     5     1    gd_17   gd_17

[ system ]
; Name
Propanol to Pentane

[ molecules ]
; Compound        #mols
PropPent          200

Atoms that are not perturbed, PC2 and PC3, do not need B-state parameter specifications, since the B parameters will be copied from the A parameters. Bonded interactions between atoms that are not perturbed do not need B parameter specifications, as is the case for the last bond in the example topology. Topologies using the OPLS/AA force field need no bonded parameters at all, since both the A and B parameters are determined by the atom types. Non-bonded interactions involving one or two perturbed atoms use the free-energy perturbation functional forms. Non-bonded interactions between two non-perturbed atoms use the normal functional forms. This means that when, for instance, only the charge of a particle is perturbed, its Lennard-Jones interactions will also be affected when lambda is not equal to zero or one.

Note that this topology uses the GROMOS-96 force field, in which the bonded interactions are not determined by the atom types. The bonded interaction strings are converted by the C-preprocessor.
The force-field parameter files contain lines like:

#define gb_26       0.1530  7.1500e+06

#define gd_17     0.000       5.86          3

5.8.5 Constraint forces

The constraint force between two atoms in one molecule can be calculated with the free energy perturbation code by adding a constraint between the two atoms, with a different length in the A and B topology. When the B length is 1 nm longer than the A length and lambda is kept constant at zero, the derivative of the Hamiltonian with respect to lambda is the constraint force. For constraints between molecules, the pull code can be used, see sec. 6.4. Below is an example for calculating the constraint force at 0.7 nm between two methanes in water, by combining the two methanes into one “molecule.” Note that the definition of a “molecule” in GROMACS does not necessarily correspond to the chemical definition of a molecule. In GROMACS, a “molecule” can be defined as any group of atoms that one wishes to consider simultaneously. The added constraint is of function type 2, which means that it is not used for generating exclusions (see sec. 5.4.4).

Note that the constraint free energy term is included in the derivative term, and is specifically included in the bonded-lambdas component. However, the free energy for changing constraints is not included in the potential energy differences used for BAR and MBAR, as this requires reevaluating the energy at each of the constraint components. This functionality is planned for later versions.
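The reason for choosing a B length exactly 1 nm longer than the A length can be written out as a short worked equation. The sketch below assumes that the constraint length is interpolated linearly between its A and B values; the symbol b(λ) is introduced here only for this illustration.

```latex
b(\lambda) = (1-\lambda)\, b_A + \lambda\, b_B ,
\qquad
\frac{\partial H}{\partial \lambda}
  = \frac{\partial H}{\partial b}\,\frac{\partial b}{\partial \lambda}
  = \frac{\partial H}{\partial b}\,\left( b_B - b_A \right)
```

With b_B − b_A = 1 nm, for instance b_A = 0.7 nm and b_B = 1.7 nm, the value of ∂H/∂λ in kJ mol^-1 is numerically equal to ∂H/∂b in kJ mol^-1 nm^-1, and at λ = 0 it is evaluated at the A length.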
; Include force-field parameters
#include "gromos43a1.ff/forcefield.itp"

[ moleculetype ]
; Name            nrexcl
Methanes               1

[ atoms ]
; nr   type   resnr  residu   atom    cgnr     charge    mass
   1    CH4     1     CH4      C1       1          0    16.043
   2    CH4     1     CH4      C2       2          0    16.043

[ constraints ]
;  ai    aj funct   length_A  length_B
    1     2     2        0.7       1.7

#include "gromos43a1.ff/spc.itp"

[ system ]
; Name
Methanes in Water

[ molecules ]
; Compound        #mols
Methanes              1
SOL                2002

5.8.6 Coordinate file

Files with the .gro file extension contain a molecular structure in GROMOS-87 format. A sample piece is included below:

MD of 2 waters, reformat step, PA aug-91
    6
    1WATER  OW1    1   0.126   1.624   1.679  0.1227 -0.0580  0.0434
    1WATER  HW2    2   0.190   1.661   1.747  0.8085  0.3191 -0.7791
    1WATER  HW3    3   0.177   1.568   1.613 -0.9045 -2.6469  1.3180
    2WATER  OW1    4   1.275   0.053   0.622  0.2519  0.3140 -0.1734
    2WATER  HW2    5   1.337   0.002   0.680 -1.0641 -1.1349  0.0257
    2WATER  HW3    6   1.326   0.120   0.568  1.9427 -0.8216 -0.0244
   1.82060   1.82060   1.82060

This format is fixed, i.e. all columns are in a fixed position. If you want to read such a file in your own program without using the GROMACS libraries you can use the following formats:

C-format: "%5i%5s%5s%5i%8.3f%8.3f%8.3f%8.4f%8.4f%8.4f"

Or to be more precise, with title etc. it looks like this:

"%s\n", Title
"%5d\n", natoms
for (i=0; (i

.ff directories in the $GMXLIB/share/gromacs/top sub-directory and/or the working directory. The information regarding the location of the force field files is printed by pdb2gmx so you can easily keep track of which version of a force field is being called, in case you have made modifications in one location or another. The force fields included with GROMACS are:
• AMBER03 protein, nucleic AMBER94 (Duan et al., J. Comp. Chem. 24, 1999-2012, 2003)
• AMBER94 force field (Cornell et al., JACS 117, 5179-5197, 1995)
• AMBER96 protein, nucleic AMBER94 (Kollman et al., Acc. Chem. Res. 29, 461-469, 1996)
• AMBER99 protein, nucleic AMBER94 (Wang et al., J. Comp. Chem. 21, 1049-1074, 2000)
• AMBER99SB protein, nucleic AMBER94 (Hornak et al., Proteins 65, 712-725, 2006)
• AMBER99SB-ILDN protein, nucleic AMBER94 (Lindorff-Larsen et al., Proteins 78, 1950-58, 2010)
• AMBERGS force field (Garcia & Sanbonmatsu, PNAS 99, 2782-2787, 2002)
• CHARMM27 all-atom force field (CHARMM22 plus CMAP for proteins)
• GROMOS96 43a1 force field
• GROMOS96 43a2 force field (improved alkane dihedrals)
• GROMOS96 45a3 force field (Schuler JCC 2001 22 1205)
• GROMOS96 53a5 force field (JCC 2004 vol 25 pag 1656)
• GROMOS96 53a6 force field (JCC 2004 vol 25 pag 1656)
• GROMOS96 54a7 force field (Eur. Biophys. J. (2011), 40, 843-856, DOI: 10.1007/s00249-011-0700-9)
• OPLS-AA/L all-atom force field (2001 aminoacid dihedrals)

A force field is included at the beginning of a topology file with an #include statement followed by .ff/forcefield.itp. This statement includes the force-field file, which, in turn, may include other force-field files. All the force fields are organized in the same way. An example of the amber99.ff/forcefield.itp was shown in 5.8.1.

For each force field, there are several files which are only used by pdb2gmx. These are: residue databases (.rtp, see 5.7.1), the hydrogen database (.hdb, see 5.7.4), two termini databases (.n.tdb and .c.tdb, see 5.7.5) and the atom type database (.atp, see 5.2.1), which contains only the masses. Other optional files are described in sec. 5.7.

5.9.2 Changing force-field parameters

If one wants to change the parameters of a few bonded interactions in a molecule, this is most easily accomplished by typing the parameters behind the definition of the bonded interaction directly in the *.top file under the [ moleculetype ] section (see 5.8.1 for the format and units). If one wants to change the parameters for all instances of a certain interaction one can change them in the force-field file or add a new [ ???types ] section after including the force field.
When parameters for a certain interaction are defined multiple times, the last definition is used. As of GROMACS version 3.1.3, a warning is generated when parameters are redefined with a different value. Changing the Lennard-Jones parameters of an atom type is not recommended, because in the GROMOS force fields the Lennard-Jones parameters for several combinations of atom types are not generated according to the standard combination rules. Such combinations (and possibly others that do follow the combination rules) are defined in the [ nonbond_params ] section, and changing the Lennard-Jones parameters of an atom type has no effect on these combinations.

5.9.3 Adding atom types

As of GROMACS version 3.1.3, atom types can be added in an extra [ atomtypes ] section after the inclusion of the normal force field. After the definition of the new atom type(s), additional non-bonded and pair parameters can be defined. In pre-3.1.3 versions of GROMACS, the new atom types needed to be added in the [ atomtypes ] section of the force-field files, because all non-bonded parameters above the last [ atomtypes ] section would be overwritten using the standard combination rules.

Chapter 6
Special Topics

6.1 Free energy implementation

For free energy calculations, there are two things that must be specified: the end states, and the pathway connecting the end states. The end states can be specified in two ways. The most straightforward is through the specification of end states in the topology file. Most potential forms support both an A state and a B state. Whenever both states are specified, the A state corresponds to the initial free energy state, and the B state corresponds to the final state.

In some cases, the end state can also be defined without altering the topology, solely through the .mdp file, through the use of the couple-moltype, couple-lambda0, couple-lambda1, and couple-intramol mdp keywords.
Any molecule type selected in couple-moltype will automatically have a B state implicitly constructed (and the A state redefined) according to the couple-lambda keywords. couple-lambda0 and couple-lambda1 define the non-bonded parameters that are present in the A state (couple-lambda0) and the B state (couple-lambda1). The choices are 'q', 'vdw', and 'vdw-q'; these indicate the Coulombic, van der Waals, or both parameters that are turned on in the respective state.

Once the end states are defined, the path between the end states has to be defined. This path is defined solely in the .mdp file. Starting in 4.6, λ is a vector of components, with Coulombic, van der Waals, bonded, restraint, and mass components all able to be adjusted independently. This makes it possible to turn off the Coulombic term linearly, and then the van der Waals using soft core, all in the same simulation. This is especially useful for replica exchange or expanded ensemble simulations, where it is important to sample all the way from interacting to non-interacting states in the same simulation to improve sampling.

fep-lambdas is the default array of λ values ranging from 0 to 1. All of the other lambda arrays use the values in this array if they are not specified. The previous behavior, where the pathway is controlled by a single λ variable, can be preserved by using only fep-lambdas to define the pathway.

For example, if you wanted first to change the Coulombic terms, then the van der Waals terms, changing bonded at the same rate as the van der Waals, but changing the restraints throughout the first two-thirds of the simulation, then you could use this λ vector:

    coul-lambdas      = 0.0 0.2 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0
    vdw-lambdas       = 0.0 0.0 0.0 0.0 0.4 0.5 0.6 0.7 0.8 1.0
    bonded-lambdas    = 0.0 0.0 0.0 0.0 0.4 0.5 0.6 0.7 0.8 1.0
    restraint-lambdas = 0.0 0.0 0.1 0.2 0.3 0.5 0.7 1.0 1.0 1.0

This is also equivalent to:

    fep-lambdas       = 0.0 0.0 0.0 0.0 0.4 0.5 0.6 0.7 0.8 1.0
    coul-lambdas      = 0.0 0.2 0.5 1.0 1.0 1.0 1.0 1.0 1.0 1.0
    restraint-lambdas = 0.0 0.0 0.1 0.2 0.3 0.5 0.7 1.0 1.0 1.0

The fep-lambdas array, in this case, is being used as the default to fill in the bonded and van der Waals λ arrays. Usually, it is best to fill in all arrays explicitly, just to make sure things are properly assigned.

If you want to turn on only restraints going from A to B, then it would be:

    restraint-lambdas = 0.0 0.1 0.2 0.4 0.6 1.0

and all of the other components of the λ vector would be left in the A state.

To compute free energies with a vector λ using thermodynamic integration, the TI equation becomes a vector equation:

$$\Delta F = \int \langle \nabla H \rangle \cdot d\vec{\lambda} \tag{6.1}$$

or for finite differences:

$$\Delta F \approx \sum \langle \nabla H \rangle \cdot \Delta\vec{\lambda} \tag{6.2}$$

The external pymbar script downloaded from https://SimTK.org/home/pymbar can compute this integral automatically from the GROMACS dhdl.xvg output.

6.2 Potential of mean force

A potential of mean force (PMF) is a potential that is obtained by integrating the mean force from an ensemble of configurations. In GROMACS, there are several different methods to calculate the mean force. Each method has its limitations, which are listed below.

• pull code: between the centers of mass of molecules or groups of molecules.
• AWH code: currently acts on coordinates provided by the pull code.
• free-energy code with harmonic bonds or constraints: between single atoms.
• free-energy code with position restraints: changing the conformation of a relatively immobile group of atoms.
• pull code in limited cases: between groups of atoms that are part of a larger molecule for which the bonds are constrained with SHAKE or LINCS. If the pull group is relatively large, the pull code can be used.

The pull and free-energy code are described in more detail in the following two sections.

Entropic effects

When a distance between two atoms or the centers of mass of two groups is constrained or restrained, there will be a purely entropic contribution to the PMF due to the rotation of the two groups [139]. For a system of two non-interacting masses the potential of mean force is:

$$V_{pmf}(r) = -(n_c - 1)\, k_B T \log(r) \tag{6.3}$$

where $n_c$ is the number of dimensions in which the constraint works (i.e. $n_c = 3$ for a normal constraint and $n_c = 1$ when only the z-direction is constrained). Whether one needs to correct for this contribution depends on what the PMF should represent. When one wants to pull a substrate into a protein, this entropic term indeed contributes to the work to get the substrate into the protein. But when calculating a PMF between two solutes in a solvent, for the purpose of simulating without solvent, the entropic contribution should be removed. Note that this term can be significant; when at 300 K the distance is halved, the contribution is 3.5 kJ mol$^{-1}$.

6.3 Non-equilibrium pulling

When the distance between two groups is changed continuously, work is applied to the system, which means that the system is no longer in equilibrium. Although in the limit of very slow pulling the system is again in equilibrium, for many systems this limit is not reachable within reasonable computational time.
However, one can use the Jarzynski relation [140] to obtain the equilibrium free-energy difference ∆G between two distances from many non-equilibrium simulations:

$$\Delta G_{AB} = -k_B T \log \left\langle e^{-\beta W_{AB}} \right\rangle_A \tag{6.4}$$

where $W_{AB}$ is the work performed to force the system along one path from state A to B, the angular bracket denotes averaging over a canonical ensemble of the initial state A and $\beta = 1/k_B T$.

6.4 The pull code

The pull code applies forces or constraints between the centers of mass of one or more pairs of groups of atoms. Each pull reaction coordinate is called a “coordinate” and it operates on usually two, but sometimes more, pull groups. A pull group can be part of one or more pull coordinates. Furthermore, a coordinate can also operate on a single group and an absolute reference position in space. The distance between a pair of groups can be determined in 1, 2 or 3 dimensions, or can be along a user-defined vector. The reference distance can be constant or can change linearly with time. Normally all atoms are weighted by their mass, but an additional weighting factor can also be used.

Figure 6.1: Schematic picture of pulling a lipid out of a lipid bilayer with umbrella pulling. Vrup is the velocity at which the spring is retracted, Zlink is the atom to which the spring is attached and Zspring is the location of the spring.

Several different pull types, i.e. ways to apply the pull force, are supported, and in all cases the reference distance can be constant or linearly changing with time.

1. Umbrella pulling
A harmonic potential is applied between the centers of mass of two groups. Thus, the force is proportional to the displacement.

2. Constraint pulling
The distance between the centers of mass of two groups is constrained. The constraint force can be written to a file. This method uses the SHAKE algorithm but only needs 1 iteration to be exact if only two groups are constrained.

3. Constant force pulling
A constant force is applied between the centers of mass of two groups. Thus, the potential is linear. In this case there is no reference distance or pull rate.

4. Flat bottom pulling
Like umbrella pulling, but the potential and force are zero for coordinate values below (pull-coord?-type = flat-bottom) or above (pull-coord?-type = flat-bottom-high) a reference value. This is useful for restraining e.g. the distance between two molecules to a certain region.

In addition, there are different types of reaction coordinates, so-called pull geometries. These are set with the .mdp option pull-coord?-geometry.

Definition of the center of mass

In GROMACS, there are three ways to define the center of mass of a group. The standard way is a “plain” center of mass, possibly with additional weighting factors. With periodic boundary conditions it is no longer possible to uniquely define the center of mass of a group of atoms. Therefore, a reference atom is used. For determining the center of mass, for all other atoms in the group, the closest periodic image to the reference atom is used. This uniquely defines the center of mass. By default, the middle (determined by the order in the topology) atom is used as a reference atom, but the user can also select any other atom if it would be closer to the center of the group.

Figure 6.2: Comparison of a plain center of mass reference group versus a cylinder reference group applied to interface systems. C is the reference group. The circles represent the center of mass of two groups plus the reference group, dc is the reference distance.

For a layered system, for instance a lipid bilayer, it may be of interest to calculate the PMF of a lipid as function of its distance from the whole bilayer. The whole bilayer can be taken as reference group in that case, but it might also be of interest to define the reaction coordinate for the PMF more locally.
The .mdp option pull-coord?-geometry = cylinder does not use all the atoms of the reference group, but instead dynamically only those within a cylinder with radius pull-cylinder-r around the pull vector going through the pull group. This only works for distances defined in one dimension, and the cylinder is oriented with its long axis along this one dimension. To avoid jumps in the pull force, contributions of atoms are weighted as a function of distance (in addition to the mass weighting):

$$w(r < r_{cyl}) = 1 - 2\left(\frac{r}{r_{cyl}}\right)^2 + \left(\frac{r}{r_{cyl}}\right)^4 \tag{6.5}$$

$$w(r \ge r_{cyl}) = 0 \tag{6.6}$$

Note that the radial dependence on the weight causes a radial force on both the cylinder group and the other pull group. This is an undesirable, but unavoidable effect. To minimize this effect, the cylinder radius should be chosen sufficiently large. The effective mass is 0.47 times that of a cylinder with uniform weights and equal to the mass of a uniform cylinder of 0.79 times the radius.

For a group of molecules in a periodic system, a plain reference group might not be well-defined. An example is a water slab that is connected periodically in x and y, but has two liquid-vapor interfaces along z. In such a setup, water molecules can evaporate from the liquid and they will move through the vapor, through the periodic boundary, to the other interface. Such a system is inherently periodic and there is no proper way of defining a “plain” center of mass along z. A proper solution is to use a cosine-shaped weighting profile for all atoms in the reference group. The profile is a cosine with a single period in the unit cell. Its phase is optimized to give the maximum sum of weights, including mass weighting. This provides a unique and continuous reference position that is nearly identical to the plain center of mass position in case all atoms are within half of the unit-cell length. See ref [141] for details.
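The cosine-weighted reference position described above can be illustrated with a small sketch. It is not the GROMACS implementation: the function name `cosine_weighted_reference` is hypothetical, and it assumes that the phase maximizing the mass-weighted sum of cosine weights is obtained in closed form as a mass-weighted circular mean, which is then converted back to a position along the periodic axis.

```python
import math


def cosine_weighted_reference(positions, masses, box_length):
    """Periodic reference position along one axis (same units as box_length).

    Assumption: the optimal phase of the single-period cosine profile is
    the mass-weighted circular mean of the atom positions.
    """
    two_pi = 2.0 * math.pi
    # Map every coordinate onto an angle on the unit circle.
    sin_sum = sum(m * math.sin(two_pi * z / box_length)
                  for z, m in zip(positions, masses))
    cos_sum = sum(m * math.cos(two_pi * z / box_length)
                  for z, m in zip(positions, masses))
    # Phase that maximizes sum_i m_i * cos(theta_i - phase).
    phase = math.atan2(sin_sum, cos_sum)
    # Convert the phase back to a position and wrap it into the box.
    return (phase / two_pi * box_length) % box_length


# Two equal masses straddling the periodic boundary of a box of length 1.0.
reference = cosine_weighted_reference([0.9, 0.1], [1.0, 1.0], 1.0)
# Two equal masses well inside the box.
interior = cosine_weighted_reference([0.2, 0.4], [1.0, 1.0], 1.0)
```

For the pair straddling the boundary the reference lands on the boundary itself (0, equivalent to 1.0 under periodicity), whereas an arithmetic mean of the raw coordinates would give 0.5, the point farthest from both atoms. For the interior pair the result coincides with the plain center of mass.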
When relative weights $w_i$ are used during the calculations, either by supplying weights in the input or due to cylinder geometry or due to cosine weighting, the weights need to be scaled to conserve momentum:

$$w'_i = w_i \left. \sum_{j=1}^{N} w_j m_j \right/ \sum_{j=1}^{N} w_j^2 m_j \tag{6.7}$$

where $m_j$ is the mass of atom j of the group. The mass of the group, required for calculating the constraint force, is:

$$M = \sum_{i=1}^{N} w'_i m_i \tag{6.8}$$

The definition of the weighted center of mass is:

$$\mathbf{r}_{com} = \left. \sum_{i=1}^{N} w'_i m_i \mathbf{r}_i \right/ M \tag{6.9}$$

From the centers of mass the AFM, constraint, or umbrella force $\mathbf{F}_{com}$ on each group can be calculated. The force on the center of mass of a group is redistributed to the atoms as follows:

$$\mathbf{F}_i = \frac{w'_i m_i}{M} \mathbf{F}_{com} \tag{6.10}$$

Definition of the pull direction

The most common setup is to pull along the direction of the vector connecting the two pull groups; this is selected with pull-coord?-geometry = distance. You might want to pull along a certain vector instead, which is selected with pull-coord?-geometry = direction. But this can cause unwanted torque forces in the system, unless you pull against a reference group with (nearly) fixed orientation, e.g. a membrane protein embedded in a membrane along x/y while pulling along z. If your reference group does not have a fixed orientation, you should probably use pull-coord?-geometry = direction-relative, see Fig. 6.3. Since the potential now depends on the coordinates of two additional groups defining the orientation, the torque forces will work on these two groups.

Definition of the angle and dihedral pull geometries

Four pull groups are required for pull-coord?-geometry = angle. In the same way as for geometries with two groups, each consecutive pair of groups i and i + 1 define a vector connecting the COMs of groups i and i + 1. The angle is defined as the angle between the two resulting vectors.
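The angle definition above can be sketched in a few lines. This is an illustration only, not GROMACS code: the function name `pull_angle_deg` is hypothetical, the four arguments are assumed to be precomputed group centers of mass, and periodic boundary conditions are ignored.

```python
import math


def pull_angle_deg(com_a, com_b, com_c, com_d):
    """Angle in degrees between the vectors a->b and c->d."""
    v1 = [b - a for a, b in zip(com_a, com_b)]
    v2 = [d - c for c, d in zip(com_c, com_d)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(x * x for x in v2))
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    cos_angle = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return math.degrees(math.acos(cos_angle))


# Four group slots where the middle two refer to the same group, so the
# two vectors share the center of mass at the origin.
right_angle = pull_angle_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                             (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

Because the result comes from an arccosine, it always falls in the closed interval from 0 to 180 degrees.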
For example, the .mdp option pull-coord?-groups = 1 2 2 4 defines the angle between the vector from the COM of group 1 to the COM of group 2 and the vector from the COM of group 2 to the COM of group 4. The angle takes values in the closed interval [0, 180] deg. For pull-coord?-geometry = angle-axis the angle is defined with respect to a reference axis given by pull-coord?-vec and only two groups need to be given.

Figure 6.3: The pull setup for geometry direction-relative. The “normal” pull groups are 1 and 2. Groups 3 and 4 define the pull direction and thus the direction of the normal pull forces (red). This leads to reaction forces (blue) on groups 3 and 4, which are perpendicular to the pull direction. Their magnitude is given by the “normal” pull force times the ratio of dp and the distance between groups 3 and 4.

The dihedral geometry requires six pull groups. These pair up in the same way as described above and so define three vectors. The dihedral angle is defined as the angle between the two planes spanned by the two first and the two last vectors. Equivalently, the dihedral angle can be seen as the angle between the first and the third vector when these vectors are projected onto a plane normal to the second vector (the axis vector). As an example, consider a dihedral angle involving four groups: 1, 5, 8 and 9. Here, the .mdp option pull-coord?-groups = 8 1 1 5 5 9 specifies the three vectors that define the dihedral angle: the first vector is the COM distance vector from group 8 to 1, the second vector is the COM distance vector from group 1 to 5, and the third vector is the COM distance vector from group 5 to 9. The dihedral angle takes values in the interval (-180, 180] deg and has periodic boundaries.

Limitations

There is one theoretical limitation: strictly speaking, constraint forces can only be calculated between groups that are not connected by constraints to the rest of the system.
If a group contains part of a molecule of which the bond lengths are constrained, the pull constraint and LINCS or SHAKE bond constraint algorithms should be iterated simultaneously. This is not done in GROMACS. This means that for simulations with constraints = all-bonds in the .mdp file pulling is, strictly speaking, limited to whole molecules or groups of molecules. In some cases this limitation can be avoided by using the free energy code, see sec. 6.9. In practice, the errors caused by not iterating the two constraint algorithms can be negligible when the pull group consists of a large number of atoms and/or the pull force is small. In such cases, the constraint correction displacement of the pull group is small compared to the bond lengths.

6.5 Adaptive biasing with AWH

The accelerated weight histogram method (AWH) [142] calculates the PMF along a reaction coordinate by adding an adaptively determined biasing potential. AWH flattens free energy barriers along the reaction coordinate by applying a history-dependent potential to the system that “fills up” free energy minima. This is similar in spirit to other adaptive biasing potential methods, e.g. the Wang-Landau [143], local elevation [144] and metadynamics [145] methods. The initial sampling stage of AWH makes the method robust against the choice of input parameters. Furthermore, the target distribution along the reaction coordinate may be chosen freely.

6.5.1 Basics of the method

Rather than biasing the reaction coordinate ξ(x) directly, AWH acts on a reference coordinate λ. The reaction coordinate ξ(x) is coupled to λ with a harmonic potential

$$Q(\xi, \lambda) = \frac{1}{2} \beta k (\xi - \lambda)^2, \tag{6.11}$$

so that for large force constants k, ξ ≈ λ. Note the use of dimensionless energies for compatibility with previously published work. Units of energy are obtained by multiplication with $k_B T = 1/\beta$. In the simulation, λ samples the user-defined sampling interval I.
For a multidimensional reaction coordinate ξ, the sampling interval is the Cartesian product $I = \prod_\mu I_\mu$ (a rectangular domain).

The connection between atom coordinates and λ is established through the extended ensemble [68],

$$P(x, \lambda) = \frac{1}{Z} e^{g(\lambda) - Q(\xi(x), \lambda) - V(x)}, \tag{6.12}$$

where g(λ) is a bias function (a free variable) and V(x) is the unbiased potential energy of the system. The distribution along λ can be tuned to be any predefined target distribution ρ(λ) (often chosen to be flat) by choosing g(λ) wisely. This is evident from

$$P(\lambda) = \int P(x, \lambda)\, dx = \frac{1}{Z} e^{g(\lambda)} \int e^{-Q(\xi(x), \lambda) - V(x)}\, dx \equiv \frac{1}{Z} e^{g(\lambda) - F(\lambda)}, \tag{6.13}$$

where F(λ) is the free energy

$$F(\lambda) = -\ln \int e^{-Q(\xi(x), \lambda) - V(x)}\, dx. \tag{6.14}$$

Being the convolution of the PMF with the Gaussian defined by the harmonic potential, F(λ) is a smoothened version of the PMF. Eq. 6.13 shows that in order to obtain P(λ) = ρ(λ), F(λ) needs to be determined accurately. Thus, AWH adaptively calculates F(λ) and simultaneously converges P(λ) toward ρ(λ).

The free energy update

AWH is initialized with an estimate of the free energy $F_0(\lambda)$. At regular time intervals this estimate is updated using data collected in between the updates. At update n, the applied bias $g_n(\lambda)$ is a function of the current free energy estimate $F_n(\lambda)$ and target distribution $\rho_n(\lambda)$,

$$g_n(\lambda) = \ln \rho_n(\lambda) + F_n(\lambda), \tag{6.15}$$

which is consistent with Eq. 6.13. Note that also the target distribution may be updated during the simulation (see examples in section 6.5.3). Substituting this choice of $g = g_n$ back into Eq. 6.13 yields the simple free energy update

$$\Delta F_n(\lambda) = F(\lambda) - F_n(\lambda) = -\ln \frac{P_n(\lambda)}{\rho_n(\lambda)}, \tag{6.16}$$

which would yield a better estimate $F_{n+1} = F_n + \Delta F_n$, assuming $P_n(\lambda)$ can be measured accurately. AWH estimates $P_n(\lambda)$ by regularly calculating the conditional distribution

$$\omega_n(\lambda|x) \equiv P_n(\lambda|x) = \frac{e^{g_n(\lambda) - Q(\xi(x), \lambda)}}{\sum_{\lambda'} e^{g_n(\lambda') - Q(\xi(x), \lambda')}}. \tag{6.17}$$

Accumulating these probability weights yields $\sum_t \omega(\lambda|x(t)) \sim P_n(\lambda)$, where $\int P_n(\lambda|x) P_n(x)\, dx = P_n(\lambda)$ has been used. The $\omega_n(\lambda|x)$ weights are thus the samples of the AWH method. With the limited amount of sampling one has in practice, update scheme 6.16 yields very noisy results. AWH instead applies a free energy update that has the same form but which can be applied repeatedly with limited and localized sampling,

$$\Delta F_n = -\ln \frac{W_n(\lambda) + \sum_t \omega_n(\lambda|x(t))}{W_n(\lambda) + \sum_t \rho_n(\lambda)}. \tag{6.18}$$

Here $W_n(\lambda)$ is the reference weight histogram representing prior sampling. The update for W(λ), disregarding the initial stage (see section 6.5.2), is

$$W_{n+1}(\lambda) = W_n(\lambda) + \sum_t \rho_n(\lambda). \tag{6.19}$$

Thus, the weight histogram equals the targeted, “ideal” history of samples. There are two important things to note about the free energy update. First, sampling is driven away from oversampled, currently local regions. For such λ values, $\omega_n(\lambda) > \rho_n(\lambda)$ and $\Delta F_n(\lambda) < 0$, which by Eq. 6.15 implies $\Delta g_n(\lambda) < 0$ (assuming $\Delta\rho_n \equiv 0$). Thus, the probability to sample λ decreases after the update (see Eq. 6.13). Secondly, the normalization of the histogram, $N_n = \sum_\lambda W_n(\lambda)$, determines the update size |∆F(λ)|. For instance, for a single sample ω(λ|x), the shape of the update is approximately a Gaussian function of width $\sigma = 1/\sqrt{\beta k}$ and height $\propto 1/N_n$ [142],

$$|\Delta F_n(\lambda)| \propto \frac{1}{N_n} e^{-\frac{1}{2} \beta k (\xi(x) - \lambda)^2}. \tag{6.20}$$

Therefore, as samples accumulate in W(λ) and $N_n$ grows, the updates get smaller, allowing for the free energy to converge.

Note that the quantity of interest to the user is not F(λ) but the PMF Φ(ξ). Φ(ξ) is extracted by reweighting samples ξ(t) on the fly [142] (see also section 6.5.5) and will converge at the same rate as F(λ), see Fig. 6.4. The PMF will be written to output (see section 6.5.7).

Applying the bias to the system

The bias potential can be applied to the system in two ways.
The first is to apply a harmonic potential centered at λ(t), which is sampled using (rejection-free) Monte-Carlo sampling from the conditional distribution $\omega_n(\lambda|x(t)) = P_n(\lambda|x(t))$, see Eq. 6.17. This is also called Gibbs sampling or independence sampling. Alternatively, and by default in the code, the following convolved bias potential can be applied,

$$U_n(\xi) = -\ln \int e^{g_n(\lambda) - Q(\xi, \lambda)}\, d\lambda. \tag{6.21}$$

These two approaches are equivalent in the sense that they give rise to the same biased probabilities $P_n(x)$ (cf. Eq. 6.12) while the dynamics are clearly different in the two cases. This choice does not affect the internals of the AWH algorithm, only what force and potential AWH returns to the MD engine.

Figure 6.4: AWH evolution in time for a Brownian particle in a double-well potential. The reaction coordinate ξ(t) traverses the sampling interval multiple times in the initial stage before exiting and entering the final stage (top left). In the final stage, the dynamics of ξ becomes increasingly diffusive. The times of covering are shown as ×-markers of different colors. At these times the free energy update size ∼ 1/N, where N is the size of the weight histogram, is decreased by scaling N by a factor of γ = 3 (top right). In the final stage, N grows at the sampling rate and thus 1/N ∼ 1/t. The exit from the initial stage is determined on the fly by ensuring that the effective sample weight s of data collected in the final stage exceeds that of initial stage data (bottom left; note that ln s(t) is plotted). An estimate of the PMF is also extracted from the simulation (bottom right), which after exiting the initial stage should estimate global free energy differences fairly accurately.

6.5.2 The initial stage

Initially, when the bias potential is far from optimal, samples will be highly correlated. In such cases, letting W(λ) accumulate samples as prescribed by Eq. 6.19 entails a too rapid decay of the free energy update size. This motivates splitting the simulation into an initial stage where the weight histogram grows according to a more restrictive and robust protocol, and a final stage where the weight histogram grows linearly at the sampling rate (Eq. 6.19). The AWH initial stage takes inspiration from the well-known Wang-Landau algorithm [143], although there are differences in the details.

In the initial stage the update size is kept constant (by keeping $N_n$ constant) until a transition across the sampling interval has been detected, a “covering”. For the definition of a covering, see Eq. 6.22 below. After a covering has occurred, $N_n$ is scaled up by a constant “growth factor” γ, chosen heuristically as γ = 3. Thus, in the initial stage $N_n$ is set dynamically as $N_n = \gamma^m N_0$, where m is the number of coverings. Since the update size scales as 1/N (Eq. 6.20) this leads to a close to exponential decay of the update size in the initial stage, see Fig. 6.4.

The update size directly determines the rate of change of $F_n(\lambda)$ and hence, from Eq. 6.15, also the rate of change of the bias function $g_n(\lambda)$. Thus initially, when $N_n$ is kept small and updates large, the system will be driven along the reaction coordinate by the constantly fluctuating bias. If $N_0$ is set small enough, the first transition will typically be fast because of the large update size and will quickly give a first rough estimate of the free energy. The second transition, using $N_1 = \gamma N_0$, refines this estimate further. Thus, rather than very carefully filling free energy minima using a small initial update size, the sampling interval is swept back-and-forth multiple times, using a wide range of update sizes, see Fig. 6.4.
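The initial-stage schedule $N_n = \gamma^m N_0$ and the resulting decay of the update size can be made concrete with a short sketch. The function names and the value chosen for `n0` are illustrative assumptions, and the update size is represented only up to its proportionality constant, as 1/N.

```python
def histogram_size(n0, num_coverings, gamma=3.0):
    """Histogram size N = gamma**m * N_0 after m coverings."""
    return n0 * gamma ** num_coverings


def relative_update_sizes(n0, max_coverings, gamma=3.0):
    """Update size, taken as 1/N, after 0..max_coverings coverings."""
    return [1.0 / histogram_size(n0, m, gamma)
            for m in range(max_coverings + 1)]


# Hypothetical starting histogram size of 10 samples and three coverings.
sizes = relative_update_sizes(n0=10.0, max_coverings=3)
```

Each covering divides the update size by γ, so after m coverings it has dropped by a factor of γ to the power m. Between coverings the size stays constant, which is why the decay is only close to exponential in time.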
Sweeping the interval with a wide range of update sizes in this way also makes AWH robust against the choice of $N_0$.

The covering criterion

In the general case of a multidimensional reaction coordinate $\lambda = (\lambda_\mu)$, the sampling interval I is considered covered when all dimensions have been covered. A dimension µ is covered if all points $\lambda_\mu$ in the one-dimensional sampling interval $I_\mu$ have been “visited”. Finally, a point $\lambda_\mu \in I_\mu$ has been visited if there is at least one point $\lambda^* \in I$ with $\lambda^*_\mu = \lambda_\mu$ that since the last covering has accumulated probability weight corresponding to the peak of a multidimensional Gaussian distribution

$$\Delta W(\lambda^*) \ge w_{peak} \equiv \prod_\mu \frac{\Delta\lambda_\mu}{\sqrt{2\pi}\, \sigma_k}. \tag{6.22}$$

Here, $\Delta\lambda_\mu$ is the point spacing of the discretized $I_\mu$ and $\sigma_k = 1/\sqrt{\beta k_\mu}$ (where $k_\mu$ is the force constant) is the Gaussian width.

Exit from the initial stage

For longer times, when major free energy barriers have largely been flattened by the converging bias potential, the histogram W(λ) should grow at the actual sampling rate and the initial stage needs to be exited [146]. There are multiple reasonable (heuristic) ways of determining when this transition should take place. One option is to postulate that the number of samples in the weight histogram $N_n$ should never exceed the actual number of collected samples, and exit the initial stage when this condition breaks [142]. In the initial stage, N grows close to exponentially while the collected number of samples grows linearly, so an exit will surely occur eventually. Here we instead apply an exit criterion based on the observation that “artificially” keeping N constant while continuing to collect samples corresponds to scaling down the relative weight of old samples relative to new ones. Similarly, the subsequent scaling up of N by a factor γ corresponds to scaling up the weight of old data.
Briefly, the exit criterion is devised such that the weight of a sample collected after the initial stage is always larger than or equal to the weight of a sample collected during the initial stage, see Fig. 6.4. This is consistent with scaling down early, noisy data.

The initial stage exit criterion will now be described in detail. We start out at the beginning of a covering stage, so that N has just been scaled by γ and is now kept constant. Thus, the first sample of this stage has the weight s = 1/γ relative to the last sample of the previous covering stage. We assume that ∆N samples are collected and added to W for each update. To keep N constant, W needs to be scaled down by a factor N/(N + ∆N) after every update. Equivalently, this means that new data is scaled up relative to old data by the inverse factor. Thus, after ∆n updates a new sample has the relative weight $s = (1/\gamma)[(N_n + \Delta N)/N_n]^{\Delta n}$. Now assume covering occurs at this time. To continue to the next covering stage, N should be scaled by γ, which corresponds to again multiplying s by 1/γ. If at this point s ≥ γ, then after rescaling s ≥ 1; i.e. overall the relative weight of a new sample relative to an old sample is still growing fast. If, on the contrary, s < γ, the initial stage is exited and from then on N simply grows at the sampling rate (see Eq. 6.19). To really ensure that s ≥ 1 holds before exiting, so that samples after the exit have at least the sample weight of older samples, the last covering stage is extended by a sufficient number of updates.

6.5.3 Choice of target distribution

The target distribution ρ(λ) is traditionally chosen to be uniform

$$\rho_{const}(\lambda) = \mathrm{const}. \tag{6.23}$$

This choice exactly flattens F(λ) in the user-defined sampling interval I. Generally, ρ(λ) = 0 for λ ∉ I. In certain cases other choices may be preferable.
For instance, in the multidimensional case the rectangular sampling interval is likely to contain regions of very high free energy, e.g. where atoms are clashing. To exclude such regions, ρ(λ) can be specified by the following function of the free energy

$$\rho_{cut}(\lambda) \propto \frac{1}{1 + e^{F(\lambda) - F_{cut}}}, \tag{6.24}$$

where $F_{cut}$ is a free energy cutoff (relative to $\min_\lambda F(\lambda)$). Thus, regions of the sampling interval where $F(\lambda) > F_{cut}$ will be exponentially suppressed (in a smooth fashion). Alternatively, very high free energy regions could be avoided while still flattening more moderate free energy barriers by targeting a Boltzmann distribution corresponding to scaling $\beta = 1/k_B T$ by a factor $0 < s_\beta < 1$,

$$\rho_{Boltz}(\lambda) \propto e^{-s_\beta F(\lambda)}. \tag{6.25}$$

The parameter $s_\beta$ determines to what degree the free energy landscape is flattened; the lower $s_\beta$, the flatter. Note that both $\rho_{cut}(\lambda)$ and $\rho_{Boltz}(\lambda)$ depend on F(λ), which needs to be substituted by the current best estimate $F_n(\lambda)$. Thus, the target distribution is also updated (consistently with Eq. 6.15).

There is in fact an alternative approach to obtaining $\rho_{Boltz}(\lambda)$ as the limiting target distribution in AWH, which is particular in the way the weight histogram W(λ) and the target distribution ρ are updated and coupled to each other. This yields an evolution of the bias potential which is very similar to that of well-tempered metadynamics [147], see [142] for details. Because of the popularity and success of well-tempered metadynamics, this is a special case worth considering. In this case ρ is a function of the reference weight histogram

$$\rho_{Boltz,loc}(\lambda) \propto W(\lambda), \tag{6.26}$$

and the update of the weight histogram is modified (cf. Eq. 6.19)

$$W_{n+1}(\lambda) = W_n(\lambda) + s_\beta \sum_t \omega(\lambda|x(t)). \tag{6.27}$$

Thus, here the weight histogram equals the real history of samples, but scaled by $s_\beta$. This target distribution is called local Boltzmann since W is only modified locally, where sampling has taken place.
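The cutoff and Boltzmann targets of Eqs. 6.24 and 6.25 can be evaluated on a discretized λ grid as in the sketch below. It is an illustration under stated assumptions rather than the GROMACS implementation: the function names are hypothetical, free energies are taken to be dimensionless (units of $k_B T$) and are shifted by their minimum, and each distribution is normalized to sum to one over the grid.

```python
import math


def _inverse_one_plus_exp(x):
    """Numerically stable evaluation of 1 / (1 + exp(x))."""
    if x > 0.0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def _normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def rho_cut(free_energy, f_cut):
    """Eq. 6.24 with f_cut measured relative to the minimum free energy."""
    f_min = min(free_energy)
    return _normalize([_inverse_one_plus_exp((f - f_min) - f_cut)
                       for f in free_energy])


def rho_boltz(free_energy, s_beta):
    """Eq. 6.25; shifting by the minimum only changes the normalization."""
    f_min = min(free_energy)
    return _normalize([math.exp(-s_beta * (f - f_min))
                       for f in free_energy])


# Hypothetical free energy estimate on three grid points, in units of kT.
estimate = [0.0, 5.0, 20.0]
cut_target = rho_cut(estimate, f_cut=10.0)
boltz_target = rho_boltz(estimate, s_beta=0.5)
```

With the cutoff target the first two points, both below the cutoff, receive nearly equal weight while the third is strongly suppressed. With the Boltzmann target every point is damped in proportion to its free energy, more gently the smaller `s_beta` is.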
For the local Boltzmann target, when $s_\beta \approx 0$ the histogram essentially does not grow and the size of the free energy update will stay at a constant value (as in the original formulation of metadynamics). Thus, the free energy estimate will not converge, but continue to fluctuate around the correct value. This illustrates the inherent coupling between the convergence and choice of target distribution for this special choice of target. Furthermore, note that when using $\rho = \rho_{Boltz,loc}$ there is no initial stage (section 6.5.2). The rescaling of the weight histogram applied in the initial stage is a global operation, which is incompatible with $\rho_{Boltz,loc}$ only depending locally on the sampling history.

Lastly, the target distribution can be modulated by arbitrary probability weights

$$\rho(\lambda) = \rho_0(\lambda)\, w_{user}(\lambda), \tag{6.28}$$

where $w_{user}(\lambda)$ is provided by user data and in principle $\rho_0(\lambda)$ can be any of the target distributions mentioned above.

6.5.4 Multiple independent or sharing biases

Multiple independent bias potentials may be applied within one simulation. This only makes sense if the biased coordinates $\xi^{(1)}, \xi^{(2)}, \ldots$ evolve essentially independently from one another. A typical example of this would be when applying an independent bias to each monomer of a protein. Furthermore, multiple AWH simulations can be launched in parallel, each with a (set of) independent biases.

If the defined sampling interval is large relative to the diffusion time of the reaction coordinate, traversing the sampling interval multiple times as is required by the initial stage (section 6.5.2) may take an infeasible amount of simulation time. In these cases it could be advantageous to parallelize the work and have a group of multiple “walkers” $\xi^{(i)}(t)$ share a single bias potential. This can be achieved by collecting samples from all $\xi^{(i)}$ of the same sharing group into a single histogram and updating a common free energy estimate.
Samples can be shared between walkers within the simulation and/or between multiple simulations. However, currently only sharing between simulations is supported in the code while all biases within a simulation are independent.

Note that when attempting to shorten the simulation time by using bias-sharing walkers, care must be taken to ensure the simulations are still long enough to properly explore and equilibrate all regions of the sampling interval. To begin, the walkers in a group should be decorrelated and distributed approximately according to the target distribution before starting to refine the free energy. This can be achieved e.g. by “equilibrating” the shared weight histogram before letting it grow; for instance, $W(\lambda)/N \approx \rho(\lambda)$ with some tolerance.

Furthermore, the “covering” or transition criterion of the initial stage should be generalized to detect when the sampling interval has been collectively traversed. One alternative is to just use the same criterion as for a single walker (but now with more samples), see Eq. 6.22. However, in contrast to the single walker case this does not ensure that any real transitions across the sampling interval have taken place; in principle all walkers could be sampling only very locally and still cover the whole interval. Just as with a standard umbrella sampling procedure, the free energy may appear to be converged while in reality simulations sampling nearby $\lambda$ values are sampling disconnected regions of phase space. A stricter criterion, which helps avoid such issues, is to require that before a simulation marks a point $\lambda_\mu$ along dimension $\mu$ as visited, and shares this with the other walkers, all points within a certain diameter $D_{\mathrm{cover}}$ should also have been visited (i.e. fulfill Eq. 6.22). Increasing $D_{\mathrm{cover}}$ increases robustness, but may slow down convergence.
For the maximum value of $D_{\mathrm{cover}}$, equal to the length of the sampling interval, the sampling interval is considered covered when at least one walker has independently traversed the sampling interval.

6.5.5 Reweighting and combining biased data

Often one may want to, post-simulation, calculate the unbiased PMF $\Phi(u)$ of another variable $u(x)$. $\Phi(u)$ can be estimated using $\xi$-biased data by reweighting (“unbiasing”) the trajectory using the bias potential $U_{n(t)}$, see Eq. 6.21. Essentially, one bins the biased data along $u$ and removes the effect of $U_{n(t)}$ by dividing the weight of samples $u(t)$ by $e^{-U_{n(t)}(\xi(t))}$,

$$\hat{\Phi}(u) = -\ln \sum_t 1_u(u(t))\, e^{U_{n(t)}(\xi(t))}\, Z_{n(t)}. \tag{6.29}$$

Here the indicator function $1_u$ denotes the binning procedure: $1_u(u') = 1$ if $u'$ falls into the bin labeled by $u$ and 0 otherwise. The normalization factor $Z_n = \int e^{-\Phi(\xi) - U_n(\xi)}\, d\xi$ is the partition function of the extended ensemble. As can be seen, $Z_n$ depends on $\Phi(\xi)$, the PMF of the (biased) reaction coordinate $\xi$ (which is calculated and written to file by the AWH simulation). It is advisable to use only final stage data in the reweighting procedure due to the rapid change of the bias potential during the initial stage. If one would include initial stage data, one should use the sample weights that are inferred by the repeated rescaling of the histogram in the initial stage, for the sake of consistency. Initial stage samples would then in any case be heavily scaled down relative to final stage samples. Note that Eq. 6.29 can also be used to combine data from multiple simulations (by adding another sum also over the trajectory set). Furthermore, when multiple independent AWH biases have generated a set of PMF estimates $\{\hat{\Phi}^{(i)}(\xi)\}$, a combined best estimate $\hat{\Phi}(\xi)$ can be obtained by applying self-consistent exponential averaging. More details on this procedure and a derivation of Eq. 6.29 (using slightly different notation) can be found in [148].
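The binning and unbiasing step of Eq. 6.29 can be sketched as below. This is only an illustration under stated assumptions: the function name and arguments are hypothetical, the bias values $U_{n(t)}(\xi(t))$ are taken to be in units of $k_B T$, the normalization factors $Z_{n(t)}$ are supplied by the caller, and the resulting PMF is shifted so that its minimum is zero.

```python
import math


def reweighted_pmf(u_values, bias_values, z_values, u_min, bin_width, n_bins):
    """Estimate the PMF along u from biased samples, following Eq. 6.29.

    u_values    : samples u(t)
    bias_values : bias U_n(t)(xi(t)) felt by each sample, in kB*T
    z_values    : normalization factor Z_n(t) for each sample
    Returns one PMF value per bin (kB*T); empty bins are set to infinity.
    """
    weights = [0.0] * n_bins
    for u, bias, z in zip(u_values, bias_values, z_values):
        index = math.floor((u - u_min) / bin_width)  # indicator function 1_u
        if 0 <= index < n_bins:
            weights[index] += math.exp(bias) * z     # undo the bias weight
    pmf = [-math.log(w) if w > 0.0 else math.inf for w in weights]
    finite = [p for p in pmf if p != math.inf]
    offset = min(finite) if finite else 0.0
    return [p - offset for p in pmf]


# Three unbiased samples: two in the first bin, one in the second.
example_pmf = reweighted_pmf(
    [0.1, 0.2, 1.5], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0],
    u_min=0.0, bin_width=1.0, n_bins=3,
)
```

In the example the first bin holds twice the weight of the second, so the second bin lies $\ln 2$ above the first, and the empty third bin is reported as infinite.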
6.5.6 The friction metric

During the AWH simulation, the following time-integrated force correlation function is calculated,

$$\eta_{\mu\nu}(\lambda) = \beta \int_0^\infty \frac{\langle \delta F_\mu(x(t),\lambda)\, \delta F_\nu(x(0),\lambda)\, \omega(\lambda|x(t))\, \omega(\lambda|x(0)) \rangle}{\langle \omega^2(\lambda|x) \rangle}\, dt. \tag{6.30}$$

Here $F_\mu(x,\lambda) = k_\mu(\xi_\mu(x) - \lambda_\mu)$ is the force along dimension $\mu$ from a harmonic potential centered at $\lambda$ and $\delta F_\mu(x,\lambda) = F_\mu(x,\lambda) - \langle F_\mu(x,\lambda) \rangle$ is the deviation of the force. The factors $\omega(\lambda|x(t))$, see Eq. 6.17, reweight the samples. $\eta_{\mu\nu}(\lambda)$ is a friction tensor [149]. Its matrix elements are inversely proportional to local diffusion coefficients. A measure of sampling (in)efficiency at each $\lambda$ is given by

$$\eta^{\frac{1}{2}}(\lambda) = \sqrt{\det \eta_{\mu\nu}(\lambda)}. \tag{6.31}$$

A large value of $\eta^{\frac{1}{2}}(\lambda)$ indicates slow dynamics and long correlation times, which may require more sampling.

6.5.7 Usage

AWH stores data in the energy file (.edr) with a frequency set by the user. The data (the PMF, the convolved bias, distributions of the $\lambda$ and $\xi$ coordinates, etc.) can be extracted after the simulation using the gmx awh tool. Furthermore, the trajectory of the reaction coordinate $\xi(t)$ is printed to the pull output file pullx.xvg. The log file (.log) also contains information; check for messages starting with “awh”, they will tell you about covering and potential sampling issues.

Setting the initial update size

The initial value of the weight histogram size $N$ sets the initial update size (and the rate of change of the bias). When $N$ is kept constant, like in the initial stage, the average variance of the free energy scales as $\varepsilon^2 \sim 1/(ND)$ [142], for a simple model system with constant diffusion $D$ along the reaction coordinate. This provides a ballpark estimate used by AWH to initialize $N$ in terms of more meaningful quantities

$$\frac{1}{N_0} = \frac{1}{N_0(\varepsilon_0, D)} \sim D \varepsilon_0^2. \tag{6.32}$$

Essentially, this tells us that a slower system (small $D$) requires more samples (larger $N_0$) to attain the same level of accuracy ($\varepsilon_0$) at a given sampling rate.
Conversely, for a system of given diffusion, how to choose the initial biasing rate depends on how good the initial accuracy is. Both the initial error $\varepsilon_0$ and the diffusion $D$ only need to be roughly estimated or guessed. In the typical case, one would only tweak the $D$ parameter, and use a default value for $\varepsilon_0$. For good convergence, $D$ should be chosen as large as possible (while maintaining a stable system), giving large initial bias updates and fast initial transitions. Choosing $D$ too small can lead to slow initial convergence. It may be a good idea to run a short trial simulation and after the first covering check the maximum free energy difference of the PMF estimate. If this is much larger than the expected magnitude of the free energy barriers that should be crossed, then the system is probably being pulled too hard and $D$ should be decreased. $\varepsilon_0$, on the other hand, would only be tweaked when starting an AWH simulation using a fairly accurate guess of the PMF as input.

Tips for efficient sampling

The force constant $k$ should be larger than the curvature of the PMF landscape. If this is not the case, the distributions of the reaction coordinate $\xi$ and the reference coordinate $\lambda$ will differ significantly and warnings will be printed in the log file. One can choose $k$ as large as the time step supports. This will necessarily increase the number of points of the discretized sampling interval $I$. In general, however, it should not affect the performance of the simulation noticeably because the AWH update is implemented such that only sampled points are accessed at free energy update time.

As with any method, the choice of reaction coordinate(s) is critical. If a single reaction coordinate does not suffice, identifying a second reaction coordinate and sampling the two-dimensional landscape may help. In this case, using a target distribution with a free energy cutoff (see Eq.
6.24) might be required to avoid sampling uninteresting regions of very high free energy. Obtaining accurate free energies for reaction coordinates of much higher dimensionality than 3 or possibly 4 is generally not feasible.

Monitoring the transition rate of $\xi(t)$ across the sampling interval is also advisable. For reliable statistics (e.g. when reweighting the trajectory as described in section 6.5.5), one would generally want to observe at least a few transitions after having exited the initial stage. Furthermore, if the dynamics of the reaction coordinate suddenly changes, this may be a sign of e.g. a reaction coordinate problem.

Difficult regions of sampling may also be detected by calculating the friction tensor $\eta_{\mu\nu}(\lambda)$ in the sampling interval, see section 6.5.6. $\eta_{\mu\nu}(\lambda)$ as well as the sampling efficiency measure $\eta^{\frac{1}{2}}(\lambda)$ (Eq. 6.31) are written to the energy file and can be extracted with gmx awh. A high peak in $\eta^{\frac{1}{2}}(\lambda)$ indicates that this region requires longer time to sample properly.

6.6 Enforced Rotation

This module can be used to enforce the rotation of a group of atoms, such as a protein subunit. There are a variety of rotation potentials, among them complex ones that allow flexible adaptations of both the rotated subunit as well as the local rotation axis during the simulation. An example application can be found in ref. [150].

6.6.1 Fixed Axis Rotation

Stationary Axis with an Isotropic Potential

In the fixed axis approach (see Fig. 6.5B), torque on a group of $N$ atoms with positions $\mathbf{x}_i$ (denoted “rotation group”) is applied by rotating a reference set of atomic positions, usually their initial positions $\mathbf{y}_i^0$, at a constant angular velocity $\omega$ around an axis defined by a direction vector $\hat{\mathbf{v}}$ and a pivot point $\mathbf{u}$. To that aim, each atom with position $\mathbf{x}_i$ is attracted by a “virtual spring” potential to its moving reference position $\mathbf{y}_i = \mathbf{\Omega}(t)(\mathbf{y}_i^0 - \mathbf{u})$, where $\mathbf{\Omega}(t)$ is a matrix that describes the
rotation around the axis.

Figure 6.5: Comparison of fixed and flexible axis rotation. A: Rotating the sketched shape inside the white tubular cavity can create artifacts when a fixed rotation axis (dashed) is used. More realistically, the shape would revolve like a flexible pipe-cleaner (dotted) inside the bearing (gray). B: Fixed rotation around an axis $\mathbf{v}$ with a pivot point specified by the vector $\mathbf{u}$. C: Subdividing the rotating fragment into slabs with separate rotation axes (↑) and pivot points (•) for each slab allows for flexibility. The distance between two slabs with indices $n$ and $n+1$ is $\Delta x$.

In the simplest case, the “springs” are described by a harmonic potential,

$$V^{\mathrm{iso}} = \frac{k}{2} \sum_{i=1}^{N} w_i \left[ \mathbf{\Omega}(t)(\mathbf{y}_i^0 - \mathbf{u}) - (\mathbf{x}_i - \mathbf{u}) \right]^2, \tag{6.33}$$

with optional mass-weighted prefactors $w_i = N m_i / M$ with total mass $M = \sum_{i=1}^{N} m_i$. The rotation matrix $\mathbf{\Omega}(t)$ is

$$\mathbf{\Omega}(t) = \begin{pmatrix} \cos\omega t + v_x^2 \xi & v_x v_y \xi - v_z \sin\omega t & v_x v_z \xi + v_y \sin\omega t \\ v_x v_y \xi + v_z \sin\omega t & \cos\omega t + v_y^2 \xi & v_y v_z \xi - v_x \sin\omega t \\ v_x v_z \xi - v_y \sin\omega t & v_y v_z \xi + v_x \sin\omega t & \cos\omega t + v_z^2 \xi \end{pmatrix},$$

where $v_x$, $v_y$, and $v_z$ are the components of the normalized rotation vector $\hat{\mathbf{v}}$, and $\xi := 1 - \cos(\omega t)$. As illustrated in Fig. 6.6A for a single atom $j$, the rotation matrix $\mathbf{\Omega}(t)$ operates on the initial reference positions $\mathbf{y}_j^0 = \mathbf{x}_j(t_0)$ of atom $j$ at $t = t_0$. At a later time $t$, the reference position has rotated away from its initial place (along the blue dashed line), resulting in the force

$$\mathbf{F}_j^{\mathrm{iso}} = -\nabla_j V^{\mathrm{iso}} = k\, w_j \left[ \mathbf{\Omega}(t)(\mathbf{y}_j^0 - \mathbf{u}) - (\mathbf{x}_j - \mathbf{u}) \right], \tag{6.34}$$

which is directed towards the reference position.

Figure 6.6: Selection of different rotation potentials and definition of notation. All four potentials $V$ (color coded) are shown for a single atom at position $\mathbf{x}_j(t)$. A: Isotropic potential $V^{\mathrm{iso}}$, B: radial motion potential $V^{\mathrm{rm}}$ and flexible potential $V^{\mathrm{flex}}$, C–D: radial motion 2 potential $V^{\mathrm{rm2}}$ and flexible 2 potential $V^{\mathrm{flex2}}$ for $\epsilon' = 0$ nm² (C) and $\epsilon' = 0.01$ nm² (D). The rotation axis is perpendicular to the plane and marked by ⊗. The light gray contours indicate Boltzmann factors $e^{-V/(k_B T)}$ in the $\mathbf{x}_j$-plane for $T = 300$ K and $k = 200$ kJ/(mol·nm²). The green arrow shows the direction of the force $\mathbf{F}_j$ acting on atom $j$; the blue dashed line indicates the motion of the reference position.

Pivot-Free Isotropic Potential

Instead of a fixed pivot vector $\mathbf{u}$ this potential uses the center of mass $\mathbf{x}_c$ of the rotation group as pivot for the rotation axis,

$$\mathbf{x}_c = \frac{1}{M} \sum_{i=1}^{N} m_i \mathbf{x}_i \quad \text{and} \quad \mathbf{y}_c^0 = \frac{1}{M} \sum_{i=1}^{N} m_i \mathbf{y}_i^0, \tag{6.35}$$

which yields the “pivot-free” isotropic potential

$$V^{\mathrm{iso\text{-}pf}} = \frac{k}{2} \sum_{i=1}^{N} w_i \left[ \mathbf{\Omega}(t)(\mathbf{y}_i^0 - \mathbf{y}_c^0) - (\mathbf{x}_i - \mathbf{x}_c) \right]^2, \tag{6.36}$$

with forces

$$\mathbf{F}_j^{\mathrm{iso\text{-}pf}} = k\, w_j \left[ \mathbf{\Omega}(t)(\mathbf{y}_j^0 - \mathbf{y}_c^0) - (\mathbf{x}_j - \mathbf{x}_c) \right]. \tag{6.37}$$

Without mass-weighting, the pivot $\mathbf{x}_c$ is the geometrical center of the group.

Parallel Motion Potential Variant

The forces generated by the isotropic potentials (eqns. 6.33 and 6.36) also contain components parallel to the rotation axis and thereby restrain motions along the axis of either the whole rotation group (in case of $V^{\mathrm{iso}}$) or within the rotation group (in case of $V^{\mathrm{iso\text{-}pf}}$). For cases where unrestrained motion along the axis is preferred, we have implemented a “parallel motion” variant by eliminating all components parallel to the rotation axis for the potential. This is achieved by projecting the distance vectors between reference and actual positions

$$\mathbf{r}_i = \mathbf{\Omega}(t)(\mathbf{y}_i^0 - \mathbf{u}) - (\mathbf{x}_i - \mathbf{u}) \tag{6.38}$$

onto the plane perpendicular to the rotation vector,

$$\mathbf{r}_i^\perp := \mathbf{r}_i - (\mathbf{r}_i \cdot \hat{\mathbf{v}})\hat{\mathbf{v}}, \tag{6.39}$$

yielding

$$V^{\mathrm{pm}} = \frac{k}{2} \sum_{i=1}^{N} w_i (\mathbf{r}_i^\perp)^2 = \frac{k}{2} \sum_{i=1}^{N} w_i \left\{ \mathbf{\Omega}(t)(\mathbf{y}_i^0 - \mathbf{u}) - (\mathbf{x}_i - \mathbf{u}) - \left\{ \left[ \mathbf{\Omega}(t)(\mathbf{y}_i^0 - \mathbf{u}) - (\mathbf{x}_i - \mathbf{u}) \right] \cdot \hat{\mathbf{v}} \right\} \hat{\mathbf{v}} \right\}^2, \tag{6.40}$$

and similarly

$$\mathbf{F}_j^{\mathrm{pm}} = k\, w_j\, \mathbf{r}_j^\perp. \tag{6.41}$$

Pivot-Free Parallel Motion Potential

Replacing in eqn. 6.40 the fixed pivot $\mathbf{u}$ by the center of mass $\mathbf{x}_c$ yields the pivot-free variant of the parallel motion potential.
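With

$$\mathbf{s}_i = \mathbf{\Omega}(t)(\mathbf{y}_i^0 - \mathbf{y}_c^0) - (\mathbf{x}_i - \mathbf{x}_c) \tag{6.42}$$

the respective potential and forces are

$$V^{\mathrm{pm\text{-}pf}} = \frac{k}{2} \sum_{i=1}^{N} w_i (\mathbf{s}_i^\perp)^2, \tag{6.43}$$

$$\mathbf{F}_j^{\mathrm{pm\text{-}pf}} = k\, w_j\, \mathbf{s}_j^\perp. \tag{6.44}$$

Radial Motion Potential

In the above variants, the minimum of the rotation potential is either a single point at the reference position $\mathbf{y}_i$ (for the isotropic potentials) or a single line through $\mathbf{y}_i$ parallel to the rotation axis (for the parallel motion potentials). As a result, radial forces restrict radial motions of the atoms. The two subsequent types of rotation potentials, $V^{\mathrm{rm}}$ and $V^{\mathrm{rm2}}$, drastically reduce or even eliminate this effect. The first variant, $V^{\mathrm{rm}}$ (Fig. 6.6B), eliminates all force components parallel to the vector connecting the reference atom and the rotation axis,

$$V^{\mathrm{rm}} = \frac{k}{2} \sum_{i=1}^{N} w_i \left[ \mathbf{p}_i \cdot (\mathbf{x}_i - \mathbf{u}) \right]^2, \tag{6.45}$$

with

$$\mathbf{p}_i := \frac{\hat{\mathbf{v}} \times \mathbf{\Omega}(t)(\mathbf{y}_i^0 - \mathbf{u})}{\| \hat{\mathbf{v}} \times \mathbf{\Omega}(t)(\mathbf{y}_i^0 - \mathbf{u}) \|}. \tag{6.46}$$

This variant depends only on the distance $\mathbf{p}_i \cdot (\mathbf{x}_i - \mathbf{u})$ of atom $i$ from the plane spanned by $\hat{\mathbf{v}}$ and $\mathbf{\Omega}(t)(\mathbf{y}_i^0 - \mathbf{u})$. The resulting force is

$$\mathbf{F}_j^{\mathrm{rm}} = -k\, w_j \left[ \mathbf{p}_j \cdot (\mathbf{x}_j - \mathbf{u}) \right] \mathbf{p}_j. \tag{6.47}$$

Pivot-Free Radial Motion Potential

Proceeding similarly to the pivot-free isotropic potential yields a pivot-free version of the above potential. With

$$\mathbf{q}_i := \frac{\hat{\mathbf{v}} \times \mathbf{\Omega}(t)(\mathbf{y}_i^0 - \mathbf{y}_c^0)}{\| \hat{\mathbf{v}} \times \mathbf{\Omega}(t)(\mathbf{y}_i^0 - \mathbf{y}_c^0) \|}, \tag{6.48}$$

the potential and force for the pivot-free variant of the radial motion potential read

$$V^{\mathrm{rm\text{-}pf}} = \frac{k}{2} \sum_{i=1}^{N} w_i \left[ \mathbf{q}_i \cdot (\mathbf{x}_i - \mathbf{x}_c) \right]^2, \tag{6.49}$$

$$\mathbf{F}_j^{\mathrm{rm\text{-}pf}} = -k\, w_j \left[ \mathbf{q}_j \cdot (\mathbf{x}_j - \mathbf{x}_c) \right] \mathbf{q}_j + k\, \frac{m_j}{M} \sum_{i=1}^{N} w_i \left[ \mathbf{q}_i \cdot (\mathbf{x}_i - \mathbf{x}_c) \right] \mathbf{q}_i. \tag{6.50}$$

Radial Motion 2 Alternative Potential

As seen in Fig. 6.6B, the force resulting from $V^{\mathrm{rm}}$ still contains a small, second-order radial component. In most cases, this perturbation is tolerable; if not, the following alternative, $V^{\mathrm{rm2}}$, fully eliminates the radial contribution to the force, as depicted in Fig. 6.6C,

$$V^{\mathrm{rm2}} = \frac{k}{2} \sum_{i=1}^{N} w_i\, \frac{\left[ (\hat{\mathbf{v}} \times (\mathbf{x}_i - \mathbf{u})) \cdot \mathbf{\Omega}(t)(\mathbf{y}_i^0 - \mathbf{u}) \right]^2}{\| \hat{\mathbf{v}} \times (\mathbf{x}_i - \mathbf{u}) \|^2 + \epsilon'}, \tag{6.51}$$

where a small parameter $\epsilon'$ has been introduced to avoid singularities.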
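For $\epsilon' = 0$ nm², the equipotential planes are spanned by $\mathbf{x}_i - \mathbf{u}$ and $\hat{\mathbf{v}}$, yielding a force perpendicular to $\mathbf{x}_i - \mathbf{u}$, thus not contracting or expanding structural parts that moved away from or toward the rotation axis.

Choosing a small positive $\epsilon'$ (e.g., $\epsilon' = 0.01$ nm², Fig. 6.6D) in the denominator of eqn. 6.51 yields a well-defined potential and continuous forces also close to the rotation axis, which is not the case for $\epsilon' = 0$ nm² (Fig. 6.6C). With

$$\mathbf{r}_i := \mathbf{\Omega}(t)(\mathbf{y}_i^0 - \mathbf{u}) \tag{6.52}$$

$$\mathbf{s}_i := \frac{\hat{\mathbf{v}} \times (\mathbf{x}_i - \mathbf{u})}{\| \hat{\mathbf{v}} \times (\mathbf{x}_i - \mathbf{u}) \|} \equiv \Psi_i\, \hat{\mathbf{v}} \times (\mathbf{x}_i - \mathbf{u}) \tag{6.53}$$

$$\Psi_i^* := \frac{1}{\| \hat{\mathbf{v}} \times (\mathbf{x}_i - \mathbf{u}) \|^2 + \epsilon'} \tag{6.54}$$

the force on atom $j$ reads

$$\mathbf{F}_j^{\mathrm{rm2}} = -k \left\{ w_j\, (\mathbf{s}_j \cdot \mathbf{r}_j) \left[ \frac{\Psi_j^*}{\Psi_j} \mathbf{r}_j - \frac{\Psi_j^{*2}}{\Psi_j^3} (\mathbf{s}_j \cdot \mathbf{r}_j)\, \mathbf{s}_j \right] \right\} \times \hat{\mathbf{v}}. \tag{6.55}$$

Pivot-Free Radial Motion 2 Potential

The pivot-free variant of the above potential is

$$V^{\mathrm{rm2\text{-}pf}} = \frac{k}{2} \sum_{i=1}^{N} w_i\, \frac{\left[ (\hat{\mathbf{v}} \times (\mathbf{x}_i - \mathbf{x}_c)) \cdot \mathbf{\Omega}(t)(\mathbf{y}_i^0 - \mathbf{y}_c) \right]^2}{\| \hat{\mathbf{v}} \times (\mathbf{x}_i - \mathbf{x}_c) \|^2 + \epsilon'}. \tag{6.56}$$

With

$$\mathbf{r}_i := \mathbf{\Omega}(t)(\mathbf{y}_i^0 - \mathbf{y}_c) \tag{6.57}$$

$$\mathbf{s}_i := \frac{\hat{\mathbf{v}} \times (\mathbf{x}_i - \mathbf{x}_c)}{\| \hat{\mathbf{v}} \times (\mathbf{x}_i - \mathbf{x}_c) \|} \equiv \Psi_i\, \hat{\mathbf{v}} \times (\mathbf{x}_i - \mathbf{x}_c) \tag{6.58}$$

$$\Psi_i^* := \frac{1}{\| \hat{\mathbf{v}} \times (\mathbf{x}_i - \mathbf{x}_c) \|^2 + \epsilon'} \tag{6.59}$$

the force on atom $j$ reads

$$\begin{aligned} \mathbf{F}_j^{\mathrm{rm2\text{-}pf}} = {} & -k \left\{ w_j\, (\mathbf{s}_j \cdot \mathbf{r}_j) \left[ \frac{\Psi_j^*}{\Psi_j} \mathbf{r}_j - \frac{\Psi_j^{*2}}{\Psi_j^3} (\mathbf{s}_j \cdot \mathbf{r}_j)\, \mathbf{s}_j \right] \right\} \times \hat{\mathbf{v}} \\ & + k\, \frac{m_j}{M} \left\{ \sum_{i=1}^{N} w_i\, (\mathbf{s}_i \cdot \mathbf{r}_i) \left[ \frac{\Psi_i^*}{\Psi_i} \mathbf{r}_i - \frac{\Psi_i^{*2}}{\Psi_i^3} (\mathbf{s}_i \cdot \mathbf{r}_i)\, \mathbf{s}_i \right] \right\} \times \hat{\mathbf{v}}. \end{aligned} \tag{6.60}$$

6.6.2 Flexible Axis Rotation

As sketched in Fig. 6.5A–B, the rigid body behavior of the fixed axis rotation scheme is a drawback for many applications. In particular, deformations of the rotation group are suppressed when the equilibrium atom positions directly depend on the reference positions. To avoid this limitation, eqns. 6.50 and 6.56 will now be generalized towards a “flexible axis” as sketched in Fig. 6.5C. This will be achieved by subdividing the rotation group into a set of equidistant slabs perpendicular to the rotation vector, and by applying a separate rotation potential to each of these slabs. Fig. 6.5C shows the midplanes of the slabs as dotted straight lines and the centers as thick black dots.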
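Figure 6.7: Gaussian functions $g_n$ centered at $n\,\Delta x$ for a slab distance $\Delta x = 1.5$ nm and $n \geq -2$. Gaussian function $g_0$ is highlighted in bold; the dashed line depicts the sum of the shown Gaussian functions.

To avoid discontinuities in the potential and in the forces, we define “soft slabs” by weighting the contributions of each slab $n$ to the total potential function $V^{\mathrm{flex}}$ by a Gaussian function

$$g_n(\mathbf{x}_i) = \Gamma \exp\left( -\frac{\beta_n^2(\mathbf{x}_i)}{2\sigma^2} \right), \tag{6.61}$$

centered at the midplane of the $n$th slab. Here $\sigma$ is the width of the Gaussian function, $\Delta x$ the distance between adjacent slabs, and

$$\beta_n(\mathbf{x}_i) := \mathbf{x}_i \cdot \hat{\mathbf{v}} - n\,\Delta x. \tag{6.62}$$

A most convenient choice is $\sigma = 0.7\,\Delta x$ and

$$1/\Gamma = \sum_{n \in \mathbb{Z}} \exp\left( -\frac{(n - \frac{1}{4})^2}{2 \cdot 0.7^2} \right) \approx 1.75464,$$

which yields a nearly constant sum, essentially independent of $\mathbf{x}_i$ (dashed line in Fig. 6.7), i.e.,

$$\sum_{n \in \mathbb{Z}} g_n(\mathbf{x}_i) = 1 + \epsilon(\mathbf{x}_i), \tag{6.63}$$

with $|\epsilon(\mathbf{x}_i)| < 1.3 \cdot 10^{-4}$. This choice also implies that the individual contributions to the force from the slabs add up to unity such that no further normalization is required.

To each slab center $\mathbf{x}_c^n$, all atoms contribute by their Gaussian-weighted (optionally also mass-weighted) position vectors $g_n(\mathbf{x}_i)\,\mathbf{x}_i$. The instantaneous slab centers $\mathbf{x}_c^n$ are calculated from the current positions $\mathbf{x}_i$,

$$\mathbf{x}_c^n = \frac{\sum_{i=1}^{N} g_n(\mathbf{x}_i)\, m_i\, \mathbf{x}_i}{\sum_{i=1}^{N} g_n(\mathbf{x}_i)\, m_i}, \tag{6.64}$$

while the reference centers $\mathbf{y}_c^n$ are calculated from the reference positions $\mathbf{y}_i^0$,

$$\mathbf{y}_c^n = \frac{\sum_{i=1}^{N} g_n(\mathbf{y}_i^0)\, m_i\, \mathbf{y}_i^0}{\sum_{i=1}^{N} g_n(\mathbf{y}_i^0)\, m_i}. \tag{6.65}$$

Due to the rapid decay of $g_n$, each slab will essentially involve contributions from atoms located within $\approx 3\Delta x$ from the slab center only.

Flexible Axis Potential

We consider two flexible axis variants. For the first variant, the slab segmentation procedure with Gaussian weighting is applied to the radial motion potential (eqn. 6.50 / Fig. 6.6B), yielding as the contribution of slab $n$

$$V^n = \frac{k}{2} \sum_{i=1}^{N} w_i\, g_n(\mathbf{x}_i) \left[ \mathbf{q}_i^n \cdot (\mathbf{x}_i - \mathbf{x}_c^n) \right]^2,$$

and a total potential function

$$V^{\mathrm{flex}} = \sum_n V^n. \tag{6.66}$$

Note that the global center of mass $\mathbf{x}_c$ used in eqn. 6.50 is now replaced by $\mathbf{x}_c^n$, the center of mass of the slab. With

$$\mathbf{q}_i^n := \frac{\hat{\mathbf{v}} \times \mathbf{\Omega}(t)(\mathbf{y}_i^0 - \mathbf{y}_c^n)}{\| \hat{\mathbf{v}} \times \mathbf{\Omega}(t)(\mathbf{y}_i^0 - \mathbf{y}_c^n) \|} \tag{6.67}$$

$$b_i^n := \mathbf{q}_i^n \cdot (\mathbf{x}_i - \mathbf{x}_c^n), \tag{6.68}$$

the resulting force on atom $j$ reads

$$\begin{aligned} \mathbf{F}_j^{\mathrm{flex}} = {} & -k\, w_j \sum_n g_n(\mathbf{x}_j)\, b_j^n \left\{ \mathbf{q}_j^n - b_j^n \frac{\beta_n(\mathbf{x}_j)}{2\sigma^2} \hat{\mathbf{v}} \right\} \\ & + k\, m_j \sum_n \frac{g_n(\mathbf{x}_j)}{\sum_h g_n(\mathbf{x}_h)} \sum_{i=1}^{N} w_i\, g_n(\mathbf{x}_i)\, b_i^n \left\{ \mathbf{q}_i^n - \frac{\beta_n(\mathbf{x}_j)}{\sigma^2} \left[ \mathbf{q}_i^n \cdot (\mathbf{x}_j - \mathbf{x}_c^n) \right] \hat{\mathbf{v}} \right\}. \end{aligned} \tag{6.69}$$

Note that for $V^{\mathrm{flex}}$, as defined, the slabs are fixed in space and so are the reference centers $\mathbf{y}_c^n$. If during the simulation the rotation group moves too far in the $\hat{\mathbf{v}}$ direction, it may enter a region where, due to the lack of nearby reference positions, no reference slab centers are defined, rendering the potential evaluation impossible. We therefore have included a slightly modified version of this potential that avoids this problem by attaching the midplane of slab $n = 0$ to the center of mass of the rotation group, yielding slabs that move with the rotation group. This is achieved by subtracting the center of mass $\mathbf{x}_c$ of the group from the positions,

$$\tilde{\mathbf{x}}_i = \mathbf{x}_i - \mathbf{x}_c, \quad \text{and} \quad \tilde{\mathbf{y}}_i^0 = \mathbf{y}_i^0 - \mathbf{y}_c^0, \tag{6.70}$$

such that

$$V^{\mathrm{flex\text{-}t}} = \frac{k}{2} \sum_n \sum_{i=1}^{N} w_i\, g_n(\tilde{\mathbf{x}}_i) \left[ \frac{\hat{\mathbf{v}} \times \mathbf{\Omega}(t)(\tilde{\mathbf{y}}_i^0 - \tilde{\mathbf{y}}_c^n)}{\| \hat{\mathbf{v}} \times \mathbf{\Omega}(t)(\tilde{\mathbf{y}}_i^0 - \tilde{\mathbf{y}}_c^n) \|} \cdot (\tilde{\mathbf{x}}_i - \tilde{\mathbf{x}}_c^n) \right]^2. \tag{6.71}$$

To simplify the force derivation, and for efficiency reasons, we here assume $\mathbf{x}_c$ to be constant, and thus $\partial \mathbf{x}_c / \partial x = \partial \mathbf{x}_c / \partial y = \partial \mathbf{x}_c / \partial z = 0$. The resulting force error is small (of order $O(1/N)$ or $O(m_j/M)$ if mass-weighting is applied) and can therefore be tolerated. With this assumption, the forces $\mathbf{F}^{\mathrm{flex\text{-}t}}$ have the same form as eqn. 6.69.

Flexible Axis 2 Alternative Potential

In this second variant, slab segmentation is applied to $V^{\mathrm{rm2}}$ (eqn. 6.56), resulting in a flexible axis potential without radial force contributions (Fig. 6.6C),

$$V^{\mathrm{flex2}} = \frac{k}{2} \sum_{i=1}^{N} \sum_n w_i\, g_n(\mathbf{x}_i)\, \frac{\left[ (\hat{\mathbf{v}} \times (\mathbf{x}_i - \mathbf{x}_c^n)) \cdot \mathbf{\Omega}(t)(\mathbf{y}_i^0 - \mathbf{y}_c^n) \right]^2}{\| \hat{\mathbf{v}} \times (\mathbf{x}_i - \mathbf{x}_c^n) \|^2 + \epsilon'}. \tag{6.72}$$

With

$$\mathbf{r}_i^n := \mathbf{\Omega}(t)(\mathbf{y}_i^0 - \mathbf{y}_c^n) \tag{6.73}$$

$$\mathbf{s}_i^n := \frac{\hat{\mathbf{v}} \times (\mathbf{x}_i - \mathbf{x}_c^n)}{\| \hat{\mathbf{v}} \times (\mathbf{x}_i - \mathbf{x}_c^n) \|} \equiv \psi_i\, \hat{\mathbf{v}} \times (\mathbf{x}_i - \mathbf{x}_c^n) \tag{6.74}$$

$$\psi_i^* := \frac{1}{\| \hat{\mathbf{v}} \times (\mathbf{x}_i - \mathbf{x}_c^n) \|^2 + \epsilon'} \tag{6.75}$$

$$W_j^n := \frac{g_n(\mathbf{x}_j)\, m_j}{\sum_h g_n(\mathbf{x}_h)\, m_h} \tag{6.76}$$

$$\mathbf{S}^n := \sum_{i=1}^{N} w_i\, g_n(\mathbf{x}_i)\, (\mathbf{s}_i^n \cdot \mathbf{r}_i^n) \left[ \frac{\psi_i^*}{\psi_i} \mathbf{r}_i^n - \frac{\psi_i^{*2}}{\psi_i^3} (\mathbf{s}_i^n \cdot \mathbf{r}_i^n)\, \mathbf{s}_i^n \right] \tag{6.77}$$

the force on atom $j$ reads

$$\begin{aligned} \mathbf{F}_j^{\mathrm{flex2}} = {} & -k \left\{ \sum_n w_j\, g_n(\mathbf{x}_j)\, (\mathbf{s}_j^n \cdot \mathbf{r}_j^n) \left[ \frac{\psi_j^*}{\psi_j} \mathbf{r}_j^n - \frac{\psi_j^{*2}}{\psi_j^3} (\mathbf{s}_j^n \cdot \mathbf{r}_j^n)\, \mathbf{s}_j^n \right] \right\} \times \hat{\mathbf{v}} \\ & + k \left\{ \sum_n W_j^n\, \mathbf{S}^n \right\} \times \hat{\mathbf{v}} - k \left\{ \sum_n W_j^n\, \frac{\beta_n(\mathbf{x}_j)}{\sigma^2} \frac{1}{\psi_j}\, \mathbf{s}_j^n \cdot \mathbf{S}^n \right\} \hat{\mathbf{v}} \\ & + \frac{k}{2} \left\{ \sum_n w_j\, g_n(\mathbf{x}_j)\, \frac{\beta_n(\mathbf{x}_j)}{\sigma^2} \frac{\psi_j^*}{\psi_j^2} (\mathbf{s}_j^n \cdot \mathbf{r}_j^n)^2 \right\} \hat{\mathbf{v}}. \end{aligned} \tag{6.78}$$

Applying transformation (6.70) yields a “translation-tolerant” version of the flexible 2 potential, $V^{\mathrm{flex2\text{-}t}}$. Again, assuming that $\partial \mathbf{x}_c / \partial x$, $\partial \mathbf{x}_c / \partial y$, $\partial \mathbf{x}_c / \partial z$ are small, the resulting equations for $V^{\mathrm{flex2\text{-}t}}$ and $\mathbf{F}^{\mathrm{flex2\text{-}t}}$ are similar to those of $V^{\mathrm{flex2}}$ and $\mathbf{F}^{\mathrm{flex2}}$.

6.6.3 Usage

To apply enforced rotation, the particles $i$ that are to be subjected to one of the rotation potentials are defined via index groups rot-group0, rot-group1, etc., in the .mdp input file. The reference positions $\mathbf{y}_i^0$ are read from a special .trr file provided to grompp. If no such file is found, $\mathbf{x}_i(t = 0)$ are used as reference positions and written to .trr such that they can be used for subsequent setups. All parameters of the potentials such as $k$, $\epsilon'$, etc. (Table 6.1) are provided as .mdp parameters; rot-type selects the type of the potential. The option rot-massw selects whether or not to use mass-weighted averaging. For the flexible potentials, a cutoff value $g_n^{\min}$ (typically $g_n^{\min} = 0.001$) ensures that only significant contributions to $V$ and $\mathbf{F}$ are evaluated, i.e. terms with $g_n(\mathbf{x}) < g_n^{\min}$ are omitted. Table 6.2 summarizes observables that are written to additional output files and which are described below.

Table 6.1: Parameters used by the various rotation potentials. x's indicate which parameter is actually used for a given potential.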
    parameter                               k             v̂    u      ω     ε'    Δx         g_n^min
    .mdp input variable name                k             vec   pivot  rate  eps   slab-dist  min-gauss
    unit                                    kJ/(mol·nm²)  -     nm     °/ps  nm²   nm         -

    potential                     eqn.
    fixed axis potentials:
    isotropic       V^iso         (6.33)    x             x     x      x     -     -          -
      pivot-free    V^iso-pf      (6.36)    x             x     -      x     -     -          -
    parallel motion V^pm          (6.40)    x             x     x      x     -     -          -
      pivot-free    V^pm-pf       (6.44)    x             x     -      x     -     -          -
    radial motion   V^rm          (6.45)    x             x     x      x     -     -          -
      pivot-free    V^rm-pf       (6.50)    x             x     -      x     -     -          -
    radial motion 2 V^rm2         (6.51)    x             x     x      x     x     -          -
      pivot-free    V^rm2-pf      (6.56)    x             x     -      x     x     -          -
    flexible axis potentials:
    flexible        V^flex        (6.66)    x             x     -      x     -     x          x
      transl. tol.  V^flex-t      (6.71)    x             x     -      x     -     x          x
    flexible 2      V^flex2       (6.72)    x             x     -      x     x     x          x
      transl. tol.  V^flex2-t     -         x             x     -      x     x     x          x

Table 6.2: Quantities recorded in output files during enforced rotation simulations. All slab-wise data is written every nstsout steps, other rotation data every nstrout steps.

    quantity                 unit     equation        output file  fixed  flexible
    V(t)                     kJ/mol   see 6.1         rotation     x      x
    θ_ref(t)                 degrees  θ_ref(t) = ωt   rotation     x      x
    θ_av(t)                  degrees  (6.79)          rotation     x      -
    θ_fit(t), θ_fit(t, n)    degrees  (6.81)          rotangles    -      x
    y_0(n), x_0(t, n)        nm       (6.64, 6.65)    rotslabs     -      x
    τ(t)                     kJ/mol   (6.82)          rotation     x      -
    τ(t, n)                  kJ/mol   (6.82)          rottorque    -      x

Angle of Rotation Groups: Fixed Axis

For fixed axis rotation, the average angle $\theta_{\mathrm{av}}(t)$ of the group relative to the reference group is determined via the distance-weighted angular deviation of all rotation group atoms from their reference positions,

$$\theta_{\mathrm{av}} = \sum_{i=1}^{N} r_i\, \theta_i \Bigg/ \sum_{i=1}^{N} r_i. \tag{6.79}$$

Here, $r_i$ is the distance of the reference position to the rotation axis, and the difference angles $\theta_i$ are determined from the atomic positions, projected onto a plane perpendicular to the rotation axis through pivot point $\mathbf{u}$ (see eqn. 6.39 for the definition of $\perp$),

$$\cos\theta_i = \frac{(\mathbf{y}_i - \mathbf{u})^\perp \cdot (\mathbf{x}_i - \mathbf{u})^\perp}{\| (\mathbf{y}_i - \mathbf{u})^\perp \cdot (\mathbf{x}_i - \mathbf{u})^\perp \|}. \tag{6.80}$$

The sign of $\theta_{\mathrm{av}}$ is chosen such that $\theta_{\mathrm{av}} > 0$ if the actual structure rotates ahead of the reference.
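The fixed axis quantities above can be illustrated with a small sketch that builds the rotation matrix $\mathbf{\Omega}$ from section 6.6.1 and evaluates the distance-weighted average angle of eqn. 6.79. It is not the GROMACS implementation: all function names are hypothetical, the difference angle is obtained as a signed angle via atan2 with the projections normalized by their lengths, and the sign convention (positive when the actual positions are ahead for a positive rotation rate) is an assumption of this example.

```python
import math


def rotation_matrix(axis, angle):
    """Omega for a rotation by `angle` (radians) around the unit vector `axis`."""
    vx, vy, vz = axis
    c, s = math.cos(angle), math.sin(angle)
    xi = 1.0 - c
    return [
        [c + vx * vx * xi, vx * vy * xi - vz * s, vx * vz * xi + vy * s],
        [vx * vy * xi + vz * s, c + vy * vy * xi, vy * vz * xi - vx * s],
        [vx * vz * xi - vy * s, vy * vz * xi + vx * s, c + vz * vz * xi],
    ]


def matvec(m, v):
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def perpendicular(r, axis):
    """Component of r perpendicular to the unit vector axis (cf. eqn. 6.39)."""
    parallel = dot(r, axis)
    return [r[i] - parallel * axis[i] for i in range(3)]


def average_angle(positions, references, pivot, axis):
    """Distance-weighted average angle in degrees (cf. eqn. 6.79)."""
    numerator = 0.0
    denominator = 0.0
    for x, y in zip(positions, references):
        y_perp = perpendicular(sub(y, pivot), axis)
        x_perp = perpendicular(sub(x, pivot), axis)
        r = math.sqrt(dot(y_perp, y_perp))  # reference distance to the axis
        theta = math.atan2(dot(cross(y_perp, x_perp), axis), dot(y_perp, x_perp))
        numerator += r * theta
        denominator += r
    return math.degrees(numerator / denominator)


# Hypothetical two-atom group rotating around z with the pivot at the origin.
AXIS = [0.0, 0.0, 1.0]
PIVOT = [0.0, 0.0, 0.0]
Y0 = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.5]]
reference = [matvec(rotation_matrix(AXIS, math.radians(30.0)), y) for y in Y0]
actual = [matvec(rotation_matrix(AXIS, math.radians(40.0)), y) for y in Y0]
theta_av = average_angle(actual, reference, PIVOT, AXIS)
```

In this example the actual positions are rotated 10 degrees further than the reference, so the averaged angle comes out as 10 degrees.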
Angle of Rotation Groups: Flexible Axis

For flexible axis rotation, two outputs are provided, the angle of the entire rotation group, and separate angles for the segments in the slabs. The angle of the entire rotation group is determined by an RMSD fit of $\mathbf{x}_i$ to the reference positions $\mathbf{y}_i^0$ at $t = 0$, yielding $\theta_{\mathrm{fit}}$ as the angle by which the reference has to be rotated around $\hat{\mathbf{v}}$ for the optimal fit,

$$\mathrm{RMSD}\left( \mathbf{x}_i,\, \mathbf{\Omega}(\theta_{\mathrm{fit}})\, \mathbf{y}_i^0 \right) \overset{!}{=} \min. \tag{6.81}$$

To determine the local angle for each slab $n$, both reference and actual positions are weighted with the Gaussian function of slab $n$, and $\theta_{\mathrm{fit}}(t, n)$ is calculated as in eqn. 6.81 from the Gaussian-weighted positions.

For all angles, the .mdp input option rot-fit-method controls whether a normal RMSD fit is performed or whether for the fit each position $\mathbf{x}_i$ is put at the same distance to the rotation axis as its reference counterpart $\mathbf{y}_i^0$. In the latter case, the RMSD measures only angular differences, not radial ones.

Angle Determination by Searching the Energy Minimum

Alternatively, for rot-fit-method = potential, the angle of the rotation group is determined as the angle for which the rotation potential energy is minimal. To this end, the rotation potential in use is additionally evaluated for a set of angles around the current reference angle. In this case, the rotangles.log output file contains the values of the rotation potential at the chosen set of angles, while rotation.xvg lists the angle with minimal potential energy.

Torque

The torque $\boldsymbol{\tau}(t)$ exerted by the rotation potential is calculated for fixed axis rotation via

$$\boldsymbol{\tau}(t) = \sum_{i=1}^{N} \mathbf{r}_i(t) \times \mathbf{f}_i^\perp(t), \tag{6.82}$$

where $\mathbf{r}_i(t)$ is the distance vector from the rotation axis to $\mathbf{x}_i(t)$ and $\mathbf{f}_i^\perp(t)$ is the force component perpendicular to $\mathbf{r}_i(t)$ and $\hat{\mathbf{v}}$. For flexible axis rotation, torques $\boldsymbol{\tau}_n$ are calculated for each slab using the local rotation axis of the slab and the Gaussian-weighted positions.
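The Gaussian slab weights that enter all slab-wise quantities of the flexible axis potentials (eqns. 6.61 to 6.63) can be checked numerically with a short sketch. The function names and the truncation of the infinite sums to a finite range of slab indices are assumptions made for this example; the position along the axis, $\mathbf{x}_i \cdot \hat{\mathbf{v}}$, is passed in directly as a scalar.

```python
import math

SIGMA_OVER_SLAB_DIST = 0.7  # sigma = 0.7 * slab distance, as chosen in the text


def gaussian_prefactor(n_max=20):
    """Gamma from the normalization sum, truncated to |n| <= n_max."""
    inverse = sum(
        math.exp(-((n - 0.25) ** 2) / (2.0 * SIGMA_OVER_SLAB_DIST ** 2))
        for n in range(-n_max, n_max + 1)
    )
    return 1.0 / inverse


def slab_weight(n, axial_position, slab_dist, gamma):
    """g_n for an atom whose projection onto the axis is axial_position (nm)."""
    beta = axial_position - n * slab_dist   # eqn. 6.62
    sigma = SIGMA_OVER_SLAB_DIST * slab_dist
    return gamma * math.exp(-(beta ** 2) / (2.0 * sigma ** 2))  # eqn. 6.61


GAMMA = gaussian_prefactor()
SLAB_DIST = 1.5  # nm, the value used in Fig. 6.7
weight_sums = [
    sum(slab_weight(n, x, SLAB_DIST, GAMMA) for n in range(-20, 21))
    for x in (0.0, 0.4, 0.75, 1.1)
]
```

Each entry of `weight_sums` stays within the quoted bound of $1.3 \cdot 10^{-4}$ around unity, which is why no further normalization of the slab contributions is needed.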
6.7 Electric fields

A pulsed and oscillating electric field can be applied according to:

$$E(t) = E_0 \exp\left[ -\frac{(t - t_0)^2}{2\sigma^2} \right] \cos\left[ \omega (t - t_0) \right] \tag{6.83}$$

where $E_0$ is the field strength, the angular frequency $\omega = 2\pi c/\lambda$, $t_0$ is the time of the peak in the field strength and $\sigma$ is the width of the pulse. Special cases occur when $\sigma = 0$ (non-pulsed field) and when $\omega = 0$ (static field).

This simulated laser pulse was applied to simulations of melting ice [151]. A pulsed electric field may look like Fig. 6.8. In the supporting information of that paper the impact of an applied electric field on a system under periodic boundary conditions is analyzed. It is described that the effective electric field under PBC is larger than the applied field, by a factor depending on the size of the box and the dielectric properties of molecules in the box. For a system with static dielectric properties this factor can be corrected for. But for a system where the dielectric varies over time, for example a membrane protein with a pore that opens and closes during the simulation, this way of applying an electric field is not useful. In such cases one can use the computational electrophysiology protocol described in the next section (sec. 6.8).

Figure 6.8: A simulated laser pulse in GROMACS. (Plot of the electric field in V/nm versus time in ps.)

Electric fields are applied when the following options are specified in the grompp.mdp file. You specify, in order, $E_0$, $\omega$, $t_0$ and $\sigma$:

    electric-field-x = 0.04 0 0 0

yields a static field with $E_0 = 0.04$ V/nm in the X-direction. In contrast,

    electric-field-x = 2.0 150 5 0

yields an oscillating electric field with $E_0 = 2$ V/nm, $\omega = 150$/ps and $t_0 = 5$ ps. Finally

    electric-field-x = 2.0 150 5 1

yields a pulsed-oscillating electric field with $E_0 = 2$ V/nm, $\omega = 150$/ps, $t_0 = 5$ ps and $\sigma = 1$ ps. Read more in ref. [151]. Note that the input file format is changed from the undocumented older version. A figure like Fig.
6.8 may be produced by passing the -field option to gmx mdrun.

6.8 Computational Electrophysiology

The Computational Electrophysiology (CompEL) protocol [152] allows the simulation of ion flux through membrane channels, driven by transmembrane potentials or ion concentration gradients. Just as in real cells, CompEL establishes transmembrane potentials by sustaining a small imbalance of charges $\Delta q$ across the membrane, which gives rise to a potential difference $\Delta U$ according to the membrane capacitance:

$$\Delta U = \Delta q / C_{\mathrm{membrane}} \tag{6.84}$$

The transmembrane electric field and concentration gradients are controlled by .mdp options, which allow the user to set reference counts for the ions on either side of the membrane. If a difference between the actual and the reference numbers persists over a certain time span, specified by the user, a number of ion/water pairs are exchanged between the compartments until the reference numbers are restored. Alongside the calculation of channel conductance and ion selectivity, CompEL simulations also enable determination of the channel reversal potential, an important characteristic obtained in electrophysiology experiments.

In a CompEL setup, the simulation system is divided into two compartments A and B with independent ion concentrations. This is best achieved by using double bilayer systems with a copy (or copies) of the channel/pore of interest in each bilayer (Fig. 6.9 A, B). If the channel axes point in the same direction, channel flux is observed simultaneously at positive and negative potentials in this way, which is for instance important for studying channel rectification.

The potential difference $\Delta U$ across the membrane is easily calculated with the gmx potential utility. By this, the potential drop along $z$ or the pore axis is exactly known in each time interval of the simulation (Fig. 6.9 C).
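As a rough plausibility check of eqn. 6.84, the potential difference expected from a given charge imbalance can be estimated before running a simulation. The sketch below is only an order-of-magnitude illustration: the function name, the membrane area and the specific capacitance of 1 µF/cm² are assumed example values, not GROMACS parameters, and the effective capacitance of a particular double-membrane system may differ. The value measured with gmx potential remains the reference.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def potential_difference(delta_q_e, capacitance_farad):
    """Delta U in volts for a charge imbalance given in elementary charges."""
    return delta_q_e * ELEMENTARY_CHARGE / capacitance_farad


# Assumed example values (hypothetical system):
SPECIFIC_CAPACITANCE = 1.0e-2  # F/m^2, i.e. 1 microfarad per cm^2
MEMBRANE_AREA = 100.0e-18      # m^2, i.e. a 10 nm x 10 nm membrane patch
capacitance = SPECIFIC_CAPACITANCE * MEMBRANE_AREA

delta_u = potential_difference(2.0, capacitance)  # imbalance of 2 e
```

For these assumed numbers an imbalance of 2 e corresponds to roughly 0.3 V, and the estimate scales linearly with the imbalance.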
Type and number of ions $n_i$ of charge $q_i$, traversing the channel in the simulation, are written to the swapions.xvg output file, from which the average channel conductance $G$ in each interval $\Delta t$ is determined by:

$$G = \frac{\sum_i n_i q_i}{\Delta t\, \Delta U}. \tag{6.85}$$

The ion selectivity is calculated as the number flux ratio of different species. Best results are obtained by averaging these values over several overlapping time intervals.

The calculation of reversal potentials is best achieved using a small set of simulations in which a given transmembrane concentration gradient is complemented with small ion imbalances of varying magnitude. For example, if one compartment contains 1 M salt and the other 0.1 M, and given charge neutrality otherwise, a set of simulations with $\Delta q = 0\,e$, $\Delta q = 2\,e$, $\Delta q = 4\,e$ could be used. Fitting a straight line through the current-voltage relationship of all obtained $I$-$U$ pairs near zero current will then yield $U_{\mathrm{rev}}$.

Figure 6.9: Typical double-membrane setup for CompEL simulations (A, B). Ion / water molecule exchanges will be performed as needed between the two light blue volumes around the dotted black lines (A). Plot (C) shows the potential difference $\Delta U$ resulting from the selected charge imbalance $\Delta q_{\mathrm{ref}}$ between the compartments.

6.8.1 Usage

The following .mdp options control the CompEL protocol:

    swapcoords     = Z    ; Swap positions: no, X, Y, Z
    swap-frequency = 100  ; Swap attempt frequency

Choose Z if your membrane is in the xy-plane (Fig. 6.9). Ions will be exchanged between compartments depending on their z-positions alone. swap-frequency determines how often a swap attempt will be made. This step requires that the positions of the split groups, the ions, and possibly the solvent molecules are communicated between the parallel processes, so if chosen too small it can decrease the simulation performance.
The Position swapping entry in the cycle and time accounting table at the end of the md.log file summarizes the amount of runtime spent in the swap module.

    split-group0 = channel0  ; Defines compartment boundary
    split-group1 = channel1  ; Defines other compartment boundary
    massw-split0 = no        ; use mass-weighted center?
    massw-split1 = no

split-group0 and split-group1 are two index groups that define the boundaries between the two compartments, which are usually the centers of the channels. If massw-split0 or massw-split1 is set to yes, the center of mass of the respective index group is used as boundary, here in z-direction. Otherwise, the geometrical centers will be used (× in Fig. 6.9 A). If, such as here, a membrane channel is selected as split group, the center of the channel will define the dividing plane between the compartments (dashed horizontal lines). All index groups must be defined in the index file.

If, to restore the requested ion counts, an ion from one compartment has to be exchanged with a water molecule from the other compartment, then those molecules are swapped which have the largest distance to the compartment-defining boundaries (dashed horizontal lines). Depending on the ion concentration, this effectively results in exchanges of molecules between the light blue volumes. If a channel is very asymmetric in z-direction and would extend into one of the swap volumes, one can offset the swap exchange plane with the bulk-offset parameter. An offset value b of 0.0 means no offset; values −1.0 < b < 0 move the swap exchange plane closer to the lower membrane, and values 0 < b < 1.0 move it closer to the upper membrane. Fig. 6.9 A (left) depicts that for the A compartment.
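The difference between the two choices for massw-split0 and massw-split1 can be sketched as follows. This is only an illustration of geometrical versus mass-weighted centers along z; the function name and the example coordinates are hypothetical, and periodic boundary conditions are ignored here.

```python
def split_center_z(z_coords, masses=None):
    """Return the z-position of a split group's center.

    masses=None  -> geometrical center (corresponds to massw-split = no)
    masses given -> center of mass     (corresponds to massw-split = yes)
    Periodic images are NOT handled in this sketch.
    """
    if masses is None:
        return sum(z_coords) / len(z_coords)
    total_mass = sum(masses)
    return sum(m * z for m, z in zip(masses, z_coords)) / total_mass


# Hypothetical three-atom group, z in nm and masses in u
z = [2.0, 3.0, 4.0]
m = [12.0, 12.0, 24.0]
print(split_center_z(z))      # geometrical center
print(split_center_z(z, m))   # mass-weighted center
```

For a channel whose mass is distributed unevenly along z, the two definitions place the compartment boundary at different heights, as the example shows.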
    solvent-group = SOL  ; Group containing the solvent molecules
    iontypes      = 3    ; Number of different ion types to control
    iontype0-name = NA   ; Group name of the ion type
    iontype0-in-A = 51   ; Reference count of ions of type 0 in A
    iontype0-in-B = 35   ; Reference count of ions of type 0 in B
    iontype1-name = K
    iontype1-in-A = 10
    iontype1-in-B = 38
    iontype2-name = CL
    iontype2-in-A = -1
    iontype2-in-B = -1

The group name of solvent molecules acting as exchange partners for the ions has to be set with solvent-group. The number of different ionic species under control of the CompEL protocol is given by the iontypes parameter, while iontype0-name gives the name of the index group containing the atoms of this ionic species. The reference number of ions of this type can be set with the iontype0-in-A and iontype0-in-B options for compartments A and B, respectively. Obviously, the sum of iontype0-in-A and iontype0-in-B needs to equal the number of ions in the group defined by iontype0-name. A reference number of -1 means: use the number of ions as found at the beginning of the simulation as the reference value.

    coupl-steps = 10  ; Average over these many swap steps
    threshold   = 1   ; Do not swap if < threshold

If coupl-steps is set to 1, then the momentary ion distribution determines whether ions are exchanged. coupl-steps > 1 will use the time-average of ion distributions over the selected number of attempt steps instead. This can be useful, for example, when ions diffuse near compartment boundaries, which would lead to numerous unproductive ion exchanges. A threshold of 1 means that a swap is performed if the average ion count in a compartment differs by at least 1 from the requested values. Higher thresholds will lead to toleration of larger differences. Ions are exchanged until the requested number ± the threshold is reached.
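The interplay of coupl-steps and threshold can be illustrated with a small decision sketch. This is one possible reading of the description above, not the GROMACS implementation: the class and method names are hypothetical, only a single ion type in compartment A is tracked, and the rule used for the number of exchanges (swap one ion at a time while the remaining difference is at least the threshold) is an assumption.

```python
from collections import deque


class SwapController:
    """Hypothetical swap decision for one ion type in compartment A."""

    def __init__(self, reference, coupl_steps=10, threshold=1.0):
        self.reference = reference                 # requested ion count in A
        self.threshold = threshold                 # do not swap below this difference
        self.history = deque(maxlen=coupl_steps)   # counts at the last swap attempts

    def attempt(self, count_in_a):
        """Record the current count; return the signed number of ions to move into A."""
        self.history.append(count_in_a)
        average = sum(self.history) / len(self.history)
        diff = self.reference - average
        n_swaps = 0
        # ASSUMPTION: exchange ions one by one while the remaining
        # difference is still at least as large as the threshold.
        while abs(diff) - n_swaps >= self.threshold:
            n_swaps += 1
        return n_swaps if diff > 0 else -n_swaps


ctrl = SwapController(reference=10, coupl_steps=2, threshold=1)
print(ctrl.attempt(10))  # count matches the reference
print(ctrl.attempt(9))   # averaged difference 0.5, below threshold
print(ctrl.attempt(9))   # averaged difference 1.0, reaches threshold
```

With coupl-steps = 2 the single-step fluctuation in the second attempt does not trigger an exchange; only a difference that persists in the time average does.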
    cyl0-r    = 5.0   ; Split cylinder 0 radius (nm)
    cyl0-up   = 0.75  ; Split cylinder 0 upper extension (nm)
    cyl0-down = 0.75  ; Split cylinder 0 lower extension (nm)
    cyl1-r    = 5.0   ; same for other channel
    cyl1-up   = 0.75
    cyl1-down = 0.75

The cylinder options are used to define virtual geometric cylinders around the channel's pore to track how many ions of which type have passed each channel. Ions will be counted as having traveled through a channel according to the definition of the channel's cylinder radius, upper and lower extension, relative to the location of the respective split group. This will not affect the actual flux or exchange, but will provide you with the ion permeation numbers across each of the channels. Note that an ion can only be counted as passing through a particular channel if it is detected within the defined split cylinder in a swap step. If swap-frequency is chosen too high, a particular ion may be detected in compartment A in one swap step, and in compartment B in the following swap step, so it will be unclear through which of the channels it has passed.

A double-layered system for CompEL simulations can be easily prepared by duplicating an existing membrane/channel MD system in the direction of the membrane normal (typically z) with gmx editconf -translate 0 0 l_z, where l_z is the box length in that direction. If you have already defined index groups for the channel for the single-layered system, gmx make_ndx -n index.ndx -twin will provide you with the groups for the double-layered system.

To suppress large fluctuations of the membranes along the swap direction, it may be useful to apply a harmonic potential (acting only in the swap dimension) between each of the two channel and/or bilayer centers using umbrella pulling (see section 6.4).

Multimeric channels

If a split group consists of more than one molecule, the correct PBC image of all molecules with respect to each other has to be chosen such that the channel center can be correctly determined.
GROMACS assumes that the starting structure in the .tpr file has the correct PBC representation. Set the following environment variable to check whether that is the case:

• GMX_COMPELDUMP: output the starting structure after it has been made whole to a .pdb file.

6.9 Calculating a PMF using the free-energy code

The free-energy coupling-parameter approach (see sec. 3.12) provides several ways to calculate potentials of mean force. A potential of mean force between two atoms can be calculated by connecting them with a harmonic potential or a constraint. For this purpose there are special potentials that avoid the generation of extra exclusions, see sec. 5.4.4. When the position of the minimum or the constraint length is 1 nm more in state B than in state A, the restraint or constraint force is given by ∂H/∂λ. The distance between the atoms can be changed as a function of λ and time by setting delta-lambda in the .mdp file. The results should be identical (although not numerically due to the different implementations) to the results of the pull code with umbrella sampling and constraint pulling. Unlike the pull code, the free energy code can also handle atoms that are connected by constraints.

Potentials of mean force can also be calculated using position restraints. With position restraints, atoms can be linked to a position in space with a harmonic potential (see 4.3.1). These positions can be made a function of the coupling parameter λ. The positions for the A and the B states are supplied to grompp with the -r and -rb options, respectively. One could use this approach to do targeted MD; note that we do not encourage the use of targeted MD for proteins. A protein can be forced from one conformation to another by using these conformations as position restraint coordinates for state A and B. One can then slowly change λ from 0 to 1.
The main drawback of this approach is that the conformational freedom of the protein is severely limited by the position restraints, independent of the change from state A to B. Also, the protein is forced from state A to B in an almost straight line, whereas the real pathway might be very different. An example of a more fruitful application is a solid system or a liquid confined between walls where one wants to measure the force required to change the separation between the boundaries or walls. Because the boundaries (or walls) already need to be fixed, the position restraints do not limit the system in its sampling.

6.10 Removing fastest degrees of freedom

The maximum time step in MD simulations is limited by the smallest oscillation period that can be found in the simulated system. Bond-stretching vibrations are in their quantum-mechanical ground state and are therefore better represented by a constraint instead of a harmonic potential.

For the remaining degrees of freedom, the shortest oscillation period (as measured from a simulation) is 13 fs for bond-angle vibrations involving hydrogen atoms. Taking as a guideline that with a Verlet (leap-frog) integration scheme a minimum of 5 numerical integration steps should be performed per period of a harmonic oscillation in order to integrate it with reasonable accuracy, the maximum time step will be about 3 fs. Disregarding these very fast oscillations of period 13 fs, the next shortest periods are around 20 fs, which will allow a maximum time step of about 4 fs.

Removing the bond-angle degrees of freedom from hydrogen atoms can best be done by defining them as virtual interaction sites instead of normal atoms. Whereas a normal atom is connected to the molecule with bonds, angles and dihedrals, a virtual site's position is calculated from the position of three nearby heavy atoms in a predefined manner (see also sec. 4.7).
For the hydrogens in water and in hydroxyl, sulfhydryl, or amine groups, no degrees of freedom can be removed, because rotational freedom should be preserved. The only other option available to slow down these motions is to increase the mass of the hydrogen atoms at the expense of the mass of the connected heavy atom. This will increase the moment of inertia of the water molecules and the hydroxyl, sulfhydryl, or amine groups, without affecting the equilibrium properties of the system and without affecting the dynamical properties too much. These constructions are described briefly in sec. 6.10.1 and have previously been described in full detail [153].

Figure 6.10: The different types of virtual site constructions used for hydrogen atoms. The atoms used in the construction of the virtual site(s) are depicted as black circles, virtual sites as gray ones. Hydrogens are smaller than heavy atoms. A: fixed bond angle, note that here the hydrogen is not a virtual site; B: in the plane of three atoms, with fixed distance; C: in the plane of three atoms, with fixed angle and distance; D: construction for amine groups (-NH2 or -NH3+), see text for details.

Using both virtual sites and modified masses, the next bottleneck is likely to be formed by the improper dihedrals (which are used to preserve planarity or chirality of molecular groups) and the peptide dihedrals. The peptide dihedral cannot be changed without affecting the physical behavior of the protein. The improper dihedrals that preserve planarity mostly deal with aromatic residues. Bonds, angles, and dihedrals in these residues can also be replaced with somewhat elaborate virtual site constructions.
All modifications described in this section can be performed using the GROMACS topology building tool pdb2gmx. Separate options exist to increase hydrogen masses, virtualize all hydrogen atoms, or also virtualize all aromatic residues. Note that when all hydrogen atoms are virtualized, those inside the aromatic residues will be virtualized as well, i.e. hydrogens in the aromatic residues are treated differently depending on the treatment of the aromatic residues.

Parameters for the virtual site constructions for the hydrogen atoms are inferred from the force-field parameters (viz. bond lengths and angles) directly by grompp while processing the topology file. The constructions for the aromatic residues are based on the bond lengths and angles for the geometry as described in the force fields, but these parameters are hard-coded into pdb2gmx due to the complex nature of the construction needed for a whole aromatic group.

6.10.1 Hydrogen bond-angle vibrations

Construction of virtual sites

The goal of defining hydrogen atoms as virtual sites is to remove all high-frequency degrees of freedom from them. In some cases, not all degrees of freedom of a hydrogen atom should be removed, e.g. in the case of hydroxyl or amine groups the rotational freedom of the hydrogen atom(s) should be preserved. Care should be taken that no unwanted correlations are introduced by the construction of virtual sites, e.g. bond-angle vibration between the constructing atoms could translate into hydrogen bond-length vibration. Additionally, since virtual sites are by definition massless, in order to preserve total system mass, the mass of each hydrogen atom that is treated as a virtual site should be added to the bonded heavy atom.
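The two kinds of mass bookkeeping mentioned in this section (moving hydrogen mass onto the heavy atom for virtual sites, and making hydrogens heavier at the expense of the heavy atom) can be sketched as below. The function names, the atom names and the data layout are hypothetical, and the scaling factor of 4 for heavy hydrogens is an assumed default that is not stated in this section. In practice pdb2gmx performs these modifications when building the topology.

```python
def virtualize_hydrogens(masses, bonded_to, hydrogens):
    """Move the full mass of each listed hydrogen onto its bonded heavy atom."""
    new = dict(masses)
    for h in hydrogens:
        new[bonded_to[h]] += new[h]   # heavy atom absorbs the hydrogen mass
        new[h] = 0.0                  # a virtual site is massless
    return new


def heavy_hydrogens(masses, bonded_to, hydrogens, factor=4.0):
    """Scale hydrogen masses by `factor` (ASSUMED value) at the heavy atom's expense."""
    new = dict(masses)
    for h in hydrogens:
        extra = (factor - 1.0) * masses[h]
        new[h] += extra
        new[bonded_to[h]] -= extra    # total mass stays unchanged
    return new


# Hypothetical fragment: an aromatic C-H and a hydroxyl O-H (masses in u)
masses = {"CZ": 12.011, "HZ": 1.008, "OH": 15.9994, "HH": 1.008}
bonded_to = {"HZ": "CZ", "HH": "OH"}

print(virtualize_hydrogens(masses, bonded_to, ["HZ"]))
print(heavy_hydrogens(masses, bonded_to, ["HH"]))
```

Both operations conserve the total mass of the fragment, which is the requirement stated above.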
Figure 6.11: The different types of virtual site constructions used for aromatic residues. The atoms used in the construction of the virtual site(s) are depicted as black circles, virtual sites as gray ones. Hydrogens are smaller than heavy atoms. A: phenylalanine; B: tyrosine (note that the hydroxyl hydrogen is not a virtual site); C: tryptophan; D: histidine.

Taking into account these considerations, the hydrogen atoms in a protein naturally fall into several categories, each requiring a different approach (see also Fig. 6.10).

• hydroxyl (-OH) or sulfhydryl (-SH) hydrogen: The only internal degree of freedom in a hydroxyl group that can be constrained is the bending of the C-O-H angle. This angle is fixed by defining an additional bond of appropriate length, see Fig. 6.10A. Doing so removes the high-frequency angle bending, but leaves the dihedral rotational freedom. The same goes for a sulfhydryl group. Note that in these cases the hydrogen is not treated as a virtual site.

• single amine or amide (-NH-) and aromatic hydrogens (-CH-): The position of these hydrogens cannot be constructed from a linear combination of bond vectors, because of the flexibility of the angle between the heavy atoms. Instead, the hydrogen atom is positioned at a fixed distance from the bonded heavy atom on a line going through the bonded heavy atom and a point on the line through both second bonded atoms, see Fig. 6.10B.
• planar amine (-NH2) hydrogens: The method used for the single amide hydrogen is not well suited for planar amine groups, because no suitable two heavy atoms can be found to define the direction of the hydrogen atoms. Instead, the hydrogen is constructed at a fixed distance from the nitrogen atom, with a fixed angle to the carbon atom, in the plane defined by one of the other heavy atoms, see Fig. 6.10C.

• amine group (umbrella -NH2 or -NH3+) hydrogens: Amine hydrogens with rotational freedom cannot be constructed as virtual sites from the heavy atoms they are connected to, since this would result in loss of the rotational freedom of the amine group. To preserve the rotational freedom while removing the hydrogen bond-angle degrees of freedom, two "dummy masses" are constructed with the same total mass, moment of inertia (for rotation around the C-N bond) and center of mass as the amine group. These dummy masses have no interaction with any other atom, except for the fact that they are connected to the carbon and to each other, resulting in a rigid triangle. From these three particles, the positions of the nitrogen and hydrogen atoms are constructed as linear combinations of the two carbon-mass vectors and their outer product, resulting in an amine group with rotational freedom intact, but without other internal degrees of freedom. See Fig. 6.10D.

6.10.2 Out-of-plane vibrations in aromatic groups

The planar arrangements in the side chains of the aromatic residues lend themselves perfectly to a virtual-site construction, giving a perfectly planar group without the inherently unstable constraints that are necessary to keep normal atoms in a plane. The basic approach is to define three atoms or dummy masses with constraints between them to fix the geometry and create the rest of the atoms as simple virtual sites (see sec. 4.7) from these three.
Each of the aromatic residues requires a different approach:

• Phenylalanine: Cγ, Cε1, and Cε2 are kept as normal atoms, but with each a mass of one third the total mass of the phenyl group. See Fig. 6.11A.

• Tyrosine: The ring is treated identically to the phenylalanine ring. Additionally, constraints are defined between Cε1, Cε2, and Oη. The original improper dihedral angles will keep both triangles (one for the ring and one with Oη) in a plane, but due to the larger moments of inertia this construction will be much more stable. The bond-angle in the hydroxyl group will be constrained by a constraint between Cγ and Hη. Note that the hydrogen is not treated as a virtual site. See Fig. 6.11B.

• Tryptophan: Cβ is kept as a normal atom and two dummy masses are created at the center of mass of each of the rings, each with a mass equal to the total mass of the respective ring (Cδ2 and Cε2 are each counted half for each ring). This keeps the overall center of mass and the moment of inertia almost (but not quite) equal to what it was. See Fig. 6.11C.

• Histidine: Cγ, Cε1 and Nε2 are kept as normal atoms, but with masses redistributed such that the center of mass of the ring is preserved. See Fig. 6.11D.

6.11 Viscosity calculation

The shear viscosity is a property of liquids that can be determined easily by experiment. It is useful for parameterizing a force field because it is a kinetic property, while most other properties which are used for parameterization are thermodynamic. The viscosity is also an important property, since it influences the rates of conformational changes of molecules solvated in the liquid.

The viscosity can be calculated from an equilibrium simulation using an Einstein relation:

$$\eta = \frac{1}{2} \frac{V}{k_B T} \lim_{t \to \infty} \frac{d}{dt} \left\langle \left( \int_{t_0}^{t_0+t} P_{xz}(t') \, dt' \right)^2 \right\rangle_{t_0} \tag{6.86}$$

This can be done with gmx energy. This method converges very slowly [154], and as such a nanosecond simulation might not be long enough for an accurate determination of the viscosity.
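To make the structure of Eq. 6.86 concrete, the following sketch evaluates it for a discrete time series of the off-diagonal pressure element. It is a simplified illustration and not what gmx energy does internally: the function names are hypothetical, the time integral uses a rectangle rule, the limit is replaced by a least-squares slope over a finite range of lag times, and the caller is assumed to supply consistent units. The demonstration uses synthetic white noise, for which the expected result with the chosen parameters is close to 1.

```python
import random


def mean_squared_integral(pxz, dt, lag):
    """<(integral of P_xz over `lag` steps)^2>, averaged over all time origins t0."""
    integral = [0.0]                      # running integral, rectangle rule
    for p in pxz:
        integral.append(integral[-1] + p * dt)
    squares = [(integral[k + lag] - integral[k]) ** 2
               for k in range(len(integral) - lag)]
    return sum(squares) / len(squares)


def einstein_viscosity(pxz, dt, volume, kT, max_lag):
    """eta = V / (2 kT) * slope of the mean squared integral versus time."""
    times = [lag * dt for lag in range(1, max_lag + 1)]
    values = [mean_squared_integral(pxz, dt, lag) for lag in range(1, max_lag + 1)]
    t_mean = sum(times) / len(times)
    v_mean = sum(values) / len(values)
    slope = (sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
             / sum((t - t_mean) ** 2 for t in times))
    return volume / (2.0 * kT) * slope


# Synthetic, uncorrelated "pressure" data in arbitrary units (NOT simulation output)
random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(20000)]
eta_demo = einstein_viscosity(noise, dt=1.0, volume=2.0, kT=1.0, max_lag=20)
print(eta_demo)
```

The statistical scatter of the slope in even this idealized case gives an impression of why the method converges slowly for real pressure data with long correlation times.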
The result of the Einstein-relation calculation is very dependent on the treatment of the electrostatics. Using a (short) cut-off results in large noise on the off-diagonal pressure elements, which can increase the calculated viscosity by an order of magnitude.

GROMACS also has a non-equilibrium method for determining the viscosity [154]. This makes use of the fact that energy, which is fed into the system by external forces, is dissipated through viscous friction. The generated heat is removed by coupling to a heat bath. For a Newtonian liquid adding a small force will result in a velocity gradient according to the following equation:

$$a_x(z) + \frac{\eta}{\rho} \frac{\partial^2 v_x(z)}{\partial z^2} = 0 \tag{6.87}$$

Here we have applied an acceleration a_x(z) in the x-direction, which is a function of the z-coordinate. In GROMACS the acceleration profile is:

$$a_x(z) = A \cos\left(\frac{2\pi z}{l_z}\right) \tag{6.88}$$

where l_z is the height of the box. The generated velocity profile is:

$$v_x(z) = V \cos\left(\frac{2\pi z}{l_z}\right) \tag{6.89}$$

$$V = A \frac{\rho}{\eta} \left(\frac{l_z}{2\pi}\right)^2 \tag{6.90}$$

The viscosity can be calculated from A and V:

$$\eta = \frac{A}{V} \rho \left(\frac{l_z}{2\pi}\right)^2 \tag{6.91}$$

In the simulation V is defined as:

$$V = \frac{\sum_{i=1}^{N} m_i v_{i,x} \, 2 \cos\left(\frac{2\pi z_i}{l_z}\right)}{\sum_{i=1}^{N} m_i} \tag{6.92}$$

The generated velocity profile is not coupled to the heat bath. Moreover, the velocity profile is excluded from the kinetic energy. One would like V to be as large as possible to get good statistics. However, the shear rate should not be so high that the system gets too far from equilibrium. The maximum shear rate occurs where the cosine is zero, the rate being:

$$\mathrm{sh}_{\max} = \max_z \left| \frac{\partial v_x(z)}{\partial z} \right| = A \frac{\rho}{\eta} \frac{l_z}{2\pi} \tag{6.93}$$

For a simulation with: $\eta = 10^{-3}$ [kg m$^{-1}$ s$^{-1}$], $\rho = 10^{3}$ [kg m$^{-3}$] and $l_z = 2\pi$ [nm], $\mathrm{sh}_{\max} = 1$ [ps nm$^{-1}$] $A$. This shear rate should be smaller than one over the longest correlation time in the system. For most liquids, this will be the rotation correlation time, which is around 10 ps. In this case, A should be smaller than 0.1 [nm ps$^{-2}$]. When the shear rate is too high, the observed viscosity will be too low.
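The worked example above can be checked numerically. The sketch below implements Eqs. 6.90, 6.91 and 6.93 in SI units and reproduces the statement that A = 0.1 nm ps^-2 corresponds to a maximum shear rate of 0.1 ps^-1 for the quoted η, ρ and l_z. The function names are hypothetical; only the formulas and the example values come from the text.

```python
import math


def velocity_amplitude(A, eta, rho, lz):
    """Eq. 6.90: V = A * (rho / eta) * (lz / 2 pi)^2."""
    return A * rho / eta * (lz / (2.0 * math.pi)) ** 2


def viscosity_from_profile(A, V, rho, lz):
    """Eq. 6.91: eta = (A / V) * rho * (lz / 2 pi)^2."""
    return A / V * rho * (lz / (2.0 * math.pi)) ** 2


def max_shear_rate(A, eta, rho, lz):
    """Eq. 6.93: sh_max = A * (rho / eta) * lz / (2 pi)."""
    return A * rho / eta * lz / (2.0 * math.pi)


# Example values from the text, converted to SI units
eta = 1.0e-3                 # kg m^-1 s^-1
rho = 1.0e3                  # kg m^-3
lz = 2.0 * math.pi * 1.0e-9  # m (2 pi nm)
A = 0.1 * 1.0e-9 / 1.0e-24   # m s^-2 (0.1 nm ps^-2)

sh_max = max_shear_rate(A, eta, rho, lz)        # in s^-1
V = velocity_amplitude(A, eta, rho, lz)         # in m s^-1
eta_back = viscosity_from_profile(A, V, rho, lz)

print(f"sh_max = {sh_max * 1e-12:.3f} ps^-1")
print(f"V      = {V * 1e-3:.3f} nm ps^-1")
print(f"eta    = {eta_back:.3e} kg m^-1 s^-1")
```

Feeding the amplitude V from Eq. 6.90 back into Eq. 6.91 recovers the input viscosity, which confirms that the two expressions are consistent with each other.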
Because V is proportional to the square of the box height, the optimal box is elongated in the z-direction. In general, a simulation length of 100 ps is enough to obtain an accurate value for the viscosity.

The heat generated by the viscous friction is removed by coupling to a heat bath. Because this coupling is not instantaneous the real temperature of the liquid will be slightly lower than the observed temperature. Berendsen derived this temperature shift [31], which can be written in terms of the shear rate as:

$$T_s = \frac{\eta \, \tau}{2 \rho \, C_v} \, \mathrm{sh}_{\max}^2 \tag{6.94}$$

where τ is the coupling time for the Berendsen thermostat and C_v is the heat capacity. Using the values of the example above, $\tau = 10^{-13}$ [s] and $C_v = 2 \cdot 10^{3}$ [J kg$^{-1}$ K$^{-1}$], we get: $T_s = 25$ [K ps$^{2}$] $\mathrm{sh}_{\max}^2$. When we want the shear rate to be smaller than 1/10 [ps$^{-1}$], $T_s$ is smaller than 0.25 [K], which is negligible.

Note that the system has to build up the velocity profile when starting from an equilibrium state. This build-up time is of the order of the correlation time of the liquid.

Two quantities are written to the energy file, along with their averages and fluctuations: V and 1/η, as obtained from (6.91).

6.12 Tabulated interaction functions

6.12.1 Cubic splines for potentials

In some of the inner loops of GROMACS, look-up tables are used for computation of potential and forces. The tables are interpolated using a cubic spline algorithm. There are separate tables for electrostatic, dispersion, and repulsion interactions, but for the sake of caching performance these have been combined into a single array. The cubic spline interpolation for $x_i \le x < x_{i+1}$ looks like this:

$$V_s(x) = A_0 + A_1 \epsilon + A_2 \epsilon^2 + A_3 \epsilon^3 \tag{6.95}$$

where the table spacing h and fraction ε are given by:

$$h = x_{i+1} - x_i \tag{6.96}$$

$$\epsilon = (x - x_i)/h \tag{6.97}$$

so that $0 \le \epsilon < 1$.
From this, we can calculate the derivative in order to determine the forces:

$$-V_s'(x) = -\frac{dV_s(x)}{d\epsilon} \frac{d\epsilon}{dx} = -(A_1 + 2 A_2 \epsilon + 3 A_3 \epsilon^2)/h \tag{6.98}$$

The four coefficients are determined from the four conditions that $V_s$ and $-V_s'$ at both ends of each interval should match the exact potential $V$ and force $-V'$. This results in the following errors for each interval:

$$|V_s - V|_{\max} = V'''' \frac{h^4}{384} + O(h^5) \tag{6.99}$$

$$|V_s' - V'|_{\max} = V'''' \frac{h^3}{72\sqrt{3}} + O(h^4) \tag{6.100}$$

$$|V_s'' - V''|_{\max} = V'''' \frac{h^2}{12} + O(h^3) \tag{6.101}$$

V and V' are continuous, while V'' is the first discontinuous derivative. The number of points per nanometer is 500 and 2000 for mixed- and double-precision versions of GROMACS, respectively. This means that the errors in the potential and force will usually be smaller than the mixed precision accuracy.

GROMACS stores $A_0$, $A_1$, $A_2$ and $A_3$. The force routines get a table with these four parameters and a scaling factor s that is equal to the number of points per nm. (Note that h is $s^{-1}$.) The algorithm proceeds as follows:

1. Calculate distance vector ($\mathbf{r}_{ij}$) and distance $r_{ij}$
2. Multiply $r_{ij}$ by s and truncate to an integer value $n_0$ to get a table index
3. Calculate fractional component ($\epsilon = s r_{ij} - n_0$) and $\epsilon^2$
4. Do the interpolation to calculate the potential V and the scalar force f
5. Calculate the vector force $\mathbf{F}$ by multiplying f with $\mathbf{r}_{ij}$

Note that table look-up is significantly slower than computation of the most simple Lennard-Jones and Coulomb interaction. However, it is much faster than the shifted Coulomb function used in conjunction with the PPPM method. Finally, it is much easier to modify a table for the potential (and get a graphical representation of it) than to modify the inner loops of the MD program.

6.12.2 User-specified potential functions

You can also use your own potential functions without editing the GROMACS code.
The potential function should have the following form:

$$V(r_{ij}) = \frac{q_i q_j}{4 \pi \epsilon_0} f(r_{ij}) + C_6 \, g(r_{ij}) + C_{12} \, h(r_{ij}) \tag{6.102}$$

where f, g, and h are user defined functions. Note that if g(r) represents a normal dispersion interaction, g(r) should be < 0. $C_6$, $C_{12}$ and the charges are read from the topology. Also note that combination rules are only supported for Lennard-Jones and Buckingham, and that your tables should match the parameters in the binary topology.

When you add the following lines in your .mdp file:

    rlist       = 1.0
    coulombtype = User
    rcoulomb    = 1.0
    vdwtype     = User
    rvdw        = 1.0

mdrun will read a single non-bonded table file, or multiple when energygrp-table is set (see below). The name of the file(s) can be set with the mdrun option -table. The table file should contain seven columns of table look-up data in the order: x, f(x), −f'(x), g(x), −g'(x), h(x), −h'(x). The x should run from 0 to $r_c$ + 1 (the value of table_extension can be changed in the .mdp file). You can choose the spacing you like; for the standard tables GROMACS uses a spacing of 0.002 and 0.0005 nm when you run in mixed and double precision, respectively. In this context, $r_c$ denotes the maximum of the two cut-offs rvdw and rcoulomb (see above). These variables need not be the same (and need not be 1.0 either). Some functions used for potentials contain a singularity at x = 0, but since atoms are normally not closer to each other than 0.1 nm, the function value at x = 0 is not important. Finally, it is also possible to combine a standard Coulomb with a modified LJ potential (or vice versa). One then specifies e.g. coulombtype = Cut-off or coulombtype = PME, combined with vdwtype = User. The table file must always contain the 7 columns however, and meaningful data (i.e. not zeroes) must be entered in all columns.
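A table with the seven-column layout described above could be generated along the following lines. This is a minimal sketch under stated assumptions: it tabulates plain Coulomb and 6-12 Lennard-Jones forms (f = 1/x, g = −1/x^6, h = 1/x^12), the function names and the output file name are hypothetical, and writing zeros below an arbitrarily chosen small distance (0.04 nm here) is an assumed way to sidestep the singularity at x = 0, relying on the remark that values there are not important.

```python
def table_rows(r_cut=1.0, extension=1.0, spacing=0.002, r_min=0.04):
    """Yield rows (x, f, -f', g, -g', h, -h') from x = 0 to r_cut + extension."""
    n_points = int(round((r_cut + extension) / spacing)) + 1
    for i in range(n_points):
        x = i * spacing
        if x < r_min:
            # ASSUMPTION: zeros below r_min avoid the singularity at x = 0
            yield (x, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)
            continue
        f, minus_df = 1.0 / x, 1.0 / x ** 2             # Coulomb: f = 1/x
        g, minus_dg = -1.0 / x ** 6, -6.0 / x ** 7      # dispersion: g = -1/x^6
        h, minus_dh = 1.0 / x ** 12, 12.0 / x ** 13     # repulsion: h = 1/x^12
        yield (x, f, minus_df, g, minus_dg, h, minus_dh)


def write_table(path, **kwargs):
    """Write the rows as whitespace-separated columns, one table point per line."""
    with open(path, "w") as handle:
        for row in table_rows(**kwargs):
            handle.write(" ".join(f"{value:.10e}" for value in row) + "\n")


rows = list(table_rows())
print(len(rows), rows[500])
# write_table("table_example.xvg")  # hypothetical file name; pass it to mdrun -table
```

Replacing the three function pairs with other analytical forms and their negative derivatives is all that is needed to tabulate a different potential, provided g and h stay consistent with the C6 and C12 parameters in the topology.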
A number of pre-built table files can be found in the GMXLIB directory for 6-8, 6-9, 6-10, 6-11, and 6-12 Lennard-Jones potentials combined with a normal Coulomb.

If you want to have different functional forms between different groups of atoms, this can be set through energy groups. Different tables can be used for non-bonded interactions between different energy group pairs through the .mdp option energygrp-table (see details in the User Guide). Atoms that should interact with a different potential should be put into different energy groups. Between group pairs which are not listed in energygrp-table, the normal user tables will be used. This makes it easy to use a different functional form between a few types of atoms.

6.13 Mixed Quantum-Classical simulation techniques

In a molecular mechanics (MM) force field, the influence of electrons is expressed by empirical parameters that are assigned on the basis of experimental data, or on the basis of results from high-level quantum chemistry calculations. These are valid for the ground state of a given covalent structure, and the MM approximation is usually sufficiently accurate for ground-state processes in which the overall connectivity between the atoms in the system remains unchanged. However, for processes in which the connectivity does change, such as chemical reactions, or processes that involve multiple electronic states, such as photochemical conversions, electrons can no longer be ignored, and a quantum mechanical description is required for at least those parts of the system in which the reaction takes place.

One approach to the simulation of chemical reactions in solution, or in enzymes, is to use a combination of quantum mechanics (QM) and molecular mechanics (MM). The reacting parts of the system are treated quantum mechanically, with the remainder being modeled using the force field.
The current version of GROMACS provides interfaces to several popular Quantum Chemistry packages (MOPAC [155], GAMESS-UK [156], Gaussian [157] and CPMD [158]).

In GROMACS, interactions between the two subsystems are handled either as described by Field et al. [159] or within the ONIOM approach by Morokuma and coworkers [160, 161].

6.13.1 Overview

Two approaches for describing the interactions between the QM and MM subsystems are supported in this version:

1. Electronic Embedding

The electrostatic interactions between the electrons of the QM region and the MM atoms and between the QM nuclei and the MM atoms are included in the Hamiltonian for the QM subsystem:

$$H^{QM/MM} = H_e^{QM} - \sum_i^n \sum_J^M \frac{e^2 Q_J}{4 \pi \epsilon_0 r_{iJ}} + \sum_A^N \sum_J^M \frac{e^2 Z_A Q_J}{4 \pi \epsilon_0 R_{AJ}} \tag{6.103}$$

where n and N are the number of electrons and nuclei in the QM region, respectively, and M is the number of charged MM atoms. The first term on the right hand side is the original electronic Hamiltonian of an isolated QM system. The first of the double sums is the total electrostatic interaction between the QM electrons and the MM atoms. The total electrostatic interaction of the QM nuclei with the MM atoms is given by the second double sum. Bonded interactions between QM and MM atoms are described at the MM level by the appropriate force-field terms. Chemical bonds that connect the two subsystems are capped by a hydrogen atom to complete the valence of the QM region. The force on this atom, which is present in the QM region only, is distributed over the two atoms of the bond. The cap atom is usually referred to as a link atom.

2. ONIOM

In the ONIOM approach, the energy and gradients are first evaluated for the isolated QM subsystem at the desired level of ab initio theory. Subsequently, the energy and gradients of the total system, including the QM region, are computed using the molecular mechanics force field and added to the energy and gradients calculated for the isolated QM subsystem.
Finally, in order to correct for counting the interactions inside the QM region twice, a molecular mechanics calculation is performed on the isolated QM subsystem and the energy and gradients are subtracted. This leads to the following expression for the total QM/MM energy (and gradients likewise):

$$E_{tot} = E_I^{QM} + E_{I+II}^{MM} - E_I^{MM} \tag{6.104}$$

where the subscripts I and II refer to the QM and MM subsystems, respectively. The superscripts indicate at what level of theory the energies are computed. The ONIOM scheme has the advantage that it is not restricted to a two-layer QM/MM description, but can easily handle more than two layers, with each layer described at a different level of theory.

6.13.2 Usage

To make use of the QM/MM functionality in GROMACS, one needs to:

1. introduce link atoms at the QM/MM boundary, if needed;
2. specify which atoms are to be treated at a QM level;
3. specify the QM level, basis set, type of QM/MM interface and so on.

Adding link atoms

At the bond that connects the QM and MM subsystems, a link atom is introduced. In GROMACS the link atom has a special atom type, called LA. This atom type is treated as a hydrogen atom in the QM calculation, and as a virtual site in the force-field calculation. The link atoms, if any, are part of the system, but have no interaction with any other atom, except that the QM force working on it is distributed over the two atoms of the bond. In the topology, the link atom (LA), therefore, is defined as a virtual site atom:

    [ virtual_sites2 ]
    LA QMatom MMatom 1 0.65

See sec. 5.2.2 for more details on how virtual sites are treated. The link atom is replaced at every step of the simulation.

In addition, the bond itself is replaced by a constraint:

    [ constraints ]
    QMatom MMatom 2 0.153

Note that, because in our system the QM/MM bond is a carbon-carbon bond (0.153 nm), we use a constraint length of 0.153 nm, and a dummy position of 0.65.
The latter is the ratio between the ideal C-H bond length and the ideal C-C bond length. With this ratio, the link atom is always 0.1 nm away from the QMatom, consistent with the carbon-hydrogen bond length. If the QM and MM subsystems are connected by a different kind of bond, a different constraint and a different dummy position, appropriate for that bond type, are required.

Specifying the QM atoms

Atoms that should be treated at a QM level of theory, including the link atoms, are added to the index file. In addition, the chemical bonds between the atoms in the QM region are to be defined as connect bonds (bond type 5) in the topology file:

    [ bonds ]
    QMatom1 QMatom2 5
    QMatom2 QMatom3 5

Specifying the QM/MM simulation parameters

In the .mdp file, the following parameters control a QM/MM simulation.

QMMM = no
If this is set to yes, a QM/MM simulation is requested. Several groups of atoms can be described at different QM levels separately. These are specified in the QMMM-grps field, separated by spaces. The level of ab initio theory at which the groups are described is specified by the QMmethod and QMbasis fields. Describing the groups at different levels of theory is only possible with the ONIOM QM/MM scheme, specified by QMMMscheme.

QMMM-grps =
Groups to be described at the QM level.

QMMMscheme = normal
Options are normal and ONIOM. This selects the QM/MM interface. normal implies that the QM subsystem is electronically embedded in the MM subsystem. There can only be one QMMM-grps that is modeled at the QMmethod and QMbasis level of ab initio theory. The rest of the system is described at the MM level. The QM and MM subsystems interact as follows: MM point charges are included in the QM one-electron Hamiltonian and all Lennard-Jones interactions are described at the MM level. If ONIOM is selected, the interaction between the subsystems is described using the ONIOM method by Morokuma and co-workers.
There can be more than one QMMM-grps, each modeled at a different level of QM theory (QMmethod and QMbasis).

QMmethod =
Method used to compute the energy and gradients on the QM atoms. Available methods are AM1, PM3, RHF, UHF, DFT, B3LYP, MP2, CASSCF, MMVB and CPMD. For CASSCF, the number of electrons and orbitals included in the active space is specified by CASelectrons and CASorbitals. For CPMD, the plane-wave cut-off is specified by the planewavecutoff keyword.

QMbasis =
Gaussian basis set used to expand the electronic wave-function. Only Gaussian basis sets are currently available, i.e. STO-3G, 3-21G, 3-21G*, 3-21+G*, 6-21G, 6-31G, 6-31G*, 6-31+G*, and 6-311G. For CPMD, which uses a plane wave expansion rather than atom-centered basis functions, the planewavecutoff keyword controls the plane wave expansion.

QMcharge =
The total charge in e of the QMMM-grps. If there is more than one QMMM-grps, the total charge of each ONIOM layer needs to be specified separately.

QMmult =
The multiplicity of the QMMM-grps. If there is more than one QMMM-grps, the multiplicity of each ONIOM layer needs to be specified separately.

CASorbitals =
The number of orbitals to be included in the active space when doing a CASSCF computation.

CASelectrons =
The number of electrons to be included in the active space when doing a CASSCF computation.

SH = no
If this is set to yes, a QM/MM MD simulation is performed on the excited-state potential energy surface, and a diabatic hop to the ground state is enforced when the system hits the conical intersection hyperline in the course of the simulation. This option only works in combination with the CASSCF method.

6.13.3 Output

The energies and gradients computed in the QM calculation are added to those computed by GROMACS. In the .edr file there is a section for the total QM energy.

6.13.4 Future developments

Several features are currently under development to increase the accuracy of the QM/MM interface.
One useful feature is the use of delocalized MM charges in the QM computations. The most important benefit of using such smeared-out charges is that the Coulombic potential has a finite value at interatomic distances. In the point charge representation, the partially-charged MM atoms close to the QM region tend to “over-polarize” the QM system, which leads to artifacts in the calculation. A transition state optimizer is needed as well.

6.14 Using VMD plug-ins for trajectory file I/O

GROMACS tools are able to use the plug-ins found in an existing installation of VMD in order to read and write trajectory files in formats that are not native to GROMACS. You will be able to supply an AMBER DCD-format trajectory filename directly to GROMACS tools, for example.

This requires a VMD installation not older than version 1.8, that your system provides the dlopen function so that programs can determine at run time what plug-ins exist, and that you build shared libraries when building GROMACS. CMake will find the vmd executable in your path, and from it, or the environment variable VMDDIR at configuration or run time, locate the plug-ins. Alternatively, the VMD_PLUGIN_PATH can be used at run time to specify a path where these plug-ins can be found. Note that these plug-ins are in a binary format, and that format must match the architecture of the machine attempting to use them.

6.15 Interactive Molecular Dynamics

GROMACS supports the interactive molecular dynamics (IMD) protocol as implemented by VMD to control a running simulation in NAMD. IMD allows monitoring a running GROMACS simulation from a VMD client. In addition, the user can interact with the simulation by pulling on atoms, residues or fragments with a mouse or a force-feedback device. Additional information about the GROMACS implementation and an exemplary GROMACS IMD system can be found on this homepage.
6.15.1 Simulation input preparation

The GROMACS implementation allows transmission of, and interaction with, only a part of the running simulation, e.g. in cases where no water molecules should be transmitted or pulled. The group is specified via the .mdp option IMD-group. When IMD-group is empty, the IMD protocol is disabled and cannot be enabled via the switches in mdrun. To interact with the entire system, IMD-group can be set to System. When using grompp, a .gro file to be used as VMD input is written out (-imd switch of grompp).

6.15.2 Starting the simulation

Communication between VMD and GROMACS is achieved via TCP sockets and thus enables controlling an mdrun running locally or on a remote cluster. The port for the connection can be specified with the -imdport switch of mdrun; 8888 is the default. If a port number of 0 or smaller is provided, GROMACS automatically assigns a free port to use with IMD.

Every N steps, mdrun receives the applied forces from VMD and sends the new positions to the client. VMD permits increasing or decreasing the communication frequency interactively. By default, the simulation starts and runs even if no IMD client is connected. This behavior is changed by the -imdwait switch of mdrun: after startup and whenever the client has disconnected, the integration stops until the client reconnects. When the -imdterm switch is used, the simulation can be terminated by pressing the stop button in VMD. This is disabled by default. Finally, to allow interacting with the simulation (i.e. pulling from VMD) the -imdpull switch has to be used. Therefore, a simulation can only be monitored but not influenced from the VMD client when none of -imdwait, -imdterm or -imdpull are set. However, since the IMD protocol requires no authentication, it is not advisable to run simulations on a host directly reachable from an insecure environment.
Secure shell forwarding of TCP can be used to connect to running simulations not directly reachable from the interacting host. Note that the IMD command line switches of mdrun are hidden by default and show up in the help text only with gmx mdrun -h -hidden.

6.15.3 Connecting from VMD

In VMD, first the structure corresponding to the IMD group has to be loaded (File → New Molecule). Then the IMD connection window has to be used (Extensions → Simulation → IMD Connect (NAMD)). In the IMD connection window, hostname and port have to be specified, followed by pressing Connect. Detach Sim allows disconnecting without terminating the simulation, while Stop Sim ends the simulation on the next neighbor searching step (if allowed by -imdterm).

The timestep transfer rate allows adjusting the communication frequency between simulation and IMD client. Setting the keep rate loads every Nth frame into VMD instead of discarding them when a new one is received. The displayed energies are in SI units, in contrast to energies displayed from NAMD simulations.

6.16 Embedding proteins into the membranes

GROMACS is capable of inserting a protein into pre-equilibrated lipid bilayers with minimal perturbation of the lipids, using a method that was initially described as the ProtSqueeze technique [162] and later implemented as the g_membed tool [163]. Currently the functionality of g_membed is available in mdrun, as described in the user guide.

This method works by first artificially shrinking the protein in the xy-plane; it then removes lipids that overlap with that much smaller core. Then the protein atoms are gradually resized back to their initial configuration, using normal dynamics for the rest of the system, so the lipids adapt to the protein. Further lipids are removed as required.
Chapter 7 Run parameters and Programs

7.1 Online documentation

More documentation is available online from the GROMACS web site, http://manual.gromacs.org/documentation.

In addition, we install standard UNIX man pages for all the programs. If you have sourced the GMXRC script in the GROMACS binary directory for your host they should already be present in your MANPATH environment variable, and you should be able to type e.g. man gmx-grompp. You can also use the -h flag on the command line (e.g. gmx grompp -h) to see the same information, as well as gmx help grompp. The list of all programs is available from gmx help.

7.2 File types

Table 7.1 lists the file types used by GROMACS along with a short description, and you can find a more detailed description for each file in your HTML reference, or in our online version.

GROMACS files written in XDR format can be read on any architecture with GROMACS version 1.6 or later if the configuration script found the XDR libraries on your system. They should always be present on UNIX since they are necessary for NFS support.

7.3 Run Parameters

The descriptions of .mdp parameters can be found at http://manual.gromacs.org/current/mdp-options.html or in your installation at share/gromacs/html/mdp-options.html

Default Name.Ext   Type   Default Option   Description
atomtp.atp         Asc                     Atomtype file used by pdb2gmx
eiwit.brk          Asc    -f               Brookhaven data bank file
state.cpt          xdr                     Checkpoint file
nnnice.dat         Asc                     Generic data file
user.dlg           Asc                     Dialog Box data for ngmx
sam.edi            Asc                     ED sampling input
sam.edo            Asc                     ED sampling output
ener.edr                                   Generic energy: edr ene
ener.edr           xdr                     Energy file in portable xdr format
ener.ene           Bin                     Energy file
eiwit.ent          Asc    -f               Entry in the protein data bank
plot.eps           Asc                     Encapsulated PostScript (tm) file
conf.esp           Asc    -c               Coordinate file in ESPResSo format
conf.g96           Asc    -c               Coordinate file in Gromos-96 format
conf.gro           Asc    -c               Coordinate file in Gromos-87 format
conf.gro                  -c               Structure: gro g96 pdb esp tpr
out.gro                   -o               Structure: gro g96 pdb esp
polar.hdb          Asc                     Hydrogen data base
topinc.itp         Asc                     Include file for topology
run.log            Asc    -l               Log file
ps.m2p             Asc                     Input file for mat2ps
ss.map             Asc                     File that maps matrix data to colors
ss.mat             Asc                     Matrix Data file
grompp.mdp         Asc    -f               grompp input file with MD parameters
hessian.mtx        Bin    -m               Hessian matrix
index.ndx          Asc    -n               Index file
hello.out          Asc    -o               Generic output file
eiwit.pdb          Asc    -f               Protein data bank file
residue.rtp        Asc                     Residue Type file used by pdb2gmx
doc.tex            Asc    -o               LaTeX file
topol.top          Asc    -p               Topology file
topol.tpr                 -s               Generic run input: tpr
topol.tpr                 -s               Structure+mass(db): tpr gro g96 pdb
topol.tpr          xdr    -s               Portable xdr run input file
traj.trr                                   Full precision trajectory: trr cpt
traj.trr           xdr                     Trajectory in portable xdr format
root.xpm           Asc                     X PixMap compatible matrix file
traj.xtc                  -f               Trajec., input: xtc trr tng cpt gro g96 pdb
traj.xtc                  -f               Trajectory, output: xtc trr tng gro g96 pdb
traj.xtc           xdr                     Compressed trajectory (portable xdr format)
graph.xvg          Asc    -o               xvgr/xmgr file

Table 7.1: The GROMACS file types.

Chapter 8 Analysis

In this chapter different ways of analyzing your trajectory are described. The names of the corresponding analysis programs are given. Specific information on the in- and output of these programs can be found in the online manual at www.gromacs.org. The output files are often produced as finished Grace/Xmgr graphs.

First, in sec. 8.1, the group concept in analysis is explained. Sec. 8.1.2 explains a newer concept of dynamic selections, which is currently supported by a few tools. Then, the different analysis tools are presented.
8.1 Using Groups
gmx make_ndx, gmx mk_angndx, gmx select

In chapter 3, it was explained how groups of atoms can be used in mdrun (see sec. 3.3). In most analysis programs, groups of atoms must also be chosen. Most programs can generate several default index groups, but groups can always be read from an index file. Let’s consider the example of a simulation of a binary mixture of components A and B. When we want to calculate the radial distribution function (RDF) $g_{AB}(r)$ of A with respect to B, we have to calculate:

$$4\pi r^2 g_{AB}(r) = V \sum_{i \in A}^{N_A} \sum_{j \in B}^{N_B} P(r) \tag{8.1}$$

where $V$ is the volume and $P(r)$ is the probability of finding a B atom at distance $r$ from an A atom.

By having the user define the atom numbers for groups A and B in a simple file, we can calculate this $g_{AB}$ in the most general way, without having to make any assumptions in the RDF program about the type of particles.

Groups can therefore consist of a series of atom numbers, but in some cases also of molecule numbers. It is also possible to specify a series of angles by triples of atom numbers, dihedrals by quadruples of atom numbers and bonds or vectors (in a molecule) by pairs of atom numbers. When appropriate the type of index file will be specified for the following analysis programs. To help create such index files (index.ndx), there are a couple of programs to generate them, using either your input configuration or the topology. To generate an index file consisting of a series of atom numbers (as in the example of $g_{AB}$), use gmx make_ndx or gmx select. To generate an index file with angles or dihedrals, use gmx mk_angndx. Of course you can also make them by hand. The general format is presented here:

    [ Oxygen ]
    1 4 7
    [ Hydrogen ]
    2 3 5 6
    8 9

First, the group name is written between square brackets. The following atom numbers may be spread out over as many lines as you like. The atom numbering starts at 1.
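The index-file format described above is simple enough to read with a few lines of code. The following is a minimal sketch, not a GROMACS API: the function name parse_ndx and the three-water example text are assumptions made for illustration. It collects the atom numbers that follow each bracketed group name, however many lines they span.

```python
def parse_ndx(text):
    """Parse index-file text into {group name: [atom numbers]}.

    Hypothetical helper for illustration; atom numbers are kept 1-based,
    exactly as they appear in the file.
    """
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A new group starts: the name is the text between the brackets.
            current = line[1:-1].strip()
            groups.setdefault(current, [])
        elif current is not None:
            # Atom numbers may be spread over any number of lines.
            groups[current].extend(int(tok) for tok in line.split())
    return groups


# Assumed example: three water molecules, oxygen first in each molecule.
example = """\
[ Oxygen ]
1 4 7
[ Hydrogen ]
2 3 5 6
8 9
"""
groups = parse_ndx(example)
```

Here groups maps each group name to its list of 1-based atom numbers, so subtract one before using them as array indices.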
Each tool that can use groups will offer the available alternatives for the user to choose. That choice can be made with the number of the group, or its name. In fact, the first few letters of the group name will suffice if that will distinguish the group from all others. There are ways to use Unix shell features to choose group names on the command line, rather than interactively. Consult www.gromacs.org for suggestions.

8.1.1 Default Groups

When no index file is supplied to analysis tools or grompp, a number of default groups are generated to choose from:

System: all atoms in the system
Protein: all protein atoms
Protein-H: protein atoms excluding hydrogens
C-alpha: Cα atoms
Backbone: protein backbone atoms; N, Cα and C
MainChain: protein main chain atoms: N, Cα, C and O, including oxygens in C-terminus
MainChain+Cb: protein main chain atoms including Cβ
MainChain+H: protein main chain atoms including backbone amide hydrogens and hydrogens on the N-terminus
SideChain: protein side chain atoms; that is all atoms except N, Cα, C, O, backbone amide hydrogens, oxygens in C-terminus and hydrogens on the N-terminus
SideChain-H: protein side chain atoms excluding all hydrogens
Prot-Masses: protein atoms excluding dummy masses (as used in virtual site constructions of NH3 groups and tryptophan side-chains), see also sec. 5.2.2; this group is only included when it differs from the “Protein” group
Non-Protein: all non-protein atoms
DNA: all DNA atoms
RNA: all RNA atoms
Water: water molecules (names like SOL, WAT, HOH, etc.); see residuetypes.dat for a full listing
non-Water: anything not covered by the Water group
Ion: any name matching an Ion entry in residuetypes.dat
Water_and_Ions: combination of the Water and Ion groups
molecule_name: for all residues/molecules which are not recognized as protein, DNA, or RNA; one group per residue/molecule name is generated
Other: all atoms which are neither protein, DNA, nor RNA.

Empty groups will not be generated.
Most of the groups only contain protein atoms. An atom is considered a protein atom if its residue name is listed in the residuetypes.dat file and is listed as a “Protein” entry. The process for determining DNA, RNA, etc. is analogous. If you need to modify these classifications, then you can copy the file from the library directory into your working directory and edit the local copy.

8.1.2 Selections
gmx select

Currently, a few analysis tools support an extended concept of (dynamic) selections. There are three main differences to traditional index groups:

• The selections are specified as text instead of reading fixed atom indices from a file, using a syntax similar to VMD. The text can be entered interactively, provided on the command line, or from a file.
• The selections are not restricted to atoms, but can also specify that the analysis is to be performed on, e.g., center-of-mass positions of a group of atoms. Some tools may not support selections that do not evaluate to single atoms, e.g., if they require information that is available only for single atoms, like atom names or types.
• The selections can be dynamic, i.e., evaluate to different atoms for different trajectory frames. This allows analyzing only a subset of the system that satisfies some geometric criteria.

As an example of a simple selection,

    resname ABC and within 2 of resname DEF

selects all atoms in residues named ABC that are within 2 nm of any atom in a residue named DEF.

Tools that accept selections can also use traditional index files similarly to older tools: it is possible to give an .ndx file to the tool, and directly select a group from the index file as a selection, either by group number or by group name. The index groups can also be used as a part of a more complicated selection.

To get started, you can run gmx select with a single structure, and use the interactive prompt to try out different selections.
The tool provides, among others, output options -on and -ofpdb to write out the selected atoms to an index file and to a .pdb file, respectively. This does not allow testing selections that evaluate to center-of-mass positions, but other selections can be tested and the result examined.

The detailed syntax and the individual keywords that can be used in selections can be accessed by typing help in the interactive prompt of any selection-enabled tool, as well as with gmx help selections. The help is divided into subtopics that can be accessed with, e.g., help syntax / gmx help selections syntax. Some individual selection keywords have extended help as well, which can be accessed with, e.g., help keywords within.

The interactive prompt does not currently provide many editing capabilities. If you need them, you can run the program under rlwrap.

For tools that do not yet support the selection syntax, you can use gmx select -on to generate static index groups to pass to the tool. However, this only allows for a small subset (only the first bullet from the above list) of the flexibility that fully selection-aware tools offer.

It is also possible to write your own analysis tools to take advantage of the flexibility of these selections: see the template.cpp file in the share/gromacs/template directory of your installation for an example.

Figure 8.1: The window of gmx view showing a box of water.

8.2 Looking at your trajectory
gmx view

Before analyzing your trajectory it is often informative to look at it first. GROMACS comes with a simple trajectory viewer, gmx view; its advantage is that it does not require OpenGL, which usually isn’t present on e.g. supercomputers. It is also possible to generate a hard-copy in Encapsulated Postscript format (see Fig. 8.1).
If you want a faster and fancier viewer, there are several programs that can read the GROMACS trajectory formats; have a look at our homepage (www.gromacs.org) for updated links.

8.3 General properties
gmx energy, gmx traj

To analyze some or all energies and other properties, such as total pressure, pressure tensor, density, box-volume and box-sizes, use the program gmx energy. A choice can be made from a list of energies, like potential, kinetic or total energy, or individual contributions, like Lennard-Jones or dihedral energies.

The center-of-mass velocity, defined as

$$\mathbf{v}_{com} = \frac{1}{M} \sum_{i=1}^{N} m_i \mathbf{v}_i \tag{8.2}$$

with $M = \sum_{i=1}^{N} m_i$ the total mass of the system, can be monitored in time by the program gmx traj -com -ov. It is however recommended to remove the center-of-mass velocity every step (see chapter 3)!

8.4 Radial distribution functions
gmx rdf

The radial distribution function (RDF) or pair correlation function $g_{AB}(r)$ between particles of type A and B is defined in the following way:

$$g_{AB}(r) = \frac{\langle \rho_B(r) \rangle}{\langle \rho_B \rangle_{local}} = \frac{1}{\langle \rho_B \rangle_{local}} \frac{1}{N_A} \sum_{i \in A}^{N_A} \sum_{j \in B}^{N_B} \frac{\delta(r_{ij} - r)}{4\pi r^2} \tag{8.3}$$

with $\langle \rho_B(r) \rangle$ the particle density of type B at a distance $r$ around particles A, and $\langle \rho_B \rangle_{local}$ the particle density of type B averaged over all spheres around particles A with radius $r_{max}$ (see Fig. 8.2C).

Figure 8.2: Definition of slices in gmx rdf: A. $g_{AB}(r)$. B. $g_{AB}(r, \theta)$. The slices are colored gray. C. Normalization $\langle \rho_B \rangle_{local}$. D. Normalization $\langle \rho_B \rangle_{local,\theta}$. Normalization volumes are colored gray.

Usually the value of $r_{max}$ is half of the box length. The averaging is also performed in time. In practice the analysis program gmx rdf divides the system into spherical slices (from $r$ to $r + dr$, see Fig. 8.2A) and makes a histogram instead of the δ-function. An example of the RDF of oxygen-oxygen in SPC water [85] is given in Fig. 8.3.
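The histogram procedure described for eqn. 8.3 can be sketched in a few lines. This is an illustrative toy, not the gmx rdf implementation: it assumes a single frame, a cubic periodic box, and the hypothetical function name rdf. Pairs are binned into spherical shells up to $r_{max}$ (half the box length) and normalized by the local density of B inside spheres of radius $r_{max}$ around the A particles.

```python
import math
import random


def rdf(pos_a, pos_b, box, nbins):
    """Histogram estimate of g_AB(r) for one frame in a cubic periodic box.

    Hypothetical helper for illustration. Returns (bin centers, g values)
    for 0 < r < rmax, with rmax = box / 2.
    """
    rmax = box / 2.0
    dr = rmax / nbins
    counts = [0] * nbins
    for a in pos_a:
        for b in pos_b:
            # Minimum-image separation vector (valid for a cubic box only).
            d = [x - y for x, y in zip(a, b)]
            d = [c - box * round(c / box) for c in d]
            r = math.sqrt(sum(c * c for c in d))
            if 0.0 < r < rmax:  # r == 0 skips self pairs when A and B overlap
                counts[min(int(r / dr), nbins - 1)] += 1

    n_a = len(pos_a)
    sphere = 4.0 / 3.0 * math.pi * rmax ** 3
    # <rho_B>_local: density of B averaged over spheres of radius rmax around A.
    rho_local = sum(counts) / (n_a * sphere)

    centers, g = [], []
    for k, c in enumerate(counts):
        # Volume of the spherical slice from k*dr to (k+1)*dr.
        shell = 4.0 / 3.0 * math.pi * (((k + 1) * dr) ** 3 - (k * dr) ** 3)
        centers.append((k + 0.5) * dr)
        g.append(c / (n_a * shell * rho_local))
    return centers, g


# Assumed test system: 60 uniformly random points in a 2 nm box.
random.seed(1)
box = 2.0
atoms = [tuple(random.uniform(0.0, box) for _ in range(3)) for _ in range(60)]
r, g = rdf(atoms, atoms, box, nbins=20)
```

With this normalization the shell-volume-weighted sum of g equals the volume of the sphere of radius $r_{max}$, which is a convenient sanity check. For a real trajectory the counts would additionally be averaged over frames.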
With gmx rdf it is also possible to calculate an angle dependent RDF $g_{AB}(r, \theta)$, where the angle $\theta$ is defined with respect to a certain laboratory axis $\mathbf{e}$, see Fig. 8.2B.

$$g_{AB}(r, \theta) = \frac{1}{\langle \rho_B \rangle_{local,\theta}} \frac{1}{N_A} \sum_{i \in A}^{N_A} \sum_{j \in B}^{N_B} \frac{\delta(r_{ij} - r)\,\delta(\theta_{ij} - \theta)}{2\pi r^2 \sin(\theta)} \tag{8.4}$$

$$\cos(\theta_{ij}) = \frac{\mathbf{r}_{ij} \cdot \mathbf{e}}{\|\mathbf{r}_{ij}\| \, \|\mathbf{e}\|} \tag{8.5}$$

Figure 8.3: $g_{OO}(r)$ for Oxygen-Oxygen of SPC-water (axes: $g(r)$ versus $r$ in nm).

This $g_{AB}(r, \theta)$ is useful for analyzing anisotropic systems. Note that in this case the normalization $\langle \rho_B \rangle_{local,\theta}$ is the average density in all angle slices from $\theta$ to $\theta + d\theta$ up to $r_{max}$, and is therefore angle dependent, see Fig. 8.2D.

8.5 Correlation functions

8.5.1 Theory of correlation functions

The theory of correlation functions is well established [113]. We describe here the implementation of the various correlation function flavors in the GROMACS code. The definition of the autocorrelation function (ACF) $C_f(t)$ for a property $f(t)$ is:

$$C_f(t) = \langle f(\xi) f(\xi + t) \rangle_{\xi} \tag{8.6}$$

where the notation on the right hand side indicates averaging over $\xi$, i.e. over time origins. It is also possible to compute a cross-correlation function from two properties $f(t)$ and $g(t)$:

$$C_{fg}(t) = \langle f(\xi) g(\xi + t) \rangle_{\xi} \tag{8.7}$$

however, in GROMACS there is no standard mechanism to do this (note: you can use the xmgr program to compute cross correlations). The integral of the correlation function over time is the correlation time $\tau_f$:

$$\tau_f = \int_0^{\infty} C_f(t) \, dt \tag{8.8}$$

In practice, correlation functions are calculated based on data points with discrete time intervals $\Delta t$, so that the ACF from an MD simulation is:

$$C_f(j\Delta t) = \frac{1}{N - j} \sum_{i=0}^{N-1-j} f(i\Delta t) f((i + j)\Delta t) \tag{8.9}$$

where $N$ is the number of available time frames for the calculation. The resulting ACF is obviously only available at time points with the same interval $\Delta t$. Since, for many applications, it is necessary to know the short time behavior of the ACF (e.g.
the first 10 ps), this often means that we have to save the data with intervals much shorter than the time scale of interest. Another implication of eqn. 8.9 is that in principle we can not compute all points of the ACF with the same accuracy, since we have $N - 1$ data points for $C_f(\Delta t)$ but only 1 for $C_f((N - 1)\Delta t)$. However, if we decide to compute only an ACF of length $M\Delta t$, where $M \leq N/2$, we can compute all points with the same statistical accuracy:

$$C_f(j\Delta t) = \frac{1}{M} \sum_{i=0}^{N-1-M} f(i\Delta t) f((i + j)\Delta t) \tag{8.10}$$

Here of course $j < M$. $M$ is sometimes referred to as the time lag of the correlation function. When we decide to do this, we intentionally do not use all the available points for very short time intervals ($j \ll M$), but it makes it easier to interpret the results. Another aspect that may not be neglected when computing ACFs from simulation is that usually the time origins $\xi$ (eqn. 8.6) are not statistically independent, which may introduce a bias in the results. This can be tested using a block-averaging procedure, where only time origins with a spacing at least the length of the time lag are included, e.g. using $k$ time origins with spacing of $M\Delta t$ (where $kM \leq N$):

$$C_f(j\Delta t) = \frac{1}{k} \sum_{i=0}^{k-1} f(iM\Delta t) f((iM + j)\Delta t) \tag{8.11}$$

However, one needs very long simulations to get good accuracy this way, because there are many fewer points that contribute to the ACF.

8.5.2 Using FFT for computation of the ACF

The computational cost for calculating an ACF according to eqn. 8.9 is proportional to $N^2$, which is considerable. However, this can be improved by using fast Fourier transforms to do the convolution [113].

8.5.3 Special forms of the ACF

There are some important variants of the ACF, e.g. the ACF of a vector $\mathbf{p}$:

$$C_p(t) = \int_0^{\infty} P_n(\cos \angle(\mathbf{p}(\xi), \mathbf{p}(\xi + t))) \, d\xi \tag{8.12}$$

where $P_n(x)$ is the nth order Legendre polynomial (footnote 1). Such correlation times can actually be obtained experimentally using e.g. NMR or other relaxation experiments.
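To make the cost argument of sec. 8.5.2 concrete, the sketch below computes the ACF of eqn. 8.9 twice: directly, with cost proportional to $N^2$, and through a Fourier transform of the zero-padded series. It is an illustration under stated assumptions, not the GROMACS code: the function names are hypothetical, and a small recursive radix-2 FFT is written out only to keep the example free of external libraries.

```python
import cmath
import math


def acf_direct(f):
    """ACF according to eqn. 8.9: cost proportional to N**2."""
    n = len(f)
    return [sum(f[i] * f[i + j] for i in range(n - j)) / (n - j)
            for j in range(n)]


def fft(x, inverse=False):
    """Recursive radix-2 FFT (len(x) must be a power of two, unnormalized)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def acf_fft(f):
    """Same ACF through the power spectrum: cost proportional to N log N."""
    n = len(f)
    size = 1
    while size < 2 * n:  # zero-pad to at least 2N to avoid circular wrap-around
        size *= 2
    padded = [complex(v) for v in f] + [0j] * (size - n)
    power = [abs(s) ** 2 for s in fft(padded)]
    corr = fft(power, inverse=True)  # unnormalized inverse: divide by size below
    return [corr[j].real / size / (n - j) for j in range(n)]


# Assumed test signal: an oscillation on top of a slow drift.
series = [math.sin(0.3 * i) + 0.1 * i for i in range(37)]
direct = acf_direct(series)
fast = acf_fft(series)
```

Both routines return the same values up to rounding. The zero-padding matters: without it the transform would correlate the end of the series with its beginning.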
GROMACS can compute correlations using the 1st and 2nd order Legendre polynomial (eqn. 8.12). This can also be used for rotational autocorrelation (gmx rotacf) and dipole autocorrelation (gmx dipoles).

Footnote 1: $P_0(x) = 1$, $P_1(x) = x$, $P_2(x) = (3x^2 - 1)/2$

In order to study torsion angle dynamics, we define a dihedral autocorrelation function as [164]:

$$C(t) = \langle \cos(\theta(\tau) - \theta(\tau + t)) \rangle_{\tau} \tag{8.13}$$

Note that this is not a product of two functions as is generally used for correlation functions, but it may be rewritten as the sum of two products:

$$C(t) = \langle \cos(\theta(\tau)) \cos(\theta(\tau + t)) + \sin(\theta(\tau)) \sin(\theta(\tau + t)) \rangle_{\tau} \tag{8.14}$$

8.5.4 Some Applications

The program gmx velacc calculates the velocity autocorrelation function.

$$C_v(\tau) = \langle \mathbf{v}_i(\tau) \cdot \mathbf{v}_i(0) \rangle_{i \in A} \tag{8.15}$$

The self diffusion coefficient can be calculated using the Green-Kubo relation [113]:

$$D_A = \frac{1}{3} \int_0^{\infty} \langle \mathbf{v}_i(t) \cdot \mathbf{v}_i(0) \rangle_{i \in A} \, dt \tag{8.16}$$

which is just the integral of the velocity autocorrelation function. There is a widely-held belief that the velocity ACF converges faster than the mean square displacement (sec. 8.7), which can also be used for the computation of diffusion constants. However, Allen & Tildesley [113] warn us that the long-time contribution to the velocity ACF can not be ignored, so care must be taken.

Another important quantity is the dipole correlation time. The dipole correlation function for particles of type A is calculated as follows by gmx dipoles:

$$C_{\mu}(\tau) = \langle \boldsymbol{\mu}_i(\tau) \cdot \boldsymbol{\mu}_i(0) \rangle_{i \in A} \tag{8.17}$$

with $\boldsymbol{\mu}_i = \sum_{j \in i} \mathbf{r}_j q_j$. The dipole correlation time can be computed using eqn. 8.8. For some applications see [165].

The viscosity of a liquid can be related to the correlation time of the pressure tensor $\mathbf{P}$ [166, 167]. gmx energy can compute the viscosity, but this is not very accurate [154], and actually the values do not converge.
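As a worked illustration of eqns. 8.15 and 8.16, the sketch below averages the velocity ACF over particles and time origins and integrates it with the trapezoidal rule. The function names vacf and green_kubo and the exponentially decaying model ACF are assumptions chosen for illustration; this is not how gmx velacc is implemented.

```python
import math


def vacf(velocities, nlag):
    """C_v(j*dt) of eqn. 8.15, averaged over particles and time origins.

    velocities[t][i] is the (vx, vy, vz) tuple of particle i at frame t.
    """
    nframes = len(velocities)
    nparticles = len(velocities[0])
    result = []
    for j in range(nlag):
        total = 0.0
        for t in range(nframes - j):
            for i in range(nparticles):
                v0, v1 = velocities[t][i], velocities[t + j][i]
                total += sum(a * b for a, b in zip(v0, v1))
        result.append(total / ((nframes - j) * nparticles))
    return result


def green_kubo(c_v, dt):
    """D = (1/3) * integral of C_v(t) dt (eqn. 8.16), trapezoidal rule.

    The infinite upper limit is truncated at the last available lag, so the
    ACF must have decayed to (nearly) zero by then.
    """
    integral = dt * (sum(c_v) - 0.5 * (c_v[0] + c_v[-1]))
    return integral / 3.0


# Assumed model ACF: C_v(t) = 3 * exp(-t / tau), for which D = tau exactly.
tau, dt = 0.1, 0.001
model = [3.0 * math.exp(-k * dt / tau) for k in range(2001)]
d_model = green_kubo(model, dt)
```

For the model ACF the integral is known analytically, so the numerical result can be compared with D = tau. With real data the truncation point deserves care, because the long-time tail of the velocity ACF contributes to the integral.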
8.6 Curve fitting in GROMACS

8.6.1 Sum of exponential functions

Sometimes it is useful to fit a curve to an analytical function, for example in the case of autocorrelation functions with noisy tails. GROMACS is not a general purpose curve-fitting tool however and therefore GROMACS only supports a limited number of functions. Table 8.1 lists the available options with the corresponding command-line options. The underlying routines for fitting use the Levenberg-Marquardt algorithm as implemented in the lmfit package [168] (a bare-bones version of which is included in GROMACS in which an option for error-weighted fitting was implemented).

Table 8.1: Overview of fitting functions supported in (most) analysis tools that compute autocorrelation functions. The “Note” column describes properties of the output parameters.

Command line option   Functional form f(t)                                                          Note
exp                   $e^{-t/a_0}$
aexp                  $a_1 e^{-t/a_0}$
exp_exp               $a_1 e^{-t/a_0} + (1 - a_1) e^{-t/a_2}$                                       $a_2 \geq a_0 \geq 0$
exp5                  $a_1 e^{-t/a_0} + a_3 e^{-t/a_2} + a_4$                                       $a_2 \geq a_0 \geq 0$
exp7                  $a_1 e^{-t/a_0} + a_3 e^{-t/a_2} + a_5 e^{-t/a_4} + a_6$                      $a_4 \geq a_2 \geq a_0 \geq 0$
exp9                  $a_1 e^{-t/a_0} + a_3 e^{-t/a_2} + a_5 e^{-t/a_4} + a_7 e^{-t/a_6} + a_8$     $a_6 \geq a_4 \geq a_2 \geq a_0 \geq 0$

8.6.2 Error estimation

Under the hood GROMACS implements some more fitting functions, namely a function to estimate the error in time-correlated data due to Hess [154]:

$$\varepsilon^2(t) = \alpha \tau_1 \left( 1 + \frac{\tau_1}{t} \left( e^{-t/\tau_1} - 1 \right) \right) + (1 - \alpha) \tau_2 \left( 1 + \frac{\tau_2}{t} \left( e^{-t/\tau_2} - 1 \right) \right) \tag{8.18}$$

where $\tau_1$ and $\tau_2$ are time constants (with $\tau_2 \geq \tau_1$) and $\alpha$ usually is close to 1 (in the fitting procedure it is enforced that $0 \leq \alpha \leq 1$). This is used in gmx analyze for error estimation using

$$\lim_{t \to \infty} \varepsilon(t) = \sigma \sqrt{\frac{2(\alpha \tau_1 + (1 - \alpha) \tau_2)}{T}} \tag{8.19}$$

where $\sigma$ is the standard deviation of the data set and $T$ is the total simulation time [154].

8.6.3 Interphase boundary demarcation

In order to determine the position and width of an interface, Steen-Sæthre et al.
fitted a density profile to the following function

$$f(x) = \frac{a_0 + a_1}{2} - \frac{a_0 - a_1}{2} \, \mathrm{erf}\left( \frac{x - a_2}{a_3^2} \right) \tag{8.20}$$

where $a_0$ and $a_1$ are densities of different phases, $x$ is the coordinate normal to the interface, $a_2$ is the position of the interface and $a_3$ is the width of the interface [169]. This is implemented in gmx densorder.

8.6.4 Transverse current autocorrelation function

In order to establish the transverse current autocorrelation function (useful for computing viscosity [170]) the following function is fitted:

$$f(x) = e^{-\nu} \left( \cosh(\omega\nu) + \frac{\sinh(\omega\nu)}{\omega} \right) \tag{8.21}$$

with $\nu = x/(2a_0)$ and $\omega = \sqrt{1 - a_1}$. This is implemented in gmx tcaf.

8.6.5 Viscosity estimation from pressure autocorrelation function

The viscosity is a notoriously difficult property to extract from simulations [154, 171]. It is in principle possible to determine it by integrating the pressure autocorrelation function [166], however this is often hampered by the noisy tail of the ACF. A workaround to this is fitting the ACF to the following function [172]:

$$f(t)/f(0) = (1 - C) \cos(\omega t) \, e^{-(t/\tau_f)^{\beta_f}} + C e^{-(t/\tau_s)^{\beta_s}} \tag{8.22}$$

where $\omega$ is the frequency of rapid pressure oscillations (mainly due to bonded forces in molecular simulations), $\tau_f$ and $\beta_f$ are the time constant and exponent of fast relaxation in a stretched-exponential approximation, $\tau_s$ and $\beta_s$ are constants for slow relaxation and $C$ is the pre-factor that determines the weight between fast and slow relaxation. After a fit, the integral of the function $f(t)$ is used to compute the viscosity:

$$\eta = \frac{V}{k_B T} \int_0^{\infty} f(t) \, dt \tag{8.23}$$

This equation has been applied to computing the bulk and shear viscosity using different elements from the pressure tensor [173]. This is implemented in gmx viscosity.

Figure 8.4: Mean Square Displacement of SPC-water (MSD versus time in ps; the plot is annotated with D = 3.5027 in units of 10^-5 cm^2 s^-1).
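The viscosity route of sec. 8.6.5 (eqns. 8.22 and 8.23) can be illustrated by evaluating the fitted model and integrating it numerically. This is only a sketch under stated assumptions: the function names and parameter values are hypothetical, SI units are assumed throughout, and the unnormalized $f(t)$ is recovered by multiplying the normalized model by $f(0)$. It does not reproduce the fitting step itself.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def pressure_acf_model(t, c, omega, tau_f, beta_f, tau_s, beta_s):
    """Normalized model f(t)/f(0) of eqn. 8.22."""
    fast = (1.0 - c) * math.cos(omega * t) * math.exp(-((t / tau_f) ** beta_f))
    slow = c * math.exp(-((t / tau_s) ** beta_s))
    return fast + slow


def integrate_model(params, t_max, nsteps):
    """Trapezoidal estimate of the integral of f(t)/f(0) from 0 to t_max.

    The infinite upper limit of eqn. 8.23 is truncated at t_max, which must
    be long compared with the slow relaxation time.
    """
    dt = t_max / nsteps
    values = [pressure_acf_model(k * dt, **params) for k in range(nsteps + 1)]
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def viscosity(volume, temperature, f0, integral):
    """Eqn. 8.23 with f(t) = f(0) * model: V in m^3, T in K, f0 in Pa^2, integral in s."""
    return volume / (K_B * temperature) * f0 * integral


# Assumed parameters: pure slow exponential relaxation (C = 1, beta_s = 1),
# for which the integral of the normalized model equals tau_s.
params = {"c": 1.0, "omega": 5.0, "tau_f": 0.1, "beta_f": 1.0,
          "tau_s": 2.0, "beta_s": 1.0}
area = integrate_model(params, t_max=60.0, nsteps=60000)
```

Integrating the smooth fitted model rather than the raw ACF is the point of the workaround: the noisy tail of the measured ACF never enters the integral.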
8.7 Mean Square Displacement
gmx msd

To determine the self diffusion coefficient $D_A$ of particles of type A, one can use the Einstein relation [113]:

\[ \lim_{t \to \infty} \left\langle \| \mathbf{r}_i(t) - \mathbf{r}_i(0) \|^2 \right\rangle_{i \in A} = 6 D_A t \tag{8.24} \]

This mean square displacement and $D_A$ are calculated by the program gmx msd. Normally an index file containing atom numbers is used and the MSD is averaged over these atoms. For molecules consisting of more than one atom, $\mathbf{r}_i$ can be taken as the center of mass positions of the molecules. In that case, you should use an index file with molecule numbers. The results will be nearly identical to averaging over atoms, however. The gmx msd program can also be used for calculating diffusion in one or two dimensions. This is useful for studying lateral diffusion on interfaces.

An example of the mean square displacement of SPC water is given in Fig. 8.4.

8.8 Bonds/distances, angles and dihedrals
gmx distance, gmx angle, gmx gangle

To monitor specific bonds in your molecules, or more generally distances between points, the program gmx distance can calculate distances as a function of time, as well as the distribution of the distance. With a traditional index file, the groups should consist of pairs of atom numbers, for example:

    [ bonds_1 ]
    1 2
    3 4
    9 10

    [ bonds_2 ]
    12 13

[Figure 8.5: Dihedral conventions: A. “Biochemical convention”. B. “Polymer convention”.]

Selections are also supported, with the first two positions defining the first distance, the second pair of positions defining the second distance and so on. You can calculate the distances between CA and CB atoms in all your residues (assuming that every residue either has both atoms, or neither) using a selection such as:

    name CA CB

The selections also allow more generic distances to be computed.
For example, to compute the distances between centers of mass of two residues, you can use:

    com of resname AAA plus com of resname BBB

The program gmx angle calculates the distribution of angles and dihedrals in time. It also gives the average angle or dihedral. The index file consists of triplets or quadruples of atom numbers:

    [ angles ]
    1 2 3
    2 3 4
    3 4 5

    [ dihedrals ]
    1 2 3 4
    2 3 5 5

For the dihedral angles you can use either the “biochemical convention” ($\phi = 0 \equiv$ cis) or “polymer convention” ($\phi = 0 \equiv$ trans), see Fig. 8.5.

The program gmx gangle provides a selection-enabled version to compute angles. This tool can also compute angles and dihedrals, but does not support all the options of gmx angle, such as autocorrelation or other time series analyses. In addition, it supports angles between two vectors, a vector and a plane, two planes (defined by 2 or 3 points, respectively), a vector/plane and the z axis, or a vector/plane and the normal of a sphere (determined by a single position). Also the angle between a vector/plane compared to its position in the first frame is supported. For planes, gmx gangle uses the normal vector perpendicular to the plane. See Fig. 8.6 (A, B, C) for the definitions.

[Figure 8.6: Angle options of gmx gangle: A. Angle between two vectors. B. Angle between two planes. C. Angle between a vector and the z axis. D. Angle between a vector and the normal of a sphere. Also other combinations are supported: planes and vectors can be used interchangeably.]

8.9 Radius of gyration and distances
gmx gyrate, gmx distance, gmx mindist, gmx mdmat, gmx pairdist, gmx xpm2ps

To have a rough measure for the compactness of a structure, you can calculate the radius of gyration with the program gmx gyrate as follows:

\[ R_g = \left( \frac{\sum_i \| \mathbf{r}_i \|^2 m_i}{\sum_i m_i} \right)^{\frac{1}{2}} \tag{8.25} \]

where $m_i$ is the mass of atom $i$ and $\mathbf{r}_i$ the position of atom $i$ with respect to the center of mass of the molecule.
It is especially useful to characterize polymer solutions and proteins. The program will also provide the radius of gyration around the coordinate axis (or, optionally, principal axes) by only summing the radii components orthogonal to each axis, for instance

\[ R_{g,x} = \left( \frac{\sum_i \left( r_{i,y}^2 + r_{i,z}^2 \right) m_i}{\sum_i m_i} \right)^{\frac{1}{2}} \tag{8.26} \]

Sometimes it is interesting to plot the distance between two atoms, or the minimum distance between two groups of atoms (e.g.: protein side-chains in a salt bridge). To calculate these distances between certain groups there are several possibilities:

• The distance between the geometrical centers of two groups can be calculated with the program gmx distance, as explained in sec. 8.8.

• The minimum distance between two groups of atoms during time can be calculated with the program gmx mindist. It also calculates the number of contacts between these groups within a certain radius $r_{max}$.

• gmx pairdist is a selection-enabled version of gmx mindist.

• To monitor the minimum distances between amino acid residues within a (protein) molecule, you can use the program gmx mdmat. This minimum distance between two residues $A_i$ and $A_j$ is defined as the smallest distance between any pair of atoms ($i \in A_i$, $j \in A_j$). The output is a symmetrical matrix of smallest distances between all residues. To visualize this matrix, you can use a program such as xv. If you want to view the axes and legend or if you want to print the matrix, you can convert it with gmx xpm2ps into a Postscript picture, see Fig. 8.7. Plotting these matrices for different time-frames, one can analyze changes in the structure, and e.g. forming of salt bridges.

[Figure 8.7: A minimum distance matrix for a peptide [174]. Both axes: residue number; color scale: distance, 0 to 1.2 nm; frame shown at t = 0 ps.]
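As an illustration of eqs. 8.25 and 8.26, the following Python sketch computes the radius of gyration and its per-axis variants for a single frame. It is a plain transcription of the formulas, not the gmx gyrate source; the function names and the input layout (a list of (x, y, z) tuples plus a list of masses) are assumptions chosen for this example.

```python
import math


def center_of_mass(positions, masses):
    """Mass-weighted mean position of a set of atoms."""
    total_mass = sum(masses)
    return tuple(
        sum(m * p[k] for p, m in zip(positions, masses)) / total_mass
        for k in range(3)
    )


def radius_of_gyration(positions, masses):
    """Return (Rg, Rg_x, Rg_y, Rg_z) following eqs. 8.25 and 8.26."""
    com = center_of_mass(positions, masses)
    total_mass = sum(masses)
    # Mass-weighted sums of the squared x, y and z displacements from the
    # center of mass.
    sq = [0.0, 0.0, 0.0]
    for p, m in zip(positions, masses):
        for k in range(3):
            sq[k] += m * (p[k] - com[k]) ** 2
    rg = math.sqrt(sum(sq) / total_mass)
    # The radius about axis k uses only the two components orthogonal to it.
    rg_axes = tuple(
        math.sqrt((sq[(k + 1) % 3] + sq[(k + 2) % 3]) / total_mass)
        for k in range(3)
    )
    return (rg,) + rg_axes
```

For two equal masses placed at (1, 0, 0) and (-1, 0, 0), all mass lies on the x axis, so the radius about x vanishes while the total radius and the radii about y and z are all equal to 1.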
8.10 Root mean square deviations in structure
gmx rms, gmx rmsdist

The root mean square deviation (RMSD) of certain atoms in a molecule with respect to a reference structure can be calculated with the program gmx rms by least-square fitting the structure to the reference structure ($t_2 = 0$) and subsequently calculating the RMSD (eqn. 8.27).

\[ RMSD(t_1, t_2) = \left[ \frac{1}{M} \sum_{i=1}^{N} m_i \| \mathbf{r}_i(t_1) - \mathbf{r}_i(t_2) \|^2 \right]^{\frac{1}{2}} \tag{8.27} \]

where $M = \sum_{i=1}^{N} m_i$ and $\mathbf{r}_i(t)$ is the position of atom $i$ at time $t$. Note that fitting does not have to use the same atoms as the calculation of the RMSD; e.g. a protein is usually fitted on the backbone atoms (N, C$_\alpha$, C), but the RMSD can be computed of the backbone or of the whole protein.

Instead of comparing the structures to the initial structure at time $t = 0$ (so for example a crystal structure), one can also calculate eqn. 8.27 with a structure at time $t_2 = t_1 - \tau$. This gives some insight in the mobility as a function of $\tau$. A matrix can also be made with the RMSD as a function of $t_1$ and $t_2$, which gives a nice graphical interpretation of a trajectory. If there are transitions in a trajectory, they will clearly show up in such a matrix.

Alternatively the RMSD can be computed using a fit-free method with the program gmx rmsdist:

\[ RMSD(t) = \left[ \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \| \mathbf{r}_{ij}(t) - \mathbf{r}_{ij}(0) \|^2 \right]^{\frac{1}{2}} \tag{8.28} \]

where the distance $\mathbf{r}_{ij}$ between atoms at time $t$ is compared with the distance between the same atoms at time 0.

8.11 Covariance analysis

Covariance analysis, also called principal component analysis or essential dynamics [175], can find correlated motions. It uses the covariance matrix $C$ of the atomic coordinates:

\[ C_{ij} = \left\langle M_{ii}^{\frac{1}{2}} (x_i - \langle x_i \rangle) \, M_{jj}^{\frac{1}{2}} (x_j - \langle x_j \rangle) \right\rangle \tag{8.29} \]

where $M$ is a diagonal matrix containing the masses of the atoms (mass-weighted analysis) or the unit matrix (non-mass weighted analysis). $C$ is a symmetric $3N \times 3N$ matrix, which can be diagonalized with an orthonormal transformation matrix $R$:

\[ R^T C R = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{3N}) \quad \mathrm{where} \quad \lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_{3N} \tag{8.30} \]

The columns of $R$ are the eigenvectors, also called principal or essential modes. $R$ defines a transformation to a new coordinate system. The trajectory can be projected on the principal modes to give the principal components $p_i(t)$:

\[ \mathbf{p}(t) = R^T M^{\frac{1}{2}} (\mathbf{x}(t) - \langle \mathbf{x} \rangle) \tag{8.31} \]

The eigenvalue $\lambda_i$ is the mean square fluctuation of principal component $i$. The first few principal modes often describe collective, global motions in the system. The trajectory can be filtered along one (or more) principal modes. For one principal mode $i$ this goes as follows:

\[ \mathbf{x}^f(t) = \langle \mathbf{x} \rangle + M^{-\frac{1}{2}} R_{*i} \, p_i(t) \tag{8.32} \]

When the analysis is performed on a macromolecule, one often wants to remove the overall rotation and translation to look at the internal motion only. This can be achieved by least square fitting to a reference structure. Care has to be taken that the reference structure is representative for the ensemble, since the choice of reference structure influences the covariance matrix.

One should always check if the principal modes are well defined. If the first principal component resembles a half cosine and the second resembles a full cosine, you might be filtering noise (see below). A good way to check the relevance of the first few principal modes is to calculate the overlap of the sampling between the first and second half of the simulation. Note that this can only be done when the same reference structure is used for the two halves.

A good measure for the overlap has been defined in [176]. The elements of the covariance matrix are proportional to the square of the displacement, so we need to take the square root of the matrix to examine the extent of sampling. The square root can be calculated from the eigenvalues $\lambda_i$ and the eigenvectors, which are the columns of the rotation matrix $R$.
For a symmetric and diagonally dominant matrix $A$ of size $3N \times 3N$ the square root can be calculated as:

\[ A^{\frac{1}{2}} = R \, \mathrm{diag}(\lambda_1^{\frac{1}{2}}, \lambda_2^{\frac{1}{2}}, \ldots, \lambda_{3N}^{\frac{1}{2}}) \, R^T \tag{8.33} \]

It can be verified easily that the product of this matrix with itself gives $A$. Now we can define a difference $d$ between covariance matrices $A$ and $B$ as follows:

\[ d(A, B) = \sqrt{\mathrm{tr}\left( \left( A^{\frac{1}{2}} - B^{\frac{1}{2}} \right)^2 \right)} \tag{8.34} \]

\[ = \sqrt{\mathrm{tr}\left( A + B - 2 A^{\frac{1}{2}} B^{\frac{1}{2}} \right)} \tag{8.35} \]

\[ = \left( \sum_{i=1}^{N} \left( \lambda_i^A + \lambda_i^B \right) - 2 \sum_{i=1}^{N} \sum_{j=1}^{N} \sqrt{\lambda_i^A \lambda_j^B} \left( R_i^A \cdot R_j^B \right)^2 \right)^{\frac{1}{2}} \tag{8.36} \]

where tr is the trace of a matrix. We can now define the overlap $s$ as:

\[ s(A, B) = 1 - \frac{d(A, B)}{\sqrt{\mathrm{tr} A + \mathrm{tr} B}} \tag{8.37} \]

The overlap is 1 if and only if matrices $A$ and $B$ are identical. It is 0 when the sampled subspaces are completely orthogonal.

A commonly-used measure is the subspace overlap of the first few eigenvectors of covariance matrices. The overlap of the subspace spanned by $m$ orthonormal vectors $\mathbf{w}_1, \ldots, \mathbf{w}_m$ with a reference subspace spanned by $n$ orthonormal vectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$ can be quantified as follows:

\[ \mathrm{overlap}(\mathbf{v}, \mathbf{w}) = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{m} (\mathbf{v}_i \cdot \mathbf{w}_j)^2 \tag{8.38} \]

The overlap will increase with increasing $m$ and will be 1 when set $\mathbf{v}$ is a subspace of set $\mathbf{w}$. The disadvantage of this method is that it does not take the eigenvalues into account. All eigenvectors are weighted equally, and when degenerate subspaces are present (equal eigenvalues), the calculated overlap will be too low.

Another useful check is the cosine content. It has been proven that the principal components of random diffusion are cosines with the number of periods equal to half the principal component index [177, 176]. The eigenvalues are proportional to the index to the power −2. The cosine content is defined as:

\[ \frac{2}{T} \left( \int_0^T \cos\left( \frac{i \pi t}{T} \right) p_i(t) \, dt \right)^2 \left( \int_0^T p_i^2(t) \, dt \right)^{-1} \tag{8.39} \]

[Figure 8.8: Geometrical Hydrogen bond criterion. The sketch shows donor D, hydrogen H and acceptor A, with the distance r and the angle α.]

When the cosine content of the first few principal components is close to 1, the largest fluctuations are not connected with the potential, but with random diffusion.
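The cosine content of eq. 8.39 can be evaluated numerically from a sampled principal component. The Python sketch below does this with the trapezoidal rule; it is an illustration under stated assumptions rather than the routine used by gmx analyze. The function names are hypothetical, the samples are assumed to be equally spaced with time step `dt`, and the first and last samples are assumed to correspond to t = 0 and t = T.

```python
import math


def trapezoid(values, dt):
    """Trapezoidal integral of equally spaced samples."""
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def cosine_content(p, dt, i):
    """Cosine content (eq. 8.39) of principal component samples p.

    p  : samples p_i(t) at t = 0, dt, 2*dt, ..., T
    dt : time step between samples
    i  : index of the principal component (1 for the first one)
    """
    n = len(p)
    T = (n - 1) * dt
    weighted = [math.cos(i * math.pi * k * dt / T) * p[k] for k in range(n)]
    squared = [x * x for x in p]
    return (2.0 / T) * trapezoid(weighted, dt) ** 2 / trapezoid(squared, dt)
```

A component that is exactly a half cosine, p(t) = cos(πt/T), gives a cosine content of 1 for i = 1 and of 0 for i = 2, which matches the interpretation given above.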
The covariance matrix is built and diagonalized by gmx covar. The principal components and overlap (and many more things) can be plotted and analyzed with gmx anaeig. The cosine content can be calculated with gmx analyze.

8.12 Dihedral principal component analysis
gmx angle, gmx covar, gmx anaeig

Principal component analysis can be performed in dihedral space [178] using GROMACS. You start by defining the dihedral angles of interest in an index file, either using gmx mk_angndx or otherwise. Then you use the gmx angle program with the -or flag to produce a new .trr file containing the cosine and sine of each dihedral angle in two coordinates, respectively. That is, in the .trr file you will have a series of numbers corresponding to: $\cos(\phi_1), \sin(\phi_1), \cos(\phi_2), \sin(\phi_2), \ldots, \cos(\phi_n), \sin(\phi_n)$, and the array is padded with zeros, if necessary. Then you can use this .trr file as input for the gmx covar program and perform principal component analysis as usual. For this to work you will need to generate a reference file (.tpr, .gro, .pdb etc.) containing the same number of “atoms” as the new .trr file, that is for $n$ dihedrals you need $2n/3$ atoms (rounded up if not an integer number). You should use the -nofit option for gmx covar since the coordinates in the dummy reference file do not correspond in any way to the information in the .trr file. Analysis of the results is done using gmx anaeig.

8.13 Hydrogen bonds
gmx hbond

The program gmx hbond analyzes the hydrogen bonds (H-bonds) between all possible donors D and acceptors A. To determine if an H-bond exists, a geometrical criterion is used, see also Fig. 8.8:

\[ r \le r_{HB} = 0.35~\mathrm{nm}, \qquad \alpha \le \alpha_{HB} = 30^{\circ} \tag{8.40} \]

The value of $r_{HB} = 0.35$ nm corresponds to the first minimum of the RDF of SPC water (see also Fig. 8.3).

The program gmx hbond analyzes all hydrogen bonds existing between two groups of atoms (which must be either identical or non-overlapping) or in specified donor-hydrogen-acceptor triplets, in the following ways:

• Donor-Acceptor distance ($r$) distribution of all H-bonds

• Hydrogen-Donor-Acceptor angle ($\alpha$) distribution of all H-bonds

• The total number of H-bonds in each time frame

• The number of H-bonds in time between residues, divided into groups n-n+i where n and n+i stand for residue numbers and i goes from 0 to 6. The group for i = 6 also includes all H-bonds for i > 6. These groups include the n-n+3, n-n+4 and n-n+5 H-bonds, which provide a measure for the formation of α-helices or β-turns or strands.

• The lifetime of the H-bonds is calculated from the average over all autocorrelation functions of the existence functions (either 0 or 1) of all H-bonds:

\[ C(\tau) = \langle s_i(t) \, s_i(t + \tau) \rangle \tag{8.41} \]

with $s_i(t) = \{0, 1\}$ for H-bond $i$ at time $t$. The integral of $C(\tau)$ gives a rough estimate of the average H-bond lifetime $\tau_{HB}$:

\[ \tau_{HB} = \int_0^{\infty} C(\tau) \, d\tau \tag{8.42} \]

Both the integral and the complete autocorrelation function $C(\tau)$ will be output, so that more sophisticated analysis (e.g. using multi-exponential fits) can be used to get better estimates for $\tau_{HB}$. A more complete analysis is given in ref. [179]; one of the more advanced options is the Luzar and Chandler analysis of hydrogen bond kinetics [180, 181].

• An H-bond existence map can be generated of dimensions # H-bonds × # frames. The ordering is identical to the index file (see below), but reversed, meaning that the last triplet in the index file corresponds to the first row of the existence map.

• Index groups are output containing the analyzed groups, all donor-hydrogen atom pairs and acceptor atoms in these groups, donor-hydrogen-acceptor triplets involved in hydrogen bonds between the analyzed groups and all solvent atoms involved in insertion.

[Figure 8.9: Insertion of water into an H-bond. (1) Normal H-bond between two residues. (2) H-bonding bridge via a water molecule.]

8.14 Protein-related items
gmx do_dssp, gmx rama, gmx wheel

To analyze structural changes of a protein, you can calculate the radius of gyration or the minimum residue distances over time (see sec. 8.9), or calculate the RMSD (sec. 8.10).

You can also look at the changing of secondary structure elements during your run. For this, you can use the program gmx do_dssp, which is an interface for the commercial program DSSP [182]. For further information, see the DSSP manual. A typical output plot of gmx do_dssp is given in Fig. 8.10.

One other important analysis of proteins is the so-called Ramachandran plot. This is the projection of the structure on the two dihedral angles $\phi$ and $\psi$ of the protein backbone, see Fig. 8.11. To evaluate this Ramachandran plot you can use the program gmx rama. A typical output is given in Fig. 8.12.

When studying α-helices it is useful to have a helical wheel projection of your peptide, to see whether a peptide is amphipathic. This can be done using the gmx wheel program. Two examples are plotted in Fig. 8.13.

8.15 Interface-related items
gmx order, gmx density, gmx potential, gmx traj

When simulating molecules with long carbon tails, it can be interesting to calculate their average orientation. There are several flavors of order parameters, most of which are related. The program gmx order can calculate order parameters using the equation:

\[ S_z = \frac{3}{2} \langle \cos^2 \theta_z \rangle - \frac{1}{2} \tag{8.43} \]

where $\theta_z$ is the angle between the z-axis of the simulation box and the molecular axis under consideration. The latter is defined as the vector from $C_{n-1}$ to $C_{n+1}$. The parameters $S_x$ and $S_y$ are defined in the same way. The brackets imply averaging over time and molecules. Order parameters can vary between 1 (full order along the interface normal) and −1/2 (full order perpendicular to the normal), with a value of zero in the case of isotropic orientation.

The program can do two things for you.
It can calculate the order parameter for each CH2 segment separately, for any of three axes, or it can divide the box in slices and calculate the average value of the order parameter per segment in one slice. The first method gives an idea of the ordering of a molecule from head to tail, the second method gives an idea of the ordering as function of the box length.

The electrostatic potential ($\psi$) across the interface can be computed from a trajectory by evaluating the double integral of the charge density ($\rho(z)$):

\[ \psi(z) - \psi(-\infty) = - \int_{-\infty}^{z} dz' \int_{-\infty}^{z'} \rho(z'') \, dz'' / \epsilon_0 \tag{8.44} \]

where the position $z = -\infty$ is far enough in the bulk phase such that the field is zero. With this method, it is possible to “split” the total potential into separate contributions from lipid and water molecules. The program gmx potential divides the box in slices and sums all charges of the atoms in each slice. It then integrates this charge density to give the electric field, which is in turn integrated to give the potential. Charge density, electric field, and potential are written to xvgr input files.

[Figure 8.10: Analysis of the secondary structure elements of a peptide in time. Axes: residue against time (ps); legend: Coil, Bend, Turn, A-Helix, B-Bridge.]

[Figure 8.11: Definition of the dihedral angles φ and ψ of the protein backbone.]

[Figure 8.12: Ramachandran plot of a small protein. Axes: Phi and Psi, each from −180 to 180.]

[Figure 8.13: Helical wheel projection of the N-terminal helix of HPr.]

The program gmx traj is a very simple analysis program. All it does is print the coordinates, velocities, or forces of selected atoms.
It can also calculate the center of mass of one or more molecules and print the coordinates of the center of mass to three files. By itself, this is probably not a very useful analysis, but having the coordinates of selected molecules or atoms can be very handy for further analysis, not only in interfacial systems.

The program gmx density calculates the mass density of groups and gives a plot of the density against a box axis. This is useful for looking at the distribution of groups or atoms across the interface.

Appendix A Some implementation details

In this chapter we will present some implementation details. This is far from complete, but we deemed it necessary to clarify some things that would otherwise be hard to understand.

A.1 Single Sum Virial in GROMACS

The virial $\Xi$ can be written in full tensor form as:

\[ \Xi = -\frac{1}{2} \sum_{i<j}^{N} \mathbf{r}_{ij} \otimes \mathbf{F}_{ij} \]