SMFA Users Guide
User Manual:
Open the PDF directly: View PDF .
Page Count: 106
Download | |
Open PDF In Browser | View PDF |
SMFA Systematic Molecular Fragmentation by Annihilation Table of Contents 1. Purpose, Capabilities and Requirements 4 1.1 Purpose 4 1.2 Capabilities 5 1.3 Requirements 5 1.4 Installing SMFA on your computer 6 1.5 Acknowledgements 6 2. Running SMFA 7 2.1 Overview 7 2.2 Preliminaries 7 2.3 Getting started - main menu 8 2.4 Set up input 8 2.4.1 Submenu Item 1 Fragmentation 9 2.4.2 Submenu Item 2 Coordinate file 9 2.4.3 Submenu Item 3 Electronic structure 10 2.4.3. A GAMESS(US) 11 2.4.3. B GAUSSIAN 19 2.4.3. C NWChem 26 2.4.3. D Q-Chem 32 2.4.4 Submenu Item 4 Hydrogen Bonding and unusual valance 38 2.4.5 Submenu Item 5 Specify charges for metals & radicals 41 2.4.6 Submenu Item 6 Exit 42 2.5 Input review and preparation 43 2.6 Fragmentation 43 2.7 System variables 44 2.7.1 Sequential execution 45 2.7.2 Parallel execution 46 2.8 Electronic structure calculations 47 2.9 Restart electronic structure calculations 48 1 3. Output Guide 50 3.1 Checking the input 50 3.2 Fragmentation output 50 3.3 Output Energies 51 3.4 Output Gradients 51 3.5 Output Frequencies 52 3.6 Optimised Structures 52 3.7 Scans 53 4. Suggested Procedures 54 5. Utilities Guide 58 5.1 Frequencies with isotopic substitutions 58 5.2 Isodesmic, homodesmotic and analogous reactions 60 5.3 Combining isodesmic, homodesmotic etc reactions 63 5.4 Electrostatic potential on the solvent-accessible-surface 64 5.5 Dipole Polarizability 67 5.6 Dipole Hyperpolarizability 68 5.7 Internal Coordinates 68 5.8 Add H atoms 69 6. Write your own property 71 7. Test Cases/Examples 73 7.1 Relative energies of two protein conformers 73 7.2 Energy gradient 74 7.3 Geometry optimization 74 7.4 Frequencies 74 7.5 Scan 75 7.6 TS search 77 7.7 Isodesmic reactions 78 7.8 Combining isodesmic (etc) reactions 78 7.9 Electrostatic potential on the solvent accessible surface. 78 7.10 Dipole Polarizability 79 7.11 Dipole Hyperpolarizability 79 7.12 Internal coordinates 80 7.13 Adding H atoms 81 2 7.14 Input checks 84 8. Citation 86 Appendices 87 A. Installing the code 87 B. Parameter dtol 90 C. Structure optimization methods 92 D. Implementation of property and property gradient methods 94 E. Use of Perturbation theory 99 F. Additional implementation details 101 G. Quantum Chemistry Foibles 103 References 105 3 1. Purpose, Capabilities and Requirements 1.1 Purpose The purpose of the SMFA package is to provide the means to calculate the energy, structure and properties of moderate to large organic, organo-metallic, inorganic and biological molecules using modern electronic structure methods. The structure of both energy minima and saddle points ("transition states") can be determined. SMFA works by decomposing a molecule into relatively small molecular fragments. Ab initio quantum chemistry calculations (or DFT) on these fragments, combined with perturbation theory, are used to accurately estimate the structure, energy and properties of the whole molecule. The largest fragments typically contain about 4 - 7 chemical functional groups. The computational time required is determined by the time required to perform quantum chemistry calculations on these fragments. The number of fragments is proportional to the number of chemical functional groups in the molecule. If the user has just one computing unit available, then the total computer time required is linearly proportional to the size of the molecule. If the user has enough computing units available, the quantum chemistry calculations for the fragments can be performed in parallel, and the total computer time required is independent of the size of the molecule. In this way, the ultimate purpose of SMFA is to make accurate quantum chemistry calculations of large molecules feasible. 1.2 Capabilities The package can be used in conjunction with any one of the following quantum chemistry program packages: GAMESS (US), GAUSSIAN, NWChem and Q-Chem. The user can therefore employ any quantum chemistry method that is available in these program packages, for example, Hartree Fock, Móller Plesset perturbation theory, coupled cluster methods, or some flavour of density functional theory. At present, SMFA can calculate the • Energy, • Energy gradient, • Hessian and frequencies, • Minimum energy structures (with or without constraints), • Saddle point (transition state) structures (with or without constraints), • Scan of minimum energies on a specified path. • Various properties and isoenergetic reaction schemes 4 SMFA can evaluate the above properties with explicit solvent molecules included in the structure. Implicit solvent approximations are NOT presently available. Utility programs also provide the means to calculate various properties: (a) Infra-red spectra using a peak resolution specified by the user. (b) Infra-red spectra for molecules with isotopic substitutions. (c) The electrostatic potential on the solvent accessible surface. (d) a qualitative description of the molecule as a participant in isodesmic, homodesmotic, and corresponding higher order iso-energetic chemical reactions. (e) The dipole polarizability tensor. (f) The hyperpolarizability tensor. (g) Values of selected internal coordinates. (h) Add hydrogen atoms apparently missing from available structures. Importantly, Section 6 of this manual also contains instructions for how to obtain any property which is calculated by the quantum chemistry packages. Although SMFA has been applied to crystals1-3, crystal surfaces4-6, and fluids 7 under periodic boundary conditions, the periodic version of SMFA has NOT been implemented in this release. The methods incorporated in SMFA have been published previously. This manual does not describe the methodology of the fragmentation approach in detail; you will have to read the literature 8 1,2,7,9-15 , perhaps starting with some reviews16,17. The manual is intended solely as a guide for how to use SMFA. However, some additional implementation details (not previously published) are included in the Appendices of this manual. NOTE: You will read in the papers cited above that SMFA does NOT fragment molecules by breaking aromatic rings. So, SMFA will not reduce the computational task for an ab initio calculation on benzene, naphalene, graphenes, etc, and cannot be usefully applied to such systems. 1.3 Requirements (a) The user must have installed at least one of GAMESS (US), GAUSSIAN, NWChem or Q-Chem on their computer. (b) The quantum chemistry program DALTON is used by SMFA to provide some essential calculations that are not available on some or all of the 5 four packages above; so DALTON must be installed. (c) Some Fortran routines in SMFA employ routines from the LaPack library, so this library must be installed. (d) Perl must be installed. We note that GAMESS (US), NWChem and DALTON are all available free-of-charge under licence. 1.4 Installing SMFA on your computer See Appendix A. 1.5 Acknowledgements Part of the energy of a large molecule arises from interactions between segments of the molecule that are separated by large distances. In SMFA, the dispersion, electrostatic and induction contributions to these interactions are evaluated using perturbation theory. 18 When using GAMESS-US, GAUSSIAN or Q-Chem, the electrostatic interactions are evaluated (in large part) using charges, and higher order multipoles distributed on the atoms. GAMESS-US calculates these distributed multipoles, using its own version of the GDMA method which was developed by Anthony Stone. When using GAUSSIAN or Q-Chem, the SMFA program directly uses the GDMA program version 2.1. 19 The authors gratefully thank Professor Anthony Stone for the use of GDMA. The authors acknowledge the support of the Australian NCI National Facility where the program development was carried out. 6 2. Running SMFA 2.1 Overview SMFA is intended to be easy for the non-expert quantum chemist to use. However, in order to carry out any reliable quantum chemistry calculation, the user needs to specify an appropriate quantum chemistry method and a basis set for the electronic wavefunction or density. If you don't have even that much knowledge, then ask advice from someone who does. Someone who is familiar with using one of GAMESS (US), GAUSSIAN, NWChem or Q-Chem will be able to use SMFA easily. In addition to a method and basis set, you will need a molecule (possibly including solvent molecules to solvate it). In practice, you will need the Cartesian (xyz) coordinates stored in a file on your computer (details below). This structure might come from a crystal structure, NMR determined structure, or via a chemical structure drawing program (eg Spartan, Jmol, QMol). The structure can be optimised to that of an energy minimum or a saddle point by SMFA. SMFA stands for "Systematic Molecular Fragmentation by Annihilation". The method can be used with a systematic series of values of the input variable "Level". Larger values of Level decompose the molecule into larger fragments which require more computer time for the electronic structure calculations, but give a more accurate estimate of the molecular property. A sensible user runs SMFA with a secession of increasing values of Level to see convergence of the value of the property they are interested in. 2.2 Preliminaries During its operation, SMFA produces many intermediate files that have the same name for whichever molecule is being evaluated, so you should create a separate subdirectory for each molecule if you want to investigate more than one molecule at the same time. Put the file containing the Cartesian coordinates of the molecule into that subdirectory. The format of this file is that of a standard "xyz" format file. This is line 1: The number of atoms in the molecule (eq 472) line 2: a comment (eg the chemical/biological name of the molecule) or blank line line 3: The chemical element symbol and x, y, z coordinates of the first atom (Å) For example, C 2.634578 -0.012476 10.426925 and so on for each atom in the molecule. This is a free format read, except that the first 2 characters must contain the elemental symbol. SMFA requires correct elemental symbols. For example, magnesium is Mg not MG or mg. (Note: files from the Protein Data Base may not observe this rule) 7 If you have the molecular structure in some other format than xyz, then the OpenBabel program may provide the means to convert from that format to xyz format. NOTE: There are many (free) molecular graphics programs that accept xyz format files as input (eq iMol, MacMolPlt, IQMol, VMD) and produce a graphical representation of the molecule (eg ball and stick). It will prove very useful to have such a graphics program installed somewhere convenient. Some aspects of SMFA (eg optimisation subject to constraints) will require the user to know the number of some individual atoms in the sequence of input coordinates. Graphics will make this easy if you can look at a graphic of the molecule with the atom numbers shown. If you will have to specify constraints, then the "Internal coordinates utility" in the Utilities option (in the main menu) will help you find values of atom-atom distances, valance angles and dihedral angles in the input geometry. 2.3 Getting started We start SMFA by simply typing SMFA (RETURN). A menu appears: main menu options -> 1) 2) 3) 4) 5) 6) 7) 8) Set up input SMFA examines the input and prints comments or queries to OUT_SMFA Fragment the molecule Time limit, memory, etc Run all the electronic structure calculations Restart the electronic structure calculations Utilities Exit SMFA We use the arrow keys to move up and down the menu and the RETURN key to choose. Let's start with item 1 "Set up input" 2.4 Set up input Choosing 1) Set up input sends us to the place where we input quantities needed by SMFA and the quantum chemistry programs. This choice brings up another menu submenu: Input control -> 1) 2) 3) 4) 5) 6) Specify the Level of fragmentation Specify the cartesian coordinates file Specify the electronic structure calculation Specify hydrogen bonding and any unusual bonding (optional) Specify charges for metals & other atoms (optional) Exit input control We will now examine each of these parts of the required input. 8 NOTE: If you have completed any part of these submenu selections once, then you normally do not have to re-enter these selections if you want to change some other part of the input. 2.4.1 Submenu item 1 Specify the Level of fragmentation This choice brings two questions to answer Enter the Level of fragmentation The response can be 1 or 2 or 3 or .......... ("1" is possible but would not produce reliable results). This is the important parameter denoted as "Level" (see Section 4 of this Guide). A second question appears Enter the cutoff value for non-bonded interactions For fragmentation level = 2, the recommended value is 0. For fragmentation level > 2, the recommended value is 1.1 - see Users Guide We denote this parameter as dtol (see Appendix B). A typical response would be "1.1" (or a larger value for more accurate gradients (see Appendix B). If two functional groups in the molecule are not close in terms of bonded connections, and the shortest atom-atom distance between them is greater that dtol times the sum of the atom Van der Waals radii, then the interaction between these groups is evaluated using perturbation theory. NOTE: If Level = 2, it is recommended that you set dtol = 0. NOTE: If Level = 1, you MUST set dtol = 0. The program returns to the submenu. 2.4.2 Submenu item 2 Specify the cartesian coordinates file This choice brings a single question Name of the file containing the cartesian coordinates in xyz format - see Users Guide The response is the name of a file with "xyz" format in the current subdirectory. NOTE: In this file each atom has one line to specify its Cartesian coordinates, and the first 2 characters of this line must contain the elemental symbol for each atom (see Section 2.2). NOTE: If you have entered a new coordinate file name, then all the input information requested below must be re-entered. That is, any previous input in the submenu items below will have been deleted. The program returns to the submenu. NOTE: If you enter the name of a file that is NOT in the current directory, SMFA will respond with filename does not exist or is empty and the program will abort. 9 2.4.3 Submenu item 3 Specify the electronic structure calculation This choice brings up a series of questions. Many require a "Y" or "N" answer. SMFA is not case sensitive in such cases, so "y" or "n" will do. The first question asks the user to choose one of the four supported quantum chemistry program packages. What quantum chemistry program package will you use? The current choices are: 1 GAMESS US 2 Gaussian09 3 NWChem 4 Q-Chem Choose a number You enter an integer, 1, 2, 3, or 4. Depending on the choice, a series of questions appear that ask for the input required for that particular quantum chemistry program. Some responses will require a detailed knowledge of the format used by that program to specify the ab initio method and basis set. This section also asks the user to accept or reject the use of embedded charges as part of the means to model polar solvent molecules. We consider each quantum chemistry package in turn. 10 2.4.3. A GAMESS(US) Having chosen GAMESS, you are prompted: Enter the Energy Force Frequency Optimize Find TS Scan calculation type as an integer: (0) (1) (2) (3) (4) (5) You enter 0, 1, 2, 3, 4 or 5. This prompts a request to enter the relevant items in the $CONTRL section of GAMESS input: $CONTRL group variables Here you enter necessary parts of the $CONTRL group, ONE item per line For example SCFTYP=RHF MPLEVL=2 Omit the RUNTYP variable Omit the charge (ICHARG) and multiplicity (MULT) Omit $BASIS Omit COORD as COORD=UNIQUE is used by SMFA Omit the $DATA section The format MUST BE EXACTLY as GAMESS requires Begin now and end with a RETURN Much of the $CONTRL section is automated by SMFA, but you must enter the information that sets the ab initio quantum chemistry method. For example, you might enter SCFTYP=RHF if you want a restricted Hartree Fock calculation, or SCFTYP=RHF MPLEVL=2 if you want an MP2 calculation, or SCFTYP=RHF MPLEVL=B3LYP if you want a B3LYP calculation. Each $CONTRL variable is on a separate line, and a blank line (RETURN key only) terminates this response and elicits the next prompt: Here you enter other GAMESS variable groups (if any), eg $SYSTEM, $SCF, etc The format MUST BE EXACTLY as GAMESS requires (don't leave out compulsory spaces) Begin now and end with a RETURN Typical input might be 11 $SYSTEM MWORDS=4 $END or other bits of obscure jargon well known to GAMESS initiates. Note $SYSTEM above is prefaced by a single space. The blank line (RETURN only) elicits the next question : Does the chosen electronic structure method specified above account for electronic correlation leading to dispersion (see the user's manual)? Do you therefore want to account for dispersion at long range? Answer Y or N You enter Y or N. SMFA attempts to approximate the electronic energy that would be calculated for the whole molecule at the given level of theory. The exact molecular energy includes the dispersion interaction between parts of the molecule that are far apart. However, some electronic structure methods ignore this dispersion interaction, so SMFA also ignores it. For example, the Hartree-Fock method and many common DFT methods (eg B3LYP) ignore dispersion, so "N" would be the appropriate response in those cases. In contrast, for MP2, CCSD and "dispersion-corrected" DFT methods, "Y" would be the appropriate response. This response elicits the next question: GAMESS allows a different basis set for each chemical element Do you want all elements to have the same basis (Y or N) ? You enter Y or N. Generally, one uses a single choice of basis set for all atoms in the molecule, eg a Pople basis set like 6-31G(d,p), or a Dunning basis set like aug-cc-pvTZ, in which case the response is "Y". However, for example, if there are both metal atoms and first-row atoms in the molecule, one might want different basis sets for the metals and for the other atoms, in which case the response is "N". If the answer is "Y", this elicits the next prompt: Enter the common basis for all atoms You MUST follow the format required by GAMESS in 'card sequence U' of the $DATA group Enter one or more lines and finish with a (blank) RETURN You will need to know the precise format for GAMESS basis sets, as entered in 'card sequence U'. So, you will need the GAMESS user's manual. For example, for a Pople 6-31G basis set in 'card sequence U' you would enter N31 6 NOTE: It seems to be very difficult to find the correct format for more complicated basis sets than "N31 6". Probably, the simplest approach for GAMESS is to answer "N" to the question above (ie multiple basis sets), and proceed as detailed immediately below. If you want multiple basis sets, the answer is "N", and this elicits the next prompt: 12 To enter the basis for each chemical element, you MUST follow the format required by GAMESS in 'card sequence U' of the $DATA group and finish each basis set with a (blank) RETURN Enter the basis (check availability in the GAMESS library) for the following elements SMFA knows which elements are present in the molecule, and asks you enter the appropriate basis set after each element is shown. So, your screen might look like: H N21 3 C N31 6 O N31 6 For more complicated basis sets, a not-too-difficult approach is as follows. (i) Launch the EMSL basis set exchange website https://bse.pnl.gov/bse/portal (ii) Choose the basis set in the table on the left (ii) Choose Format: GAMESS-US (iii) Click the elements you want in the periodic table (iv) Click "Get Basis Set" (v) Cut and paste the basis set definition for each element from the EMSL output into the SMFA input after each element prompt. Do not include the element name in the EMSL output. Your screen then might look like 13 H S 1 2 3 S 1 P 1 3 S 1 2 3 4 5 6 L 1 2 3 L 1 D 1 6 3047.5249000 457.3695100 103.9486900 29.2101550 9.2866630 3.1639270 3 7.8682724 1.8812885 0.5442493 1 0.1687144 1 0.8000000 18.7311370 2.8253937 0.6401217 0.03349460 0.23472695 0.81375733 1 0.1612778 1.0000000 1.1000000 1.0000000 1 C 0.0018347 0.0140373 0.0688426 0.2321844 0.4679413 0.3623120 -0.1193324 -0.1608542 1.1434564 0.0689991 0.3164240 0.7443083 1.0000000 1.0000000 1.0000000 Do NOT include the $END line. SMFA will then go to the next prompt: SMFA often needs to calculate the static dipole polarizabilty of each functional group. SMFA uses the DALTON program, rather than GAMESS to do this. The 6-311+G(d,p) basis set is considered adequate for this purpose, and is the default. If you want to over-ride this default, you must enter a basis set in a format that is recognised by DALTON. Enter the chosen basis set, or hit the RETURN key to accept the default: NOTE: you need to use the DALTON format for basis functions here [eq 6-31G(d,p) is 6-31G** in DALTON parlance]. If you are using a single basis set for the energy calculation, and that basis is available in DALTON, then the appropriate choice here is to use the same basis for the polarizability. So, your response here might be 6-31G** This completes the input for GAMESS, unless you have chosen Optimize (3) or Find TS (4) or Scan (5) above. For Optimize (3) or Find TS (4), SMFA asks for additional information as follows: 14 The geometry will be optimised in a number of discrete steps (geometry changes). It is prudent to limit the number of steps employed to limit the total cpu time consumed. The optimisation can always be restarted from the last step reached. Enter the maximum number of steps allowed You enter an integer for the maximum number of steps (eg 20 or 100 or whatever) SMFA then asks: The geometry optimisation can be carried out with constraints on bond lengths, valence bond angles and dihedral angles, if requested. Do you want constraints? You enter "Y" or "N". If you enter "Y", then SMFA responds: Enter the number of bond lengths to be constrained You enter an integer, say 2 (or 0). If the number of bonds is greater than zero, SMFA responds: Enter the atom numbers and bond lengths for each constraint You enter data that looks like 12 63 1.09 63 79 1.53 where the atom numbers refer to the order of the atoms in the coordinate file (submenu item 2), and the corresponding bond length (to be constrained) is given in Angstrom. SMFA then responds: Enter the number of angles to be constrained You enter an integer, say 1 (or 0). If the integer is greater than zero, SMFA then responds: Enter the 3 atom numbers and angle in degrees for each angle You enter data that looks like 12 63 79 104.5 where the atom numbers refer to the order of the atoms in the coordinate file (submenu item 2), and the corresponding angle is in degrees. The middle atom number (here 63) is at the vertex of the angle. SMFA then responds: Enter the number of dihedral angles to be constrained You enter an integer, say 1 (or 0). If the integer is greater than zero, SMFA then responds: Enter the 4 atom numbers and dihedral angle in degrees for each angle You enter data that looks like 12 63 79 53 60.0 15 where the atom numbers refer to the order of the atoms in the coordinate file (submenu item 2), and the corresponding dihedral angle is in degrees. NOTE: Definitions of dihedral angles differ in sign and magnitude; here, dihedrals are taken in the range, -π to π. The utility program Internal Coordinates (see Section 5), reports bond lengths, bond angles and dihedral angles for the current molecular coordinates. See Appendix C for the definition of the dihedral angle. SMFA then terminates the input for geometry optimisation. For Find TS (4), SMFA asks for additional information as follows: Many saddle points may exist on the potential energy surface of large molecules. To help identify the saddle point (TS) of interest, you must enter the atom numbers of a few atoms (say 3), that you expect to change position as the molecule passes through the TS. Enter the number of atoms you will denote as relevant to the TS You enter an integer. This might be (say) 3 for the atoms involved in the breaking and forming bonds at the TS, but can be any positive integer. SMFA then prompts: Enter each atom number, one per line You enter data that looks like 14 15 33 where the atom numbers refer to the order of the atoms in the coordinate file (submenu item 2), and the number of atoms equals the integer above. For Scan (5), SMFA asks for additional information as follows: The geometry will be optimised for a set of constraints. You must specify a set of configurations defined by a path, a sequence of constraints. These constraints apply to bond lengths, angles, and dihedral angles, which are kept fixed while all other degrees of freedom are optimised. See the Manual for details. Enter the number of geometries in the scan You enter an integer, say 10. SMFA then responds: 16 For each geometry in the scan, the unconstrained degrees of freedom will be optimised in a set of steps. You must specify the maximum number of such optimisation steps allowed. Enter the maximum number of steps. You enter an integer for the maximum number of steps (eg 20 or 100 or whatever) SMFA then responds: Enter the number of bond lengths to be constrained on the path You enter an integer (which may be 0 or more). If you enter 1 or more, SMFA then responds: Enter the atom numbers, initial bond length, and increment for each bond constraint You enter as many lines of input as the number of bond constraints specified above. Each line might look something like: 12 47 1.53 0.05 In this example, the bond between atoms 12 and 47 will be constrained to a length of 1.53 (Å) at the first geometry in the scan, 1.58 (Å) at the second geometry, and so on. A negative increment (eg -0.05) would decrease the bond length at each step on the scanned path. SMFA then responds: Enter the number of angles to be constrained You enter an integer (which may be 0 or more). If the number of angles is 1 or more, SMFA responds: Enter the 3 atom numbers, initial angle and increment in degrees for each angle For each constrained angle, you enter the 3 atom numbers that define the angle, the initial value of the angle on the path, and the increment for the angle (both in degrees). The integers n1, n2, n3 define the angle that the vector from n2 to n1 makes with the vector from n2 to n3. An input line might look like: 10 17 54 104.5 5.0 Angles are always positive numbers, increments can be positive or negative. SMFA then responds: Enter the number of dihedral angles to be constrained You enter an integer (which may be 0 or more). If the number of dihedral angles is 1 or more, SMFA responds: Enter the 4 atom numbers, initial angle and increment in degrees for each dihedral For each constrained dihedral angle, you enter the 4 atom numbers that define the dihedral angle, the initial value of the dihedral angle on the path, and the increment for the dihedral angle (both in 17 degrees). The integers n1, n2, n3, n4 define the angle that the plane containing atoms n1, n2 and n3 makes with the plane containing atoms n2, n3 and n4. An input line might look like: 23 567 32 17 60.0 10.0 NOTE: Definitions of dihedral angles differ in sign and magnitude; here, dihedrals are taken in the range, -180 to 180. The utility program Internal Coordinates (see Section 5), reports bond lengths, bond angles and dihedral angles for the current molecular coordinates. See Appendix C for the definition of the dihedral angle. This completes the input specific for GAMESS. SMFA returns to the Input control submenu. 18 2.4.3. B GAUSSIAN Having chosen GAUSSIAN09, you are asked: Enter the Energy Force Frequency Optimize Find TS Scan type of calculation as an integer: (0) (1) (2) (3) (4) (5) You enter 0, 1, 2, 3, 4 or 5. This prompts a request to enter the method used: Enter the calculation method [eg HF or MP2 or other] You enter simply HF or MP2, CCSD(T) or B3LYP or whatever method GAUSSIAN recognises. This prompts a request: Enter the %mem value, eg %mem=500mb, or hit RETURN for the GAUSSIAN default value on your system NOTE: You should enter a value for %mem that is appropriate for calculations using JUST 1 CPU on your system. It is recommended that you use a large value, consistent with the memory available to a single cpu. If you are happy with GAUSSIAN's default value, then simply hit the RETURN key. This prompts the question: Does this method account for electronic correlation leading to dispersion? (See the user's manual). Do you want to account for dispersion at long range? Answer Y or N You enter Y or N. SMFA attempts to approximate the electronic energy that would be calculated for the whole molecule at the given level of theory. The exact molecular energy includes the dispersion interaction between parts of the molecule that are far apart. However, some electronic structure methods ignore this dispersion interaction, so SMFA also ignores it. For example, the Hartree-Fock method and many common DFT methods (eg B3LYP) ignore dispersion, so "N" would be the appropriate response in those cases. In contrast, for MP2, CCSD and "dispersion-corrected" DFT methods, "Y" would be the appropriate response. This response elicits the next question: GAUSSIAN allows a different basis set for each chemical element. Do you want all elements to have the same basis (Y or N) ? You enter Y or N. Generally, one uses a single choice of basis set for all atoms in the molecule, eg a Pople basis set like 6-31G(d,p), or a Dunning basis set like aug-cc-pvTZ, in which case the response is "Y". However, for example, if there are both metal atoms and first-row atoms in the molecule, one might want different basis sets for the metals and for the other atoms, in which case the response is "N". 19 If the answer is "Y", this elicits the response: Enter the basis for all atoms You enter 6-31G(d,p) or cc-pVQZ or whatever basis GAUSSIAN recognises. If the answer is "N", SMFA knows which elements are present in the molecule, and asks you to enter the appropriate basis set after each element is shown. So, your screen might look like: Enter the basis (check availability in the GAUSSIAN library) for the following elements H 3-21G C 6-31G O 6-31G where you have entered each appropriate basis set after the prompt for each element. Completing the basis set specification elicits the response: Enter any other keywords (optional), or hit RETURN: The normal response might well be a blank line. There are many possibilities, such as MaxDisk=XXXMB for large post-Hartree-Fock calculations. NOTE: It may be useful to anticipate convergence difficulties with HF or DFT calculations, and so one could enter here (for example) SCXF=XQC Moreover, if you have programmed a means to extract some property from the output (see Section 6), you would enter the keyword for that property here. This response elicits the next question: Very polar solvent molecules can induce polarisation in both solute and other solvent molecules. SMFA can evaluate the energy of polar solvents and solutes, at a lower Level of Fragmentation, if embedded charges are used to describe the solvent environment. In order to identify the solvent, SMFA needs the chemical composition of the solvent. If your system contains polar solvent molecules, do you want to specify the use of embedded charges? (Y/N) You enter Y or N. The answer "Y" is recommended if polar solvent molecules, like H2O, are present in the input geometry, as SMFA will produce a more accurate estimate of the energy at the same cost in cpu time as "N". 20 If the answer is "Y", then SMFA will ask you some questions to help it identify the solvent molecules in the total structure. SMFA will ask you to Enter the number of atoms in the solvent molecule You enter an integer. In the case of H2O, you enter 3; for H2CO you enter 4, etc. This prompts SMFA to write You have to enter each element in the solvent. For example, for water you would enter H H O Now enter the elemental symbols for the atoms in the solvent: For H2CO, you would enter (the order is not important) H H C O This completes the input for GAUSSIAN, unless you have chosen Optimize (3) or Find TS (4) or Scan (5) above. For Optimize (3) or Find TS (4), SMFA asks for additional information as follows: The geometry will be optimised in a number of discrete steps (geometry changes). It is prudent to limit the number of steps employed to limit the total cpu time consumed. The optimisation can always be restarted from the last step reached. Enter the maximum number of steps allowed You enter an integer for the maximum number of steps (eg 20 or 100 or whatever) SMFA then asks: The geometry optimisation can be carried out with constraints on bond lengths, valence bond angles and dihedral angles, if requested. Do you want constraints? You enter "Y" or "N". If you enter "Y", then SMFA responds: Enter the number of bond lengths to be constrained You enter an integer, say 2 (or 0). If the number of bonds is greater than zero, SMFA responds: Enter the atom numbers and bond lengths for each constraint You enter data that looks like 12 63 1.09 63 79 1.53 21 where the atom numbers refer to the order of the atoms in the coordinate file (submenu item 2), and the corresponding bond length (to be constrained) is given in Angstrom. SMFA then responds: Enter the number of angles to be constrained You enter an integer, say 1 (or 0). If the integer is greater than zero, SMFA then responds: Enter the 3 atom numbers and angle in degrees for each angle You enter data that looks like 12 63 79 104.5 where the atom numbers refer to the order of the atoms in the coordinate file (submenu item 2), and the corresponding angle is in degrees. The middle atom number (here 63) is at the vertex of the angle. SMFA then responds: Enter the number of dihedral angles to be constrained You enter an integer, say 1 (or 0). If the integer is greater than zero, SMFA then responds: Enter the 4 atom numbers and dihedral angle in degrees for each angle You enter data that looks like 12 63 79 53 60.0 where the atom numbers refer to the order of the atoms in the coordinate file (submenu item 2), and the corresponding dihedral angle is in degrees. NOTE: Definitions of dihedral angles differ in sign and magnitude; here, dihedrals are taken in the range, -180 to 180. The utility program Internal Coordinates (see Section 5), reports bond lengths, bond angles and dihedral angles for the current molecular coordinates. See Appendix C for the definition of the dihedral angle. SMFA then terminates the input for geometry optimisation. For Find TS (4), SMFA asks for additional information as follows: Many saddle points may exist on the potential energy surface of large molecules. To help identify the saddle point (TS) of interest, you must enter the atom numbers of a few atoms (say 3), that you expect to change position as the molecule passes through the TS. Enter the number of atoms you will denote as relevant to the TS 22 You enter an integer. This might be (say) 3 for the atoms involved in the breaking and forming bonds at the TS, but can be a larger integer. SMFA then prompts: Enter each atom number, one per line You enter data that looks like 14 15 33 where the atom numbers refer to the order of the atoms in the coordinate file (submenu item 2), and the number of atoms equals the integer above. For Scan (5), SMFA asks for additional information as follows: The geometry will be optimised for a set of constraints. You must specify a set of configurations defined by a path, a sequence of constraints. These constraints apply to bond lengths, angles, and dihedral angles, which are kept fixed while all other degrees of freedom are optimised. See the Manual for details. Enter the number of geometries in the scan You enter an integer, say 10. SMFA then responds: For each geometry in the scan, the unconstrained degrees of freedom will be optimised in a set of steps. You must specify the maximum number of such optimisation steps allowed. Enter the maximum number of steps You enter an integer for the maximum number of steps (eg 20 or 100 or whatever) SMFA then responds: Enter the number of bond lengths to be constrained on the path You enter an integer (which may be 0 or more). If you enter 1 or more. SMFA then responds: Enter the atom numbers, initial bond length, and increment for each bond constraint You enter as many lines of input as the number of bond constraints specified above. Each line might look something like: 12 47 1.53 0.05 23 In this example, the bond between atoms 12 and 47 will be constrained to a length of 1.53 (Å) at the first geometry in the scan, 1.58 (Å) at the second geometry, and so on. A negative increment (eg -0.05) would decrease the bond length at each step on the scanned path. SMFA then responds: Enter the number of angles to be constrained You enter an integer (which may be 0 or more). If the number of angles is 1 or more, SMFA responds: Enter the 3 atom numbers, initial angle and increment in degrees for each angle For each constrained angle, you enter the 3 atom numbers that define the angle, the initial value of the angle on the path, and the increment for the angle (both in degrees). The integers n1, n2, n3 define the angle that the vector from n2 to n1 makes with the vector from n2 to n3. An input line might look like: 10 17 54 104.5 5.0 Angles are always positive numbers, increments can be positive or negative. SMFA then responds: Enter the number of dihedral angles to be constrained You enter an integer (which may be 0 or more). If the number of angles is 1 or more, SMFA responds: Enter the 4 atom numbers, initial angle and increment in degrees for each dihedral For each constrained dihedral angle, you enter the 4 atom numbers that define the dihedral angle, the initial value of the dihedral angle on the path, and the increment for the dihedral angle (both in degrees). The integers n1, n2, n3, n4 define the angle that the plane containing atoms n1, n2 and n3 makes with the plane containing atoms n2, n3 and n4. An input line might look like: 23 567 32 17 60.0 10.0 NOTE: Definitions of dihedral angles differ in sign and magnitude; here, dihedrals are taken in the range, -180 to 180. The utility program Internal Coordinates (see Section 5), reports bond lengths, bond angles and dihedral angles for the current molecular coordinates. See Appendix C for the definition of the dihedral angle. SMFA returns to the Input control submenu. This now completes the input specific for GAUSSIAN. 24 2.4.3. C NWChem Having chosen NWChem, you are asked: Enter the Energy Force Frequency Optimize Find TS Scan type of calculation as an integer: (0) (1) (2) (3) (4) (5) You enter 0, 1, 2, 3, 4 or 5. This prompts a request to enter the input that NWChem needs to carry out the ab initio calculations: Enter each line of NWChem input needed for the tasks. For example, for an SCF, DTF, MP2 or TCE-based calculation. Include the TASK command Omit the charge, but include 'singlet' in the SCF or DFT section. Begin now and end with a RETURN Obviously, you will need to be familiar with NWChem commands and syntax. For example, for a simple Hartree Fock calculation of the energy you might enter SCF direct singlet END TASK SCF ENERGY For an MP2 calculation, you would replace TASK SCF ENERGY by TASK MP2 ENERGY. For mp2 frequencies, you might put "task mp2 freq" (it's not case sensitive). NOTE: The "TASK" must be consistent with the "type NOTE: If you have chosen Optimize of calculation" (3), Find TS (4) or Scan (5), entered above. then you must set TASK to OPTIMIZE. For example, TASK SCF OPTIMIZE or TASK MP2 OPTIMIZE, etc. NOTE: When you ask NWChem to calculate the frequencies or hessian (eg TASK SCF FREQ), this program does not automatically calculate the gradients, as do some other programs. In fact, SMFA is expecting to read both the gradients and the hessian, when the frequencies (hessian) are requested. Hence, when calculating the frequencies (hessian) with NWChem, you must put (for example, using SCF): TASK SCF GRADIENT TASK SCF FREQ 25 Aside: If more than the default memory is needed, you may need to include an allocation of memory here. By default, NWChem assigns the available memory with only 50% assigned to "global". Often one needs more global memory than stack or heap, so an assignment such as (for example) memory stack 100 heap 100 global 1300 mb might be appropriate. When you complete this entry, SMFA asks: Does the calculation method account for electronic correlation leading to dispersion? (See the user's manual). Do you want to account for dispersion at long range? Answer Y or N You enter Y or N. SMFA attempts to approximate the electronic energy that would be calculated for the whole molecule at the given level of theory. The exact molecular energy includes the dispersion interaction between parts of the molecule that are far apart. However, some electronic structure methods ignore this dispersion interaction, so SMFA also ignores it. For example, the Hartree-Fock method and many common DFT methods (eg B3LYP) ignore dispersion, so "N" would be the appropriate response in those cases. In contrast, for MP2, CCSD and "dispersion-corrected" DFT methods, "Y" would be the appropriate response. The next question is then: NWChem allows a different basis set for each element Do you want all elements to have the same basis (Y or N) ? You enter Y or N. Generally, one uses a single choice of basis set for all atoms in the molecule, eg a Pople basis set like 6-31G(d,p), or a Dunning basis set like aug-cc-pvTZ, in which case the response is "Y". However, for example, if there are both metal atoms and first-row atoms in the molecule, one might want different basis sets for the metals and for the other atoms, in which case the response is "N". If the answer is "Y", this elicits the response: Enter the basis for all atoms You enter 6-31G(d,p) or cc-pVQZ or whatever basis NWChem recognises. If the answer is "N", SMFA knows which elements are present in the molecule, and asks you to enter the appropriate basis set after each element is shown. So, your screen might look like: Enter the basis (check availability in the NWChem library) for the following elements H 3-21G C 6-31G 26 where you have entered each appropriate basis set after the prompt for each element. When the basis set input is complete, NWChem responds with: Very polar solvent molecules can induce polarisation in both solute and other solvent molecules. SMFA can evaluate the energy of polar solvents and solutes, at a lower Level of Fragmentation, if embedded charges are used to describe the solvent environment. In order to identify the solvent, SMFA needs the chemical composition of the solvent. If your system contains polar solvent molecules, do you want to specify the use of embedded charges? (Y/N) You enter Y or N. The answer "Y" is recommended if polar solvent molecules, like H2O, are present in the input geometry, as SMFA will produce a more accurate estimate of the energy at the same cost in cpu time as "N". If the answer is "Y", then SMFA will ask you some questions to help it identify the solvent molecules in the total structure. SMFA will ask you to Enter the number of atoms in the solvent molecule You enter an integer. In the case of H2O, you enter 3; for H2CO you enter 4. This prompts SMFA to write You have to enter each element in the solvent. For example, for water you would enter H H O Now enter the elemental symbols for the atoms in the solvent: For H2CO, you would enter (the order is not important) H H C O This completes the input for NWChem, unless you have chosen Optimize (3) or Find TS (4) or Scan (5) above. For Optimize (3) or Find TS (4), SMFA asks for additional information as follows: The geometry will be optimised in a number of discrete steps (geometry changes). It is prudent to limit the number of steps employed to limit the total cpu time consumed. The optimisation can always be restarted from the last step reached. Enter the maximum number of steps allowed 27 You enter an integer for the maximum number of steps (eg 20 or 100 or whatever) SMFA then asks: The geometry optimisation can be carried out with constraints on bond lengths, valence bond angles and dihedral angles, if requested. Do you want constraints? You enter "Y" or "N". If you enter "Y", then SMFA responds: Enter the number of bond lengths to be constrained You enter an integer, say 2 (or 0). If the number of bonds is greater than zero, SMFA responds: Enter the atom numbers and bond lengths for each constraint You enter data that looks like 12 63 1.09 63 79 1.53 where the atom numbers refer to the order of the atoms in the coordinate file (submenu item 2), and the corresponding bond length (to be constrained) is given in Angstrom. SMFA then responds: Enter the number of angles to be constrained You enter an integer, say 1 (or 0). If the integer is greater than zero, SMFA then responds: Enter the 3 atom numbers and angle in degrees for each angle You enter data that looks like 12 63 79 104.5 where the atom numbers refer to the order of the atoms in the coordinate file (submenu item 2), and the corresponding angle is in degrees. The middle atom number (here 63) is at the vertex of the angle. SMFA then responds: Enter the number of dihedral angles to be constrained You enter an integer, say 1 (or 0). If the integer is greater than zero, SMFA then responds: Enter the 4 atom numbers and dihedral angle in degrees for each angle You enter data that looks like 12 63 79 53 60.0 where the atom numbers refer to the order of the atoms in the coordinate file (submenu item 2), and the corresponding dihedral angle is in degrees. 28 NOTE: Definitions of dihedral angles differ in sign and magnitude; here, dihedrals are taken in the range, -180 to 180. The utility program Internal Coordinates (see Section 5), reports bond lengths, bond angles and dihedral angles for the current molecular coordinates. See Appendix C for the definition of the dihedral angle. SMFA then terminates the input for geometry optimisation. For Find TS (4), SMFA asks for additional information as follows: Many saddle points may exist on the potential energy surface of large molecules. To help identify the saddle point (TS) of interest, you must enter the atom numbers of a few atoms (say 3), that you expect to change position as the molecule passes through the TS. Enter the number of atoms you will denote as relevant to the TS You enter an integer. This might be (say) 3 for the atoms involved in the breaking and forming bonds at the TS, but can be a larger integer. SMFA then prompts: Enter each atom number, one per line You enter data that looks like 14 15 33 where the atom numbers refer to the order of the atoms in the coordinate file (submenu item 2), and the number of atoms equals the integer above. For Scan (5), SMFA asks for additional information as follows: The geometry will be optimised for a set of constraints. You must specify a set of configurations defined by a path, a sequence of constraints. These constraints apply to bond lengths, angles, and dihedral angles, which are kept fixed while all other degrees of freedom are optimised. See the Manual for details. Enter the number of geometries in the scan You enter an integer, say 10. SMFA then responds: 29 For each geometry in the scan, the unconstrained degrees of freedom will be optimised in a set of steps. You must specify the maximum number of such optimisation steps allowed. Enter the maximum number of steps You enter an integer for the maximum number of steps (eg 20 or 100 or whatever) SMFA then responds: Enter the number of bond lengths to be constrained on the path You enter an integer (which may be 0 or more). If you enter 1 or more. SMFA then responds: Enter the atom numbers, initial bond length, and increment for each bond constraint You enter as many lines of input as the number of bond constraints specified above. Each line might look something like: 12 47 1.53 0.05 In this example, the bond between atoms 12 and 47 will be constrained to a length of 1.53 (Å) at the first geometry in the scan, 1.58 (Å) at the second geometry, and so on. A negative increment (eg -0.05) would decrease the bond length at each step on the scanned path. SMFA then responds: Enter the number of angles to be constrained You enter an integer (which may be 0 or more). If the number of angles is 1 or more, SMFA responds: Enter the 3 atom numbers, initial angle and increment in degrees for each angle For each constrained angle, you enter the 3 atom numbers that define the angle, the initial value of the angle on the path, and the increment for the angle (both in degrees). The integers n1, n2, n3 define the angle that the vector from n2 to n1 makes with the vector from n2 to n3. An input line might look like: 10 17 54 104.5 5.0 Angles are always positive numbers, increments can be positive or negative. SMFA then responds: Enter the number of dihedral angles to be constrained You enter an integer (which may be 0 or more). If the number of angles is 1 or more, SMFA responds: Enter the 4 atom numbers, initial angle and increment in degrees for each dihedral For each constrained dihedral angle, you enter the 4 atom numbers that define the dihedral angle, the initial value of the dihedral angle on the path, and the increment for the dihedral angle (both in degrees). The integers n1, n2, n3, n4 define the angle that the plane containing atoms n1, n2 and n3 makes with the plane containing atoms n2, n3 and n4. An input line might look like: 30 23 567 32 17 60.0 10.0 NOTE: Definitions of dihedral angles differ in sign and magnitude; here, dihedrals are taken in the range, -180 to 180. The utility program Internal Coordinates (see Section 5), reports bond lengths, bond angles and dihedral angles for the current molecular coordinates. See Appendix C for the definition of the dihedral angle. SMFA returns to the Input control submenu. This now completes the input for NWChem. 31 2.4.3. D Q-Chem NOTE: You must have Q-Chem 4.4 or later, preferably Q-Chem 5) Having chosen Q-Chem, you are asked: Enter values for the $rem Q-Chem input The JOBTYPE can be SP, FORCE, FREQ, OPT, TS or SCAN JOBTYPE You enter one of the allowed values for the JOBTYPE variable, for example JOBTYPE OPT SMFA then asks for the quantum chemistry method: METHOD and you make an appropriate response, such as METHOD MP2 SMFA then responds with: Enter any additional values for the $rem Q-Chem input Do not enter a BASIS value Begin now, and end with a RETURN In (most) simple cases, no additional $rem variables may be necessary. Note that if the JOBTYPE is TS or SCAN, this will appear as OPT in the OUT_SMFA output file (so don't worry). If you have programmed a means to extract some property from the output (see Section 6), you would enter the instruction for the calculation of that property here. NOTE: You don't enter the regular basis set here. However, for an RI-MP2 or RI-CC calculation, you must enter the auxiliary basis here, eg AUX_BASIS RIMP2-CC-PVDZ SMFA then asks: Does the calculation method account for electronic correlation leading to dispersion? (See the user's manual). Do you want to account for dispersion at long range? Answer Y or N You enter Y or N. SMFA attempts to approximate the electronic energy that would be calculated for the whole molecule at the given level of theory. The exact molecular energy includes the dispersion interaction between parts of the molecule that are far apart. However, some electronic structure methods ignore this dispersion interaction, so SMFA also ignores it. For example, the Hartree-Fock method and many common DFT methods (eg B3LYP) ignore dispersion, so "N" would be the appropriate response in those cases. In contrast, for MP2, CCSD and "dispersion-corrected" DFT methods, "Y" would be the appropriate response. 32 The next question is: Q-Chem allows a different basis set for each element Do you want all elements to have the same basis (Y or N) ? You answer Y or N. Generally, one uses a single choice of basis set for all atoms in the molecule, eg a Pople basis set like 6-31G(d,p), or a Dunning basis set like aug-cc-pvTZ, in which case the response is "Y". However, for example, if there are both metal atoms and first-row atoms in the molecule, one might want different basis sets for the metals and for the other atoms, in which case the response is "N". If you answer "Y", then SMFA responds: Enter the basis for all atoms You enter 6-31G(d,p) or cc-pVQZ or whatever basis Q-Chem recognises. If the answer is "N", SMFA knows which elements are present in the molecule, and asks you to enter the appropriate basis set after each element is shown. So, your screen might look like: Enter the basis (check availability in the Q-Chem library) for the following elements H 3-21G C 6-31G where you have entered each appropriate basis set after the prompt for each element. When the basis set input is complete, Q-Chem responds with: Very polar solvent molecules can induce polarisation in both solute and other solvent molecules. SMFA can evaluate the energy of polar solvents and solutes, at a lower Level of Fragmentation, if embedded charges are used to describe the solvent environment. In order to identify the solvent, SMFA needs the chemical composition of the solvent. If your system contains polar solvent molecules, do you want to specify the use of embedded charges? (Y/N) You enter Y or N. The answer "Y" is recommended if polar solvent molecules, like H2O, are present in the input geometry, as SMFA will produce a more accurate estimate of the energy at the same cost in cpu time as "N" (at a given value of Level). If the answer is "Y", then SMFA will ask you some questions to help it identify the solvent molecules in the total structure. SMFA will ask you to Enter the number of atoms in the solvent molecule You enter an integer. In the case of H2O, you enter 3; for H2CO you enter 4, etc. 33 This prompts SMFA to write You have to enter each element in the solvent. For example, for water you would enter H H O Now enter the elemental symbols for the atoms in the solvent: For H2CO, you would enter (the order is not important) H H C O If you specified JOBTYPE OPT or TS or SCAN above, then SMFA asks for additional information. For JOBTYPE OPT or TS, SMFA asks for additional information as follows: The geometry will be optimised in a number of discrete steps (geometry changes). It is prudent to limit the number of steps employed to limit the total cpu time consumed. The optimisation can always be restarted from the last step reached. Enter the maximum number of steps allowed You enter an integer for the maximum number of steps (eg 20 or 100 or whatever) SMFA then asks: The geometry optimisation can be carried out with constraints on bond lengths, valence bond angles and dihedral angles, if requested. Do you want constraints? You enter "Y" or "N". If you enter "Y", then SMFA responds: Enter the number of bond lengths to be constrained You enter an integer, say 2 (or 0). If the number of bonds is greater than zero, SMFA responds: Enter the atom numbers and bond lengths for each constraint You enter data that looks like 12 63 1.09 63 79 1.53 where the atom numbers refer to the order of the atoms in the coordinate file (submenu item 2), and the corresponding bond length (to be constrained) is given in Angstrom. SMFA then responds: Enter the number of angles to be constrained 34 You enter an integer, say 1 (or 0). If the integer is greater than zero, SMFA then responds: Enter the 3 atom numbers and angle in degrees for each angle You enter data that looks like 12 63 79 104.5 where the atom numbers refer to the order of the atoms in the coordinate file (submenu item 2), and the corresponding angle is in degrees. The middle atom number (here 63) is at the vertex of the angle. SMFA then responds: Enter the number of dihedral angles to be constrained You enter an integer, say 1 (or 0). If the integer is greater than zero, SMFA then responds: Enter the 4 atom numbers and dihedral angle in degrees for each angle You enter data that looks like 12 63 79 53 60.0 where the atom numbers refer to the order of the atoms in the coordinate file (submenu item 2), and the corresponding dihedral angle is in degrees. NOTE: Definitions of dihedral angles differ in sign and magnitude; here, dihedrals are taken in the range, -180 to 180. The utility program Internal Coordinates (see Section 5), reports bond lengths, bond angles and dihedral angles for the current molecular coordinates. See Appendix C for the definition of the dihedral angle. SMFA then terminates the input for geometry optimisation. For JOBTYPE TS, SMFA asks for additional information as follows: Many saddle points may exist on the potential energy surface of large molecules. To help identify the saddle point (TS) of interest, you must enter the atom numbers of a few atoms (say 3), that you expect to change position as the molecule passes through the TS. Enter the number of atoms you will denote as relevant to the TS You enter an integer. This might be (say) 3 for the atoms involved in the breaking and forming bonds at the TS, but can be a larger integer. SMFA then prompts: Enter each atom number, one per line You enter data that looks like 35 14 15 33 where the atom numbers refer to the order of the atoms in the coordinate file (submenu item 2), and the number of atoms equals the integer above. For JOBTYPE SCAN, SMFA asks for additional information as follows: The geometry will be optimised for a set of constraints. You must specify a set of configurations defined by a path, a sequence of constraints. These constraints apply to bond lengths, angles, and dihedral angles, which are kept fixed while all other degrees of freedom are optimised. See the Manual for details. Enter the number of geometries in the scan You enter an integer, say 10. SMFA then responds: For each geometry in the scan, the unconstrained degrees of freedom will be optimised in a set of steps. You must specify the maximum number of such optimisation steps allowed. Enter the maximum number of steps You enter an integer for the maximum number of steps (eg 20 or 100 or whatever) SMFA then responds: Enter the number of bond lengths to be constrained on the path You enter an integer (which may be 0 or more). If you enter 1 or more. SMFA then responds: Enter the atom numbers, initial bond length, and increment for each bond constraint You enter as many lines of input as the number of bond constraints specified above. Each line might look something like: 12 47 1.53 0.05 In this example, the bond between atoms 12 and 47 will be constrained to a length of 1.53 (Å) at the first geometry in the scan, 1.58 (Å) at the second geometry, and so on. A negative increment (eg -0.05) would decrease the bond length at each step on the scanned path. SMFA then responds: Enter the number of angles to be constrained You enter an integer (which may be 0 or more). If the number of angles is 1 or more, SMFA responds: 36 Enter the 3 atom numbers, initial angle and increment in degrees for each angle For each constrained angle, you enter the 3 atom numbers that define the angle, the initial value of the angle on the path, and the increment for the angle (both in degrees). The integers n1, n2, n3 define the angle that the vector from n2 to n1 makes with the vector from n2 to n3. An input line might look like: 10 17 54 104.5 5.0 Angles are always positive numbers, increments can be positive or negative. SMFA then responds: Enter the number of dihedral angles to be constrained You enter an integer (which may be 0 or more). If the number of angles is 1 or more, SMFA responds: Enter the 4 atom numbers, initial angle and increment in degrees for each dihedral For each constrained dihedral angle, you enter the 4 atom numbers that define the dihedral angle, the initial value of the dihedral angle on the path, and the increment for the dihedral angle (both in degrees). The integers n1, n2, n3, n4 define the angle that the plane containing atoms n1, n2 and n3 makes with the plane containing atoms n2, n3 and n4. An input line might look like: 23 567 32 17 60.0 10.0 NOTE: Definitions of dihedral angles differ in sign and magnitude; here, dihedrals are taken in the range, -180 to 180. The utility program Internal Coordinates (see Section 5), reports bond lengths, bond angles and dihedral angles for the current molecular coordinates. See Appendix C for the definition of the dihedral angle. Following any optimization/constraint questions, SMFA returns to the Input control submenu. This completes the input for Q-Chem. 37 2.4.4 Submenu item 4 Specify hydrogen bonding and any unusual bonding SMFA fragments a molecule according to the bonded connections between atoms. An algorithm decides what is a single bond, and what is a multiple bond (that is not broken in a fragmentation). However, SMFA allows the user to over-ride some features of the standard algorithm. For example, the input structure might describe some "snapshot" of a chemical reaction, where bonds are breaking and forming. The user can demand that some atoms be considered as bonded, even though the atom-atom distance may be much larger than a normal bond. In this way, the same fragmentation will be applied to a sequence of "snapshots" of the reaction, giving a "smooth" energy profile for the process. Hence, if you are using the SCAN facility to calculate the energy on a path where a bond is stretched beyond "breaking point", you might want to specify this bond to exist no matter what its length. Similarly, SMFA treats hydrogen bonds like single bonds by default, but you can override this if desired. If you are happy with all the default settings regarding bonding, then you can skip this item. If not, select this item, and the following questions will appear: Question 1 If you want to accept all SMFA defaults regarding bond definitions, simply hit the RETURN key to skip this input. Do you want to specify some bonds as multiple bonds, that would normally be taken as single bonds (Y/N or hit RET to skip all)? Enter Y or N and proceed to the next question, or hit the RETURN key to skip this entire submenu item. Multiple bonds are never broken in the fragmentation process. If you enter Y, this prompts another question: Enter the number of (unusual) user-specified double bonds (usually 0) The response is an integer. An integer greater than 0 prompts Enter the atom numbers for these double bonded atoms (eg 27 243) The atom numbers refer to the order of the atoms in the coordinate file (submenu item 2). You must enter as many pairs of atom numbers (one pair per line) as the integer entered above. 38 Question 2 Do you want to specify some bonds as single bonds, that would normally be taken as multiple bonds (Y/N)? Enter Y or N Y prompts: Enter the number of (unusual) user-specified single bonds (usually 0) The response is an integer. An integer greater than 0 prompts Enter the atom numbers for these single bonded atoms (eg 86 97) The atom numbers refer to the order of the atoms in the coordinate file (item 2). You must enter as many pairs of atom numbers (one pair per line) as the integer entered above. Question 3 Do you want to specify that single bonds exist that would normally not be considered as bonds (Y/N)? Y prompts: Enter the number of (unusual) user-specfied extra bonds (usually 0) The response is an integer. An integer greater than 0 prompts Enter the atom numbers for the atoms connected by these extra bonds (eg 147 15) The atom numbers refer to the order of the atoms in the coordinate file (item 2). You must enter as many pairs of atom numbers (one pair per line) as the integer entered above. Question 4 By default SMFA assumes that the amide moiety is treated as a single group Do you want to treat the Amide CN bond as a single bond only (Y/N)? Enter Y or N. If you enter "Y", SMFA will produce slightly smaller fragments in molecules that contain amide groups (eg proteins) and hence reduce the cpu time, but at the cost of lower accuracy. If you choose "Y", when studying a protein, you must be careful to observe convergence with higher than normal values of the parameter Level. Question 5 By default SMFA assumes that CX2 or CX3 (X=F,Cl,..) is a single group. Do you want to treat these CX bonds as single bonds only (Y/N)? Enter Y or N. If you enter "Y", SMFA will produce slightly smaller fragments in molecules that contain such groups and hence reduce the cpu time, but at the cost of lower accuracy. Question 6 By default SMFA assumes that hydrogen bonds will be treated as regular bonds 39 Do you want to ignore hydrogen bonds (Y/N)? Enter Y or N. If you enter "Y", SMFA will produce slightly smaller fragments in molecules that contain hydrogen bonds and hence reduce the cpu time, but at the cost of lower accuracy. Question 7 By default SMFA assumes that if two charged groups are close together then they should be considered to be bonded Do you want to ignore this default (Y/N)? Enter Y or N. If you enter "Y", SMFA will produce slightly smaller fragments in molecules that contain such "adjacent" charged groups and hence reduce the cpu time, but at the cost of lower accuracy. The idea is that the two parts of something like a "salt bridge" should be consider as bonded. If you enter "N", this prompts a further question Two charged groups are considered close together if the distance between them is less than the sum of the Van der Waals radii of the closest atoms multiplied by a factor Enter the value of this factor (the recommended value is 1.5) Enter a value. The program returns to the submenu. 40 2.4.5 Submenu item 5 Specify charges for metals & other atoms (optional) SMFA determines whether or not a chemical functional group is neutral or charged based on the valency of the chemical elements (accounting for alternative possibilities, eg Phosphorus might have a valency of 3 or 5). However, SMFA cannot know, a priori, what is the charge of a metal atom (eg Fe2+ or Fe3+ ?). If the molecule contains metal atoms, the user must specify the formal charge on each metal. If you forget to do this, SMFA will identify such "unspecified" metals, and query this in the output file OUT_SMFA, when the input is checked. You will be asked to modify any default charges assigned by SMFA. In addition, the valency of some atoms may be unusual in some circumstances, for example if the molecule is a radical, or the geometry is very perturbed, as in a transition state. For example, a reaction may involve abstracting a hydrogen atom from a carbon atom, leaving a group like this in the molecule. It is impossible to know a priori if such a group (eg CH3) is an anion, a cation, or neutral. The user must specify the charge of such an atom (the carbon, in this case). NOTE: This is likely to be an issue during a scan or in the search for a transition state. If you select this item, SMFA asks a series of questions. Question 1 Are there metals or other atoms whose charge must be specified or modified (Y/N, or hit RETURN to skip)? Enter Y or N (and proceed to the next question, or hit the RETURN key to skip this question. NOTE: if you have previously specified some metal charges, or if SMFA has assigned default charges to some metals, AND you hit the return key, then the earlier assigned charges will be implemented. "Y" prompts Question 2 Enter the number of metal atoms The response is an integer. An integer greater than 0, prompts Question 3 Enter the atom number(s) and associated charge(s), eq 34 2 (ie the 34th atom has a charge of 2) Enter each metal (or unusual) atom number and charge, one atom per line, as many atoms as entered above. That is, you enter lines that look like 63 2 103 0 The atom numbers refer to the order of the atoms in the coordinate file (submenu item 2) and the number of lines corresponds to the integer entered above. In the example above, "103 0" might signify 41 that atom number 103 is a carbon radical, while "63 2" might signify that an Fe atom at position 63 in the input file is Fe2+. 2.4.6 Submenu item 6 Exit input control Self explanatory. In general, once you have answered all the questions in one of the submenu items once, you do not need to repeat that submenu item again, if you want to make a change elsewhere in the input. The exception is that if you change the molecule coordinate file, the earlier input is removed, and you must complete the input again. 42 2.5 Input review and preparation Main menu item 2 is 2) SMFA examines the input and prints comments or queries to OUT_SMFA It is ESSENTIAL to choose this item after completing the input or if the input has been changed in any way. Having read the input parameters, SMFA prepares certain files needed by the fragmentation process, including a description of the bonding between atoms and the formal charges of functional groups. The output of this process and a record of the values of input parameters are printed to the main output file, OUT_SMFA. You should look at OUT_SMFA to check for mistakes in the input. If the input is unsatisfactory, then you return to the input menu and adjust the input values. Else, you go to item 3 in the main menu. See Section 3.1 for a guide to the output in OUT_SMFA. 2.6 Fragmentation Item 3 in the main menu is 3) Fragment the molecule. It is ESSENTIAL to choose this item before proceeding to the electronic structure calculation. The process is automated, no input is required except a RETURN key when the process is completed. The theoretical background for the fragmentation is described in the references. The coordinates for the fragments produced at the requested value of Level are written to the file seefrags. This file is in a standard xyz format which can be read by many molecular graphics programs (eg MacMolPlt or VMD). A brief summary of the number of fragments, and the number of electrons in the largest fragment is appended to OUT_SMFA. SMFA also inspects the fragmentation output and decides what would be an optimum number of cpus to use to run the ab initio calculations of the fragments in parallel (in order to minimise the "walltime" for the calculation). The recommended number of cpus is written to OUT_SMFA. The output file, OUT_SMFA, also provides information on the largest fragment produced by the fragmentation process. The size of this molecule determines the degree of computational difficulty for the ab initio calculations. The user should consider what computational resources (memory, scratch disk space, time, etc) would be necessary for the desired basis set and level of ab initio theory on this largest fragment. This will determine your choices for some "system variables" in the next section. An input file for the ab initio calculation for this largest fragment is contained in a file named LARGE.com (or LARGE.inp or LARGE.nw, depending on the quantum chemistry package employed). If the fragment in LARGE.com is indeed very large, you might need to run some trial calculations with 43 LARGE.com to see what memory, disk space, time etc is needed, and if you need to include additional items in the ab initio program input (Section 2.4), such as memory requirements. 2.7 System variables Item 4 in the main menu is 4) Time limit, memory, queue etc Here you are asked to create a, or modify an existing, submission script for the execution of the ab initio calculations and other functions of SMFA. This code assumes that a PBS (https://en.wikipedia.org/wiki/Portable_Batch_System) controls the execution of all jobs and queues. When you choose this item, SMFA prints A file called pbsfile must contain the instructions for the PBS that controls job queues, memory requirements, time limits, the number of cpus requested, etc. This file also loads the quantum chemistry program packages. If a pbsfile exists in the current directory, SMFA uses this file. Otherwise, a pbsfile has been saved in the SMFAPAC/bin subdirectory of the main SMFA directory, denoted pbsfile_standard, and SMFA uses this file. If a pbsfile exists in the current directory, SMFA prints The current file of PBS instructions contains the following: followed by the current contents of the pbsfile, for example #!/bin/bash #PBS -P a00 #PBS -N gau #PBS -l mem=2gb #PBS -l ncpus=1 #PBS -l jobfs=2gb #PBS -l walltime=1:00:00 #PBS -l wd #PBS -q normal # enter instructions that make the quantum chemistry package available (do not change this line) # and finish with a blank line (do not change this line) module load Q-Chem/4.3 # enter instructions that make the DALTON package available # and finish with a blank line module load dalton (do not change this line) (do not change this line) Otherwise, SMFA prints The standard file of PBS instructions contains the following: NOTE that the standard file refers to a fictitious project xxx 44 followed by the contents of a file similar to that above. SMFA then provides a means to edit the pbsfile. You are prompted as follows: To To To To add a new line, Enter A change any line in the PBS instructions, Enter C remove any line in the PBS instructions, Enter R terminate editing this file, simply hit the RETURN key The A, C and R above are not case sensitive. Following these and successive prompts, allows you to edit the file. When editing is complete, SMFA saves the modified file in the current directory. Alternatively, you can copy pbsfile_standard in the SMFAPAC/bin/ subdirectory into the current directory for this molecule as pbsfile, and use any text editor to directly edit pbsfile. The ab initio calculations on all the fragments can be executed sequentially or in parallel. The choice of sequential versus parallel is made in the next Section/Menu item. NOTE: In addition to resource requirements (memory, disk space, time limit), pbsfile contains a directive for which queue the calculation is submitted to. PLEASE NOTE: You must enter the instructions for loading the quantum chemistry packages as in # enter instructions that make the quantum chemistry package available (do not change this line) # and finish with a blank line (do not change this line) module load Q-Chem/4.3 # enter instructions that make the DALTON package available # and finish with a blank line module load dalton (do not change this line) (do not change this line) The instruction regarding "do not change this line " is important because SMFA uses these lines to locate the instructions for loading the quantum chemistry packages. The instruction to "load" or otherwise make available a quantum chemistry package, eg module load Q-Chem/4.3, may occupy one or more lines, but you must finish with a blank line (RETURN only, no spaces). Unless you want to change the quantum chemistry package, you need only enter these instructions once. PLEASE NOTE: SMFA uses the line #PBS -l ncpus=1 to know how many processors are requested, so this line is ESSENTIAL. 2.7.1 Sequential execution If you plan to run the calculations sequentially, the core memory and scratch space memory should be set as appropriate for the ab initio calculation on the largest fragment. The time limit 45 should be set to allow all calculations to complete (carried out one after the other). The number of atoms and electrons in this largest fragment were reported in the OUT_SMFA file after the fragmentation has been completed. A sample input file, LARGE.com, for this fragment was also created. You MUST set #PBS -l ncpus=1 in this case (see previous section) . Generally, even the largest fragment is too small to benefit significantly from the use of multiple processors. 2.7.2 Parallel execution If you plan to run the calculations in parallel, then you must set the number of cpus here (call it ncpus), say #PBS -l ncpus=100 This can be the number of cpus recommended in the OUT_SMFA file or more or less. You can specify more cpus, if that is convenient for your system. Then (to be safe) you should set the memory and scratch disk space to be appropriate for ncpus calculations, each as large as that in LARGE.com, running independently. The time limit should be set larger than the time required for one calculation on LARGE.com (as there are a lot of other things for SMFA to do). Note again that you must include instructions in pbsfile to load the relevant ab initio quantum chemistry package. This package may be one of GAMESS(US), GAUSSIAN, NWChem or Q-Chem. In addition, the DALTON package must always be loaded. 46 2.8 Electronic Structure Calculations Item 5 in the main menu is 5) Run all the electronic structure calculations. When you choose this item, the following submenu appears: submenu: Run Options 1) 2) 3) 4) Run all calculations sequentially Run all calculations in parallel Run all calculations in parallel with multiple nodes Exit run options You choose one of these options. Choice (1) performs all the (very many) electronic structure calculations for the fragments sequentially on a single cpu. Choice (2) performs the fragment calculations, each fragment on a single cpu, using multiple cpus on a single node. The number of cpus employed was set in main menu item 4. Choice (3) is the same as choice (2), but where the number of cpus exceeds that on a single node. Many systems require special methods to submit calculations on multiple nodes. Once a choice is made, all ab initio or DFT calculations are prepared and submitted for execution. The process is automatic. The SMFA program will terminate with the message that (when the ab initio calculations are complete) the output will be appended to OUT_SMFA. NOTE: The first time that you choose option (3) in the current directory, SMFA responds with Before you run a multinode calculation in this directory for the first time, you must edit the MULTINODEDATA file in this directory, as directed by the notes in that file. Alternatively, you can copy a previously edited file from another directory into this directory. Hit RETURN to exit SMFA will terminate so that you can edit this file. The standard MULTINODEDATA file for multinode operation uses the "pbsdsh" command (a standard MULTINODEDATA is automatically copied from SMFAPAC/bin into the current directory when you choose option 3, unless this file already exists there). Follow the instructions in the standard MULTINODEDATA file. You only have to go through this editing procedure once in the current directory. To avoid a repeat of this task in new directories, you can copy a previously edited file into the new directory. NOTE: The MULTINODEDATA file uses the "pbsdsh" command. Unfortunately, it appears that pbsdsh is incompatible with the normal MPI version of GAMESS. In order to use multiple nodes, GAMESS users using pbsdsh to distribute tasks to nodes will need to have GAMESS set up to run in serial. 47 2.9 Restart electronic structure calculations The multitude of electronic structure calculations may have been aborted prematurely for a number of reasons: For example, a hardware fault occurred; or insufficient memory or disk space was requested, causing the quantum chemistry program to abort; or the user chose to abort the process for some reason. Many individual calculations may have been executed successfully before the job aborted, and the user may choose to restart the calculations, beginning with the first incomplete task in the sequence. The restart facility DOES NOT APPLY to geometry optimisation or TS search or a scan, but only to calculation of the energy, gradient or frequency (and any properties requested). For optimisation etc, you should store any optimised energies and geometries (see Section 3.6), and re-run the optimisation or scan using the last 'Newcoords.xyz' geometry as the starting geometry (under a new file name). You CANNOT restart the quantum chemistry calculations in SMFA, if you have changed (or want to change) anything related to the molecular structure or bonding, or the fragmentation parameters, or the electronic structure method. You must change the input and go through the usual sequence of steps to run the electronic structure calculations. If the partially complete job was running in sequential mode, then you MUST restart in sequential mode; if originally in parallel mode, then you MUST restart in parallel mode. To restart the calculation, launch SMFA, and then go straight to Item 6 in the main menu, which is 6) Restart the electronic structure calculations When you choose this item, SMFA responds with You can restart the SEQUENTIAL or PARALLEL quantum chemistry calculations in SMFA if you have suffered a hardware failure, or similar catastrophe. Before you restart, you can change the time, memory or disc space requests. For SEQUENTIAL calculations, you can change the time, memory or disc space requests, by editing the pbsfile directly, or via main menu item 4, and then come back here to perform the restart. For PARALLEL calculations, you can change the time, memory or disc space requests, BUT NOT THE NUMBER OF PROCESSORS, by editing the pbsfile directly, or via main menu item 4, and then come back here to perform the restart. You CANNOT restart the SEQUENTIAL or PARALLEL quantum chemistry calculations in SMFA if you have changed anything related to the molecular structure or bonding, 48 or the fragmentation parameters, or the electronic structure calculation. You CANNOT restart a GEOMETRY OPTIMISATION or SCAN using this facility You should store any optimised energies and geometries, and re-run the optimisation or scan using the last 'Newcoords.xyz' geometry as the starting geometry. To exit this menu item, hit RETURN To continue, enter RESTART If you do not want to proceed with a restart at this time, you simply hit the RETURN key. If you wish to initiate a restart then, you enter RESTART (actually not case sensitive). SMFA will respond with: If the calculation was running SEQUENTIALLY, you must continue that way. If the calculation was running in PARALLEL, you must continue that way. Enter S or P to continue as SEQUENTIAL or PARALLEL If you enter S or P, then SMFA will continue with the restart, otherwise the restart will abort. If you are running sequentially, SMFA will respond by first re-examining the input and printing comments to OUT_SMFA (the previous OUT_SMFA file is overwritten). You have to hit the RETURN key when this step is complete. SMFA then recalculates the fragmentation. You have to hit the RETURN key when this step is complete. SMFA then submits the ab initio calculations, and exits. When the calculations are complete, the output (OUT_SMFA) should be indistinguishable from that of an un-interrupted job. If you are running in parallel, then SMFA submits a job which simply restarts the ab initio calculations beginning with the jobs that were not complete and then proceeds to append the output to OUT_SMFA. 49 3. Output Guide The principal output file for SMFA is called OUT_SMFA. 3.1 Checking the input The first section of this file contains a summary of the input parameters entered by the user. This is followed by a description of any formally charged groups that SMFA has found within the input structure. If metals were discovered, that had not been specified in the input (section 2.4.5), then SMFA warns the user that these charges should be specified. The coordinates of all formally charged groups are written to OUT_SMFA. NOTE: The user should examine the geometries of any such charged groups. The coordinates printed in OUT_SMFA are in a format easily "pasted" into an appropriate graphics program like Jmol or MacMolPlt (or loaded into VMD). SMFA identifies formal charges on the basis of bonding and valency. If SMFA identifies unphysical charges, this is likely due to a defect in the input geometry. For example, if the structure has been obtained from a crystallography data base, then hydrogen atoms may be missing from the structure and spurious charges will then result. Other typographical errors may have resulted in unphysically short or long atom-atom distances which will result in mis-assignment of the bonding and hence mis-assignment of formal charges. If the structure is not "normal", in the sense that some atom-atom distances are neither typical for bonds nor for "non-bonded interactions" (for example the geometry signifies some intermediate point on a reaction path), then the charge of some atoms may need to be specified explicitly by the user (see Section 2.4.5). The user should check this part of the output at least once. 3.2 Fragmentation output The coordinates for the fragments produced at the requested value of Level are written to the file seefrags. This file is in a standard xyz format which can be read by many molecular graphics programs (eg MacMolPlt or VMD). A brief summary of the number of fragments, and the number of electrons in the largest fragment is appended to OUT_SMFA. An ab initio input file for the largest fragment is created with the file name "LARGE.com" or LARGE.inp or LARGE.nw, whichever suffix is usual for the quantum chemistry package. This file can be used to run trials to estimate the cpu time, memory and disk space requirements. Note, if you are running a geometry optimization or scan, then the type of job specified in LARGE.com will be an optimization. You should change this to a gradient calculation for the purpose of a trial calculation. 50 3.3 Output Energies When the ab initio calculations have been completed, SMFA collects the multitude of fragment energies, the results of long-range interactions (calculated using perturbation theory) and prints the resultant total electronic energy to OUT_SMFA. Error messages The total electronic energy is evaluated from many individual fragment calculations. One or more of these component electronic structure calculations may have failed; either due to a hardware or system error, or for example, inadequate memory or disk space, or failure of the SCF iteration to converge. If SMFA thinks a calculation has failed to complete satisfactorily, it prints a message to OUT_SMFA like calculation may have failed for charge.49.grp.com or calculation may have failed for FRAG34.com or some other ".com" filename, such as ab.19.85.com, nb.140.0.com, or nb.34.0-polar.com, where the numerals in the filename can take many different values. If such a message appears in OUT_SMFA, the user should inspect each of the corresponding ".log" files (eg FRAG34.log) to check if the electronic structure calculation did indeed fail. If an individual calculation did FAIL, then the value of the energy reported in OUT_SMFA is INCORRECT. In this case, you may need to repeat the entire calculation, if the error was caused by an error in the input. However, if the error was caused by a hardware or system error, or if an increase in the allowed memory or disk space in the pbsfile file (see Section 2.7) would solve the problem, then it may be possible to repeat only those component calculations that failed. See Section 2.9 on restarting calculations. 3.4 Output Gradients If you have chosen to evaluate the forces or gradients, the gradients are written to a file called combinedderivs, under the heading "First derivatives". The order of the derivatives is x,y,z for atom 1, then x,y,z for atom 2, and so on. This file also contains the atomic masses, elemental symbols, energy, coordinates, and energy second derivatives (if they were requested). A brief note is appended to OUT_SMFA. NOTE: Any subsequent gradient, frequency or optimisation process will overwrite combinedderivs. 51 3.5 Output Frequencies If you have chosen to calculate the frequencies, these frequencies are written to a file called FREQUENCIES, along with the Infrared intensity for each mode. The Cartesian displacements of the atoms for each normal mode are written to a file called NORMAL_MODES. A simulated Infrared spectrum is written to a file called SPECTRUM. The Cartesian hessian is written to a file called combinedderivs, under the heading "Upper triangle of the second derivatives". The rows and columns of the hessian are in the order: xyz for atom 1, then xyz for atom 2, and so on. The simulated spectrum assumes Lorentzian peaks with a FWHM of 5 cm-1. You can produce another simulated spectrum with a different FWHM using the "Frequencies" utility program (see Section 5). Using this utility, you can also evaluate frequencies, intensities and spectra for isotopic substitutions. NOTE: The zero-point energy is written at the bottom of the FREQUENCIES file and is also written to the OUT_SMFA file. 3.6 Optimised Structures A brief description of the methodology used to find minimum energy and transition state (saddle point) geometries is provided in Appendix C. The criteria for convergence are also described therein. Converged structures When a structure has converged to a minimum or saddle point, the geometry is written to a file called CONVERGEDCOORDS. This file is in standard xyz format (as are input coordinate files). A summary of gradients and energies at each step in the optimisation process is contained in a file called optout. A brief statement noting convergence is appended to the OUT_SMFA file. NOTE: At present, frequencies are not automatically calculated for the converged structure. Hence, if you want these frequencies, you must carry out a separate calculation for frequencies with the converged structure as the input geometry. Unconverged structures If the structure has not converged by the maximum allowed number of steps, the current (final) structure is written to a file called Newcoords.xyz. Moreover, the sequence of structures at each step in 52 the optimisation process are written to files called Newcoords.xyz.step where "step" is an integer denoting the step number. The optimisation process can be easily restarted by putting any of these "Newcoord.xyz" files as the initial geometry and running SMFA again. NOTE: All files with names beginning with "Newco" are removed at the beginning of the optimisation process, so you must rename any such files you wish to retain, including that nominated as the new initial structure. NOTE: As described in Appendix C, the hessian, and energy and gradient, are calculated at the initial geometry at the Hartree Fock level of theory, while at subsequent steps the energy and gradient are evaluated at the chosen level of theory. Hence, you should not be perturbed by the fact that the energy (reported in optout and OUT_SMFA) changes abruptly between the initial step (step 0) and the first step. 3.7 Scans A scan consists of a sequence of geometry optimisations at discrete points along a path in the molecular coordinate space. The output consists of (1) A summary of the input to SMFA is included in OUT_SMFA, including a description of the sequence of constraints which define the path. (2) If the geometry optimisation completes within the input maximum number of steps, the geometry is saved in a file denoted SCANconk, where k denotes the kth point along the path. If the convergence criteria have not been met within the input maximum number of steps, the geometry is saved in a file denoted SCANunconk. (3) Values of the molecular energy and constraint conditions during geometry optimisations at each point on the scanned path are contained in OUT_SMFA. (4) A summary of the energy at the final geometry at each point along the path is given a the end of the OUT_SMFA file. (5) A summary of gradients, displacements and energies at each step in the optimisation process at point k is contained in a file called optout.k. The sequence of intermediate geometries during each optimisation (in files denoted Newcoords.xyz.step are overwritten by the corresponding files at the next point along the path. 53 4. Suggested Procedures It is important to remember that SMFA is not exact for the calculation of molecular energies or other properties. However, the systematic character of the method allows the user obtain a good indication of the accuracy obtained, relative to a calculation on the whole molecule. Feasible, reliable calculations Of course, an MP2/cc-pVDZ or even a CCSD(T)/pV5Z calculation of the molecular energy is not exact. Indeed, we care little about the exact total energy of some molecular structure, we are almost always interested in the relative energies of two or more structures (eg reactants and products of some reaction). One tries to judge convergence towards the exact relative energy (or other property) from a sequence of calculations using progressively higher levels of theory and progressively larger basis sets, in the usual practice of quantum chemistry. Established practice would normally apply this systematic approach to a some moderately small system containing the chemistry of interest; a system small enough for larger basis sets and higher levels of theory to be feasible. This allows us to "benchmark" a more modest method that is feasible for larger systems, but that gives reliable results compared to the most reliable methods for the small system. This established procedure is still applicable using SMFA. Generally, the SMFA estimates of relative energies (and other properties) can be expected to be reliable for Level = 3 or 4 (more about this later). For Level = 3, the largest fragments (the largest "molecules" for which ab initio calculations are performed) can be expected to contain between 4 and 6 chemical functional groups. For Level = 4, the largest fragments can be expected to contain between 5 and 7 chemical functional groups. The size of the largest fragment establishes what level of ab initio theory and basis set is feasible using SMFA. You can find information about the largest fragment in the OUT_SMFA file after you have fragmented the molecule (main menu item 3), that is before any ab initio calculations have been performed. A sample quantum chemistry input file is provided for this fragment in a file denoted LARGE.com (or LARGE.inp or LARGE.nw). Hence, before any ab initio calculations have been carried out, you can see what the largest calculation will be and therefore you could estimate what level of ab initio theory and size of basis set will be feasible for your application of SMFA, given the available computational resources. You should consider what is feasible for Level = 3 and Level = 4. Hopefully (but not certainly) you will not need to use much higher values of Level. You use different values of Level to establish convergence of energies and properties with respect to the fragmentation approximation. 54 Extensive testing indicates (thus far) that the reliability of SMFA at a given value of Level does not depend significantly on the level of ab initio quantum chemistry or the size of the basis set. This means that if SMFA estimates the relative energy of two molecular structures to within a few kJ mol-1 of the calculated whole molecule value at (say) HF/6-31G, then one can expect about the same accuracy at CCSD(T)/aug-cc-pV5Z. Hence, you can test convergence with respect to Level, by running SMFA with (say) HF/6-31G for Level = 2, 3, 4, and so on. You should see the reported values of the relative electronic energies converging in the series of results (albeit not monotonically). Once you have established the smallest value of Level that gives results that are reliable to within some desired tolerance, then calculations can be carried out at this value of Level with larger basis sets and higher levels of ab initio theory, in the usual way. Hence, a sensible approach to the study of a large molecule would be: (i) Choose a low level of theory and small basis set, say HF/6-31G. (ii) Set the cutoff value for non-bonded interactions, denoted dtol, to the default value of 1.1 (or 0.0 for Level 2, see Section 2.4.1 and Appendix B) (iii) Calculate the energy for Level = 2, 3 and 4 to see if the energy has converged (try Level = 5 to be certain). (iv) Repeat step (iii) with larger values of dtol, say as large as 1.5. Record the computation times to understand the impact of increasing the values of Level and dtol. (v) Determine from these calculations what is a reliable and computationally efficient value of Level and dtol. (vi) You can now calculate the energy with these values of Level and dtol, using more reliable combinations of quantum chemistry method and basis set. Using the largest fragment as a guide, you can determine what quantum chemistry methods are feasible with the available resources. (vii) The approach to optimization of the geometry or searching for a saddle point is the same as for small molecules. It is more efficient to determine the structures with modest basis sets and levels of theory first, then to refine the structure with more reliable methods. (viii) Having found a stationary point, evaluate the vibrational frequencies, infrared spectrum, and any other properties of interest (see Sections 5 and 6). It should not be necessary to repeat steps (i) to (v) at another structure of the same molecule. However, one might want to confirm the convergence (with respect to Level and dtol) of relative 55 energies, by increasing the value of Level by 1, compared to the "reliable" value determined above. Fairly extensive experience suggests that the fragmentation approximation may be converged by Level = 3 (not always), but that calculations at Level = 4 are necessary to confirm this. Increasing the value of dtol from 1.1 to 1.5 generally does improve the accuracy of calculated energy gradients 14 (and frequencies), and final optimization of minimum energy or saddle point structures (and frequencies) might be carried out with dtol = 2. SMFA normally uses the default values for convergence of the SCF calculation (for the particular quantum chemistry package), but employs a tighter convergence criterion if gradients or frequencies are requested. There have been a number of published reports of the accuracy achieved by SMFA 9,10 14 for the calculation of energies for moderate sized molecules. Note, once again, SMFA is not exact (but impressively accurate). In applications of quantum chemistry to large molecules, one should ask oneself: what is a sensible level of accuracy to aim for? 17 Under normal circumstances, the molecule of interest will be in (or near) thermal equilibrium, with an average total energy (including vibration and rotation) far above the energy of the equilibrium geometry. An attempt to determine the equilibrium structure and energy with higher and higher accuracy is likely to make less and less sense. Saddle points and Scans Both the optimisation and saddle point searches in SMFA [options Optimise (3) and Find TS (4) in section 2.4.3] are carried out using a local quadratic approximation to the molecular potential energy surface. In the case of optimisation, this will lead to an energy minimum "close" to the initial geometry. It will not necessarily lead to the global energy minimum. In the case of saddle point searches, common experience suggests that locating the saddle point that defines the transition state of interest is difficult unless the initial geometry is quite close by. In large molecules, it may be very difficult to "guess" an appropriate initial geometry. The scan facility [Option Scan (5) in section 2.4.3] provides a means to explore the molecular potential energy surface and determine a suitable initial guess for a saddle point structure. For example, if one knows that a bond that breaks in the transition from reactants to products, then one can run a scan with increasing values of this bond. One might find an energy maximum along this scanned path, and the geometry at that maximum could be a suitable initial guess at the saddle point geometry. NOTE: In such a scan, the "breaking bond" will eventually exceed the length that SMFA accepts for a bond to exist. At this point the fragmentation of the molecule will be different from that in the previous scan step. Hence, the calculated energy will change "discontinuously". To 56 prevent this, you can use the option 4) Specify hydrogen bonding and any unusual bonding in the input control submenu to specify that the "breaking bond" should always be taken as a bond, regardless of whether SMFA would normally consider this to be a bond. 57 5. Utilities Guide In the main menu, choosing 6) Utilities" brings up another submenu submenu: Utility Programs -> 1) 2) 3) 4) 5) 6) 7) 8) 9) Frequencies with isotopic substitution Isodesmic, homodesmotic and analogous reactions Combining isodesmic, homodesmotic etc reactions Electrostatic potential on the solvent-accessible-surface Dipole Polarizability Dipole Hyperpolarizability Internal Coordinates Add H atoms Exit 5.1 Frequencies with Isotopic substitutions The purpose of this utility is to allow the user to recalculate the vibrational frequencies with isotopic substitutions for some atoms. The utility of this feature is that Infrared spectra of large molecules are rather diffuse and unresolved into individual peaks, due to the large number of vibrational transitions of similar frequencies. However, if the spectrum of an isotopically substituted molecule is also available, then the difference between the substituted and unsubstituted spectra may provide useful information. NOTE: The user must first have run a frequency calculation for the molecule (that is job type 2 in the input submenu item 3). A normal frequency job produces the force constant matrix (hessian) in a file called combinedderivs and the electric dipole moment derivatives in a file named combDipDerivs. This utility needs these files, so the utility should be used in the subdirectory where the frequency job was executed. The atomic masses can be changed from their default values in two ways: (i) You can choose to change the masses of all atoms that have the same chemical element (as chosen by the user), or (ii) You can choose to change the masses of particular atoms (one or more atoms). The new mass must be given in atomic mass units (gm mol-1), so for example 13C would be given a mass of 13.00335. Choosing 1) Frequencies with isotopic substitution 58 then brings up the question: You can choose to change the isotope for every atom of a chosen element, or you can change the isotope for particular chosen atoms (by number). Do you choose a particular element (Y/N)? Enter Y or N If you enter "Y", this produces the prompt: Enter the elemental symbol and mass (in atomic mass units) Then for example, you enter C 13.00335 (or simply 13.) You must leave at least one space between the element and the mass. Note that SMFA requires correct elemental symbols, eg magnesium is Mg, not MG or mg. If you enter "N", this produces the prompt: For each atom to be substituted by an isotope, you must enter the atom number and the isotope mass (in atomic mass units). Enter the number and mass for each atom on one line, and finish with a RETURN (blank line): You enter, for example 27 13.0 54 13.0 123 2.0 which would mean that atoms 27 and 54 were replaced by 13C and atom 123 by deuterium. The atom number refers to the order in the input coordinate file. Either answer to the question above will then produce the following prompt: The IR spectrum will be simulated with Lorentzian peaks with a given full width at half maximum (FWHM). The default value of this width is 5 cm-1 Enter your chosen width or hit RETURN for the default: You enter the peak width you wish (in cm-1) eg 10.0 The value chosen should reflect the appropriate resolution for the experimental data with which you wish to compare the calculated spectrum. The program then calculates the data and reports that The isotope substituted frequencies are in file FREQUENCIES The Cartesian displacements for the normal modes are in file NORMAL_MODES The simulated spectrum is in file SPECTRUM 59 NOTE: The files FREQUENCIES, NORMAL_MODES and SPECTRUM will overwrite pre-existing files with these names, so you should have saved the "normal isotope" results under some other name. The vibrational frequencies are reported in cm-1. The intensities for each vibrational mode are calculated as the square magnitude of the derivative of the dipole moment vector with respect to the normal coordinate, multiplied by the transition frequency. The spectrum is simulated simply as a sum of Lorentzian peaks with a height proportional to the intensity (as discussed above) and the FWHM chosen above. The format of the SPECTRUM file is two columns, frequency (cm-1) and intensity. It is a simple matter to paste this data (or import the file) into simple plotting programs or spread sheets (eg Excel). Once you have established this data in the spreadsheet, one can simply calculate the "difference spectrum" and plot any of the original or difference spectra. NOTE: You can get spectra for the unsubstituted molecule with different FWHM, simply by entering "N" to the first question; hitting RETURN in response to "For each atom to be substituted..." and then entering the FWHM value of choice. NOTE: The vibrational frequencies and intensities are calculated in the harmonic approximation for ground to first vibrationally excited state transitions only. Therefore, both the reported frequencies and the simulated spectrum do not account for anharmonicity or for overtone and combination bands which will almost certainly be present in any experimental spectrum. 5.2 Isodesmic, homodesmotic and analogous reactions Background An isodesmic reaction 20 is one in which the number and type of chemical bonds is the same for both reactants and products. For example, reactants and products have the same number of C-C, C-H, C=O bonds, etc. Since, the heat of formation of molecules is mostly determined by the number and type of bonds, the heat of reaction is near zero for an isodesmic reaction. This fact allows one to estimate the heat of formation of one species in the reaction, if the heat of formation of all other species is known. A homodesmotic reaction 21 is similar, except that in addition to the same number and types of bonds in both reactants and products, the neighbouring substitutents of those bonds are the same. Given an even closer correspondence in bonding for reactants and products, a homodesmotic reaction has a heat of formation which is even more likely to be near zero. 60 As it turns out, SMFA naturally produces instances of such reactions. In SMFA, the fragmentation of a molecule M is written as N frag M → ∑ cn Fn (5.2.1) n=1 where the cn are integer coefficients and the Fn are fragment molecules. Some of the coefficients are negative. If we subtract these terms from both sides of (5.2.1), we can write M + ∑ (−cn )Fn → ∑ cn Fn cn <0 (5.2.2) cn >0 In SMFA, all the structures in the fragments are the same as their corresponding structure in the molecule. However, if we let all structures in (5.2.2) take their equilibrium distributions, then Eq. (5.5.2) represents a chemical reaction at equilibrium. If the fragmentation is performed at Level = 1, then the reaction is isodesmic; if Level = 2, then the reaction is homodesmotic. Some years ago 1, we labelled the corresponding reaction at Level = 3 "isoperiochic", but the label hasn't stuck. Implementation SMFA allows the user to derive an isodesmic, homodesmotic, or higher analogue reaction for their molecule of interest, even without doing any quantum chemistry calculations. However, SMFA relies on the "free" software Openbabel to convert the Cartesian coordinates of a molecule to its unique InchI (the IUPAC International Chemical Identifier). So you will need to have installed and loaded Openbabel on your computer (eg module load openbabel/2.3.2). You then simply launch SMFA and in the main menu: Select -> 1) Set up input (a) Enter the Level of Fragmentation (if you have not already done so); (b) Enter the filename containing the molecule coordinates (if you have not already done so); (c) Enter something under "Specify the electronic structure calculation", anything will do, as we are not performing any ab initio calculations (d) Under "Specify hydrogen bonding and any unusual bonding", answer "N" to all the questions except, answer "Y" to "Do you want to ignore hydrogen bonds (Y/N)?", and answer "Y" to " By default SMFA assumes that if two charged groups...." (e) Exit input set up. 61 Select 2) SMFA examines the input and prints comments or queries to OUT_SMFA Select 3) Fragment the molecule Select 5) Utilities Select 2) Isodesmic, homodesmotic and analogous reactions SMFA will complete the task and append the results to the OUT_SMFA file. The reactant and product molecules will be identified by their fragment number in the seefrags file, and by their InchI. The file seefrags contains the Cartesian coordinates of all fragments, while seefrags.inchi contains the corresponding InchI. In addition, SMFA produces 8 other files to describe the reaction. Let's suppose the file containing the molecule coordinates (see step 1b above) was called molecule.xyz. The 8 new files created by SMFA are: molecule.xyz.lhs.inchi molecule.xyz.lhs.coeff molecule.xyz.lhs.coords molecule.xyz.lhs.svg molecule.xyz.rhs.inchi molecule.xyz.rhs.coeff molecule.xyz.rhs.coords molecule.xyz.rhs.svg Here molecule.xyz.lhs.inchi contains a list of the Inchi for each compound on the left-hand-side of the reaction (5.5.2). The first Inchi is that of the original molecule. The file " molecule.xyz.lhs.coeff" contains the coefficients, -cn, of the compounds on the left-hand-side of the reaction, " molecule.xyz.lhs.coords" contains the Cartesian coordinates of these compounds, and " molecule.xyz.lhs.svg" is a file that you can open with a web browser like FireFox. The web browser will display a table of 2D chemical structure drawings for the compounds on the lhs of reaction (5.5.2). The corresponding files labelled with ".rhs" contain the same information for the right-hand-side of the reaction. 62 NOTE: The 2D chemical structure drawings (.svg files), created by OpenBabel and viewed in FireFox, should be treated with some caution. It appears that these drawings show double bonds in the wrong places and radical sites where no such radicals exist (due to misplacing the double bonds). Nonetheless, with the aid of some knowledge of chemistry, the 2D drawings provide a helpful illustration of the reactants and products. Additional manipulations of near-isoenergetic reactions can be carried out using the utility in the next section. IMPORTANT CAVEAT: The basis of isodesmic, homodesmotic, etc reactions ignores long range interactions between functional groups in a molecule (assuming that "bonded" and "near bonded" effects are the major contributors to heats of formation). If long range electrostatic interactions are large then this approach will not provide an accurate estimate of the heat of formation. 5.3 Combining isodesmic, homodesmotic etc reactions If you have created the files for more than one isodesmic, homodesmotic (etc) reaction, you can use this utility to 'subtract' one reaction from another. For example, you may have found an isodesmic or homodesmotic etc reaction for each of two isomers. Since both reactions are near isoenergetic, a new reaction that we form by subtracting one reaction from the other is also near isoenergetic. In this example, this new reaction could be used to estimate the energy difference between the two isomers. If the isomers have many similarities in structure, then 'subtracting' one reaction from the other will result in the cancelation of many components of the reactions. Thus a simpler reaction for estimating the energy difference between the two isomers will be obtained. Many other applications for constructing near isoenergetic reactions may come to mind. Choosing 3) Combining isodesmic, homodesmotic etc reactions will result in some comments and two prompts: The isodesmic (etc) utility created several files for the reactions with names like name1.lhs.inchi and name2.lhs.inchi where name1 and name2 were the names of the coordinate files for these molecules. All you have to do is enter these two file names, eg name1 and name2 Enter the first file name You enter the filename for molecular coordinates previously used in Section 5.2 (say name1) Enter the second file name 63 You enter another filename for molecular coordinates previously used in Section 5.2 (say name2) The files for the reaction formed by subtracting the reaction for name2 from the reaction for name1 are written to files with the prefix name1-name2. These files, like "name1-name2.lhs.inchi" have the same form as corresponding files created in Section 5.2. So, these output files can form the input for further manipulation of reactions via this utility. 5.4 Electrostatic potential on the solvent-accessible-surface SMFA will generate the data to allow you to visualise the electrostatic potential (ESP) on the solvent accessible surface surrounding the molecule. The electrostatic potential is generated from a charge, and dipole and quadrupole moments distributed onto every atom in the molecule. These charges and electrostatic moments are generated from the electron densities that were generated for Level = 1 fragments, using Stone's method applied to the output from GAUSSIAN or Q-Chem. Similarly, Stone's method is integrated into GAMESS, and the corresponding moments are used. For NWChem, only distributed charges on the atoms are available for use. See Appendix E for more details about the implementation of distributed multipole moments. If you have used SMFA to carry out a calculation of the energy, gradient or hessian (at any value of Level), then the necessary data is available to SMFA. The solvent accessible surface is defined as the surface that is separated from the nearest atom in the molecule by the Van der Waals radius of that atom plus a radius associated with the solvent molecule. This utility program provides two output files for graphics. The first file is called SOLACCSURFACE. This is an 'xyz' format file which can be loaded by programs like VMD or MacMolPlt to show you the solvent-accessible-surface surrounding the molecule. This graphic is useful in conjunction with the second file, called ESP.cube. This file has the format of a GAUSSIAN 'cubegen' file, and can be read by VMD (and other programs) to show the electrostatic potential on the solvent-accessible-surface. NOTE: To use this utility, the current directory must be that in which you have just carried out an energy, or gradient or hessian calculation for the molecule of interest. The electron density for the molecule which generates the ESP is a Level =1 approximation to the electron density at the level of ab initio theory and basis set which was employed. For Q-Chem and GAMESS, only the SCF density is used. If you are only interested in the ESP, you can save cputime by choosing Level = 1 and dtol = 0.0 in the SMFA input, and choosing only an energy calculation. 64 In the Utilities subdirectory, choosing 3) Electrostatic potential on the solvent-accessible-surface brings up three prompts: You must enter a value for this solvent radius (the default is 1.4 Angstrom) You enter a value (such as 1.4, which is commonly used for H2O). You must enter the desired density of points on the surface (per square Angstrom) The default value is 1.0 You enter a value such as 1.0. When you look at the graphics (see below), you might decide to try different values. You must enter the number of grid points for the 'cube' of ESP data, The recommended value is 100 (per axis direction) You enter a value such as 100. The program will produce a rectangular prism of data points, with 100 points on the longest axis, and proportionally less on shorter sides. This box will enclose the molecule by a clear margin on all sides. The resultant output will be some MB in size. The larger integer you select above, the longer will be the calculation time (expect minutes), the larger will be the ESP.cube file, and the better will be the graphical resolution. A brief note on the output is appended to OUT_SMFA. Output The file SOLACCSURFACE is a simple xyz format file which you can open in free-ware such as VMD, MacMolPlt, iMol or similar programs for representing molecules. If you choose a "ball and stick" representation, then this file shows the molecule surrounded by small off-white balls which lie on the solvent-accessible-surface for the radius of solvent molecule that you have chosen. The file ESP.cube can be opened by VMD. It will show a coloured surface with represents the electrostatic potential on the solvent-accessible-surface (blue for positive, red for negative). To see this in VMD, you should (i) Launch VMD. A graphics window will open and also a window labeled "VMD Main". (ii) Under the "File" menu in the "VMD Main" window, choose "New Molecule". A new window will appear called "Molecule File Browser". In this window, Click "Browse". Locate the "ESP.cube" file, and choose it. 65 (iii) In the "Molecule File Browser" window, under "Determine file type", choose "Gaussian Cube" (if VMD has not already done so). Then click "Load". A new line will appear in the "VMD Main" window, with the name of your file (eg ESP.cube). (iv) In the "VMD Main" window, choose "Representations" under the "Graphics" menu. A new window will appear, entitled "Graphical Representations". (v) In this "Graphical Representations" window: under "Drawing Method", choose "Solvent"; next to "Representation Method" choose "Mesh"; next to "Probe Radius" choose a value like 2.8. The coordinates in ESP.cube are in atomic units (Bohr). The ESP is also in atomic units. The "Graphical Representations" window should look like this: You can click "Apply". (vi) A coloured mesh representation of the ESP on the surface (with the solvent molecule or "probe" radius chosen) will appear in VMD's graphics window (blue for positive, red for negative). You can fiddle with the settings in VMD to find the image you like best. You can change the value of the "Probe Radius" to see what effect that has on the ESP. 66 The image that is produced from the SOLACCSURFACE file is useful for comparison with the ESP surface, as SOLACCSURFACE shows the molecule itself within the solvent-accessible-surface. 5.5 Dipole Polarizability SMFA estimates the electric dipole polarizability of the molecule using two methods, from data that may already have been evaluated by SMFA. Method 1 If the molecule is neutral, or if long-range dispersion has been accounted for in the calculation of the energy, then the polarizability of the individual chemical functional groups will have been evaluated by SMFA. For GAMESS and NWChem the polarizabilities are actually evaluated using DALTON. The total molecular polarizability is then simply estimated as the sum of the group polarizabilities. This is likely a significant overestimate of the actual value, as many "capping" hydrogen atoms have contributed to the sum (atoms that are not in the actual molecule). However, this method is included here, because it is available for all quantum chemistry programs interfaced with SMFA. When you choose this utility the results for Method 1 are appended to OUT_SMFA. Method 2 For GAUSSIAN, the dipole polarizability is automatically evaluated if the frequencies have been calculated. Alternatively, if the keyword POLAR (not case sensitive) has been included in Section 2.4.3.B (see Enter any other keywords (optional), or hit RETURN:) then the dipole polarizability is calculated. In these two cases, the molecular polarizability can be obtained as a sum over the fragment polarizabilities. This should be more accurate than Method 1, since the contributions of capping hydrogens approximately cancel in the sum over fragments, Fn: N frag If M → ∑ cn Fn , n=1 N frag then P ≈ ∑ cn P ( Fn ) , for any property, P, including the polarizability. n=1 The results are appended to OUT_SMFA. 67 For Q-Chem, you must explicitly request calculation of the dipole polarizability by including MOPROP 2 in the $rem input (see Section 2.4.3.D). Then the molecular polarizability can be obtained as a sum over the fragment polarizabilities. The results are appended to OUT_SMFA. 5.6 Dipole Hyperpolarizability If frequencies have been evaluated (before hyperpolarizabilities are requested here and without some other subsequent calculation, eg of the energy only), then the molecular hyperpolarizability can also be obtained as a sum over the fragment hyperpolarizabilities (as above) using GAUSSIAN. This property has not been programmed for Q-Chem as yet. The results are appended to OUT_SMFA. 5.7 Internal Coordinates This utility program enables the user to find the values of atom-atom distances, angles formed by three atoms and dihedral angles defined by four atoms (see Appendix C for the definition of dihedral angles). Atom-atom distances are reported in Angstrom, while bond angles and dihedral angles are reported in degrees. The geometry investigated is that contained in the coordinate file specified in Section 2.4.3. NOTE: You can use this utility to find the current value of internal coordinates, when using constrained optimization or the Scan facility. If you choose 7) Internal Coordinates SMFA prompts To To To To find a bond length, enter the 2 atom numbers find a bond angle, enter the 3 atom numbers find a dihedral angle, enter the 4 atom numbers exit, hit RETURN You enter a sequence of integers (atom numbers). For example, your input, and the consequent output, might look like: 13 12 68 13 12 1.230775 12 11 14 12 11 or 13 12 13 11 12 13 11 120.844332 or 13 14 -117.820407 In these examples, the angle is the angle that the vector 12 --> 13 makes with the vector 12 --> 11; the dihedral angle is the angle between the planes (13 12 11) and (12 11 14). 5.8 Add H atoms Sometimes coordinates which are reported from diffraction or NMR experiments have hydrogen atoms "missing". This may be due to the difficulty in determining the positions of H atoms in X-ray crystallography. Alternatively, an NMR based report may contain only a segment of the total structure and "terminal" atoms are reported with an incomplete valence - so it appears that a H atom should be appended to this terminus. The quantum chemistry calculations would be drastically perturbed if this "incomplete" structure were employed, so we naturally want to add the "missing" H atoms. If there are hundreds of missing H atoms, you probably need to find another structure, or locate a program that can fix it. If only a few H atoms are missing, then SMFA has a utility that can be used to add H atoms, one at a time. IMPORTANT NOTE: You may only become aware of the fact that there are missing H atoms when OUT_SMFA reports the presence of charged groups at the time that it checks the input. You should use a graphics program (eg iMol, MacMolPlt, IQMol, VMD) to look at all the charged groups that OUT_SMFA reports, to check for missing H atoms. For each of these charged groups, OUT_SMFA reports the atom number for the first atom in each group. This will allow you to locate the position of the charged group (with a missing H atom) in a graphic of the total structure. To insert a H atom, you will need to look at this structure and estimate where the additional H atom should go (approximately). If you choose 8) Add H atoms 69 SMFA responds with To get coordinates for a H atom to replace one that is missing, you will need to enter the atom numbers for three atoms in the structure: The H atom will be bonded by 1 Angstrom to the second atom; the first atom is taken to lie on an imaginary x axis from the first atom; the third atom is taken to lie in an imaginary xy plane, in the +y direction from the second atom. You will then enter two spherical polar angles, theta and psi (your best guess), and the program will output the coordinates, which you can paste into a modified coordinate file. Enter the 3 atom numbers, or enter RETURN to finish You enter the three appropriate atom numbers. As you can see, you are being asked to imagine an xyz axis system with the second atom at the origin. You will need to estimate (guess) the spherical polar angles that would define the position of the new H atom in this imaginary axis system. You enter three atom numbers, eg 23 105 44 SMFA then responds with Enter the two angles in degrees you enter two angles, eg -25.0 -80.0 Then SMFA responds with (eg) H 1.395596 -0.671719 -0.109015 Enter the 3 atom numbers, or enter RETURN to finish You can enter RETURN to finish, or enter atom numbers for a second additional H atom, and so on. Before you enter RETURN, you should copy the line above, which contains the H label and coordinates, so you can "paste" this into the coordinate file. NOTE: When you have finished adding additional H atoms, you should repeat main menu item 2 to check that the geometry is now appropriate. 70 6. Write your own property If you can write Unix, Perl or other scripts (SMFA is a mixture of Perl, Unix and Fortran), you can use SMFA to evaluate any property that GAMESS(US), GAUSSIAN, NWChem or Q-Chem can produce. A little reminder of how SMFA evaluates the molecular energy is useful for explaining what you have to do. In SMFA, the molecular energy E is given by E= N frag ∑ c E (F ) n (A) n n=1 ( ) ( ) +∑ ∑ cij ⎡⎣ E Fi (1) Fj(1) − E ( Fi (1) ) − E Fj(1) ⎤⎦ i j (B) (5.5.1) + E pert where the cn are integer coefficients and the Fn are fragment molecules, evaluated for the value of Level requested. The cij are also integer coefficients and Fi (1) are fragments evaluated for Level = 1. Term (B) accounts for non-bonded, but relatively close, interactions between functional groups, and Epert is a small correction evaluated using perturbation theory (see Appendix E). Any molecular property can be thought of as a derivative of the total energy in the presence of some applied field. So, any property can be written (to a good approximation) as P= N frag ∑ c P(F ) n n=1 (A) n ( ) ( ) +∑ ∑ cij ⎡⎣ P Fi (1) Fj(1) − P ( Fi (1) ) − P Fj(1) ⎤⎦ i j (5.5.2) (B) The first term (A) in (5.5.2) is overwhelmingly the largest contribution to the property value. Using term (A) alone was the method adopted in (5.5) and (5.6) for the polarizabilities. Details for implementing part (B) are contained in Appendix D. To implement part (A) of (5.5.2) for any property, you simply have to (i) Include the appropriate "keyword" or other instruction in the input for the ab inito calculation, requesting the property you want (see Section 2.4.3 above). (ii) Follow the SMFA process down to and including running the ab initio calculations (Section 2.7) (iii) The output of these calculations for fragments F1, F2, and so on, are written to files called FRAG1.log, FRAG2.log, and so on. You must write a Unix script, Perl script, or similar, to extract the value of the property you want from these "log" files. (iv) The coefficients in (5.5.2 A), c1, c2, and so on, are contained in the file signs.out. Your script 71 simply has to take the property value from each FRAGn.log file, multiply it by the corresponding cn and add up the result. If you care to write your script in Perl, it can be incorporated into SMFA for everone to use! Note also, that the derivatives of properties with respect to the atomic coordinates can be estimated using SMFA. From (5.5.2), ∂P = ∂xα N frag ∑c n n=1 ∂P ( Fn ) ∂xα ( (A) ) ( ) ( ) ⎡ ∂P Fi (1) Fj(1) ∂P ( Fi (1) ) ∂P Fj(1) ⎤ +∑ ∑ cij ⎢ − − P Fj(1) ⎥ ∂xα ∂xα ∂xα i j ⎢⎣ ⎥⎦ If derivative properties, (5.5.3) (B) ∂P ( Fn ) , are evaluated by the quantum chemistry program for each fragment, ∂xα then the corresponding quantities for the whole molecule can be estimated. This requires knowing which atoms in the whole molecule are present in each fragment. This information is contained in the file frags.out. The details needed to evaluate (5.5.3) are also contained in Appendix D. Properties that involve higher order coordinate derivatives can be evaluated in the same way. The NMR chemical shift for all the atoms in a molecule is an example of a property that can be evaluated from the fragment calculations. 15 72 7. Test Cases/Examples 7.1 Relative energies of two protein conformers The solution structure of a small peptide was investigated by NMR and reported in Ref. 22. A total of 20 structures were deposited in the Protein Data Base (RCSB PBD) with the identifier (PDB ID) 1xv4. The 1st and 20th structures have been used in this test case, with Cartesian coordinates in files 1xv4_1.xyz and 1xv4_2.xyz, respectively, included in the directory SMFA/doc/testcases/1xv4. This small peptide contains 224 atoms. The energy of these two structures, and the corresponding energy difference, has been determined at HF/6-31G(d,p) using GAUSSIAN09 with Level 2 3 4 dtol 0.0 1.1 1.1 The results are shown in Table 7.1. Table 7.1 Energies and relative energies of the two 1xv4 conformers, at HF/6-31G(d,p). Level Energy (au) Energy (au) Relative Energy 1xv4_1 1xv4_2 (kJ mol-1) 2 -5143.290365 -5143.227675 -164.59 3 -5143.292222 -5143.233856 -153.24 4 -5143.294725 -5143.236096 -153.93 The file OUT_SMFA_1xv4_1_Level3 is included in the test case directory as an example of the OUT_SMFA file. From this file, you can see that the amide bond has NOT been taken as a multiple bond, otherwise all parameters are standard. No dispersion is included for this Hartree-Fock calculation. Note that the conformer energy difference is converged by Level = 3, although each conformer total energy varies between Level = 3 and Level = 4. From the OUT_SMFA file, one sees that 14 cpus were recommended for parallel execution. Using 14 cpus on an Intel Xeon E5-2670, this calculation required 2002 secs total cpu time, and 234 secs walltime. 73 7.2 Energy gradient The GAMESS-US program was used to calculate the B3LYP/cc-pVDZ gradient at a geometry of c-cyclotridecene. The molecular coordinates (in file R389225-3d.xyz), the OUT_SMFA file and the combinedderivs file are contained in the SMFA/doc/testcases/c-cyclotridecene subdirectory. The gradients (in Hartree/Bohr) are contained in the combinedderivs file. The cc-pVDZ basis set was implemented using the procedure outlined in Section 2.4.3A, following the phrase "For more complicated basis sets". The cc-pVDZ basi set was also used to evaluate (using DALTON) the polarizabilities which are needed to evaluate the induction energy (negligible in this case). 7.3 Geometry optimization A structure for c-cyclotridecene was optimized at HF/cc-pVDZ with Level = 3 and dtol = 1.1. The initial geometry (file R389225-3d.xyz) together with the OUT_SMFA and optout files are contained in the SMFA/doc/testcases/opt_c-cyclotridecene/HF directory, along with the converged geometry in file CONVERGEDCOORDS. The CONVERGEDCOORDS file was renamed optL3dtol1.1c-cyclotridecene.xyz and used as the initial geometry for optimization at MP2/cc-pVDZ. This file, along with the OUT_SMFA and optout files and the converged geometry in file CONVERGEDCOORDS, are contained in the SMFA/doc/testcases/opt_c-cyclotridecene/MP2 directory. All MP2 calculations were carried out using the Q-Chem program. 16 processors were employed in a parallel calculation (although the OUT_SMFA files shows that 23 processors were recommended). For comparison, c-cyclotridecene was also optimized at HF/cc-pVDZ with Level = 3 and dtol = 1.1, using the GAMESS(US) program. The corresponding files are contained in the SMFA/doc/testcases/opt_c-cyclotridecene/GAM directory. 7.4 Frequencies The CONVERGEDCOORDS file for the MP2/cc-pVDZ (Level = 3, dtol = 1.1) optimization above were copied to a file entitled optMP2L3dtol1.1c-cyclotridecene.xyz. The MP2/cc-pVDZ frequencies for c-cyclotridecene were then evaluated using the Q-Chem program. 16 processors were employed in a parallel calculation (although the OUT_SMFA files shows that 23 processors were recommended). The OUT_SMFA, FREQUENCIES, NORMAL_MODES, and SPECTRUM files are contained in SMFAPAC/doc/testcases/Freq/. 74 SMFA creates the combinedderivs file, which contained the hessian (inter alia), and the combDipDerivs file, which contains the Cartesian derivatives of the molecular dipole moment. Both these files are needed to evaluate the vibrational spectrum. These two additional files are also contained in SMFA/doc/testcases/Freq/. A deuterated spectrum To test the Utility program 5.1 "Frequencies with isotopic substitutions", this utility was used to substitute a mass of 2.0 for the two hydrogen atoms (atom numbers 36 and 37) which are attached to the "ene" group in c-cyclotridecene. Having chosen the Utilities in the main menu and "1) Frequencies with isotopic substitution" in the submenu, the following prompts from SMFA and responses were performed: This utility evaluates the vibrational frequencies, intensities, and spectrum for isotopically substituted molecules The force constant matrix (contained in combinedderivs) and the dipole derivatives that were previously evaluated for this molecule in this subdirectory are used to evaluate the frequencies. You can choose to change the isotope for every atom of a chosen element, or you can change the isotope for particular chosen atoms (by number). Do you choose a particular element (Y/N)? n For each atom to be substituted by an isotope, you must enter the atom number and the isotope mass (in atomic mass units) Enter the number and mass for each atom on one line, and finish with a RETURN (blank line): 36 2.0 37 2.0 The IR spectrum will be simulated with Lorentzian peaks with a given full width at half maximum (FWHM). The default value of this width is 5 cm-1 Enter your chosen width or hit RETURN for the default: 5.0 The isotope substituted frequencies are in file FREQUENCIES The Cartesian displacements for the normal modes are in file NORMAL_MODES The simulated spectrum is in file SPECTRUM The output files FREQUENCIES, NORMAL_MODES and SPECTRUM are contained in the SMFA/doc/testcases/Freq/deuterated subdirectory, renamed with a "_d2" suffix. 75 7.5 Scan To illustrate some features of the scan procedure, we consider the reaction of a hydrogen atom abstracting a hydrogen atom from a CH2 group in decane: H + CH3(CH2)8CH3 --> H2 + CH3(CH2)4CH(CH2)3CH3 We begin with an initial structure in which a H atom approaches decane along the direction of a CH bond, as shown below. The coordinates (file H+decane_react.xyz) are contained in SMFA/doc/testcases/scan/reactants/. In this initial geometry, the incoming H atom is 1.644 Å away from the closest H atom in decane. The scan will optimize the structure for a succession of decreasing H....H distances, in order to obtain an energy profile as a function of the H...H distance. Reactant optimization Before carrying out the scan, we have optimized the reactant geometry subject to the constraint that the H...H distance is 1.644 Å. The initial geometry, optimized coordinates, OUT_SMFA and optout files are contained in SMFA/doc/testcases/scan/reactants/. The two hydrogens involved in the constraint are atom numbers 15 and 33 in the coordinate file, and the OUT_SMFA file shows this constraint. The optimization was carried out using the NWChem program at HF/cc-pVDZ with Level = 3, and dtol = 1.1. The input shown in OUT_SMFA has another feature to note. When this reaction proceeds, the product will be a radical, containing a -HC•- group. In considering the bonding in such molecules, SMFA cannot, a priori, know whether a carbon with unsatisfied valence is a radical or an anion (or even a cation in some circumstances). Hence, the user has to inform SMFA (in the input) that this carbon has a charge of 0. Hence, in the input submenu, under 5) Specify charges for metals & other atoms (optional) the user must specify that atom number 14 has charge 0. 76 Under the constraint that the H..H distance is 1.644 Å, it is unlikely that optimization will lead to breaking the C-H bond, and the formation of a carbon radical. Hence, it is probably not necessary to include this charge specification here. However, during the following scan, where the H..H distance is reduced, this charge specification will be essential. The following files have been included in the SMFA/doc/testcases/scan/reactants/ directory: OUT_SMFA, optout, CONVERGEDCOORDS, and the initial geometry file H+decane_react.xyz. Reaction path scan Beginning with the converged geometry, H+decane_react.xyz, a scan of the minimum energy as a function of the H...H distance was carried out using the NWChem program at HF/cc-pVDZ with Level = 3, and dtol = 1.1. The following files are included in SMFA/doc/testcases/scan/reaction_scan: H+decane_react_opt.xyz, OUT_SMFA, SCANcon1, SCANcon2, SCANcon3, SCANcon4, SCANcon5, SCANcon6, SCANcon7, SCANuncon8, SCANcon9, SCANcon10, and optout.1,...., optout.10. The OUT_SMFA file shows that the scan includes 10 points, on a path which is defined by the atom-atom distance between atoms 15 and 33 having an initial value of 1.664 Å, reducing by 0.1 Å at each step. The optimized geometry at each step is contained in the file labeled SCANcon1, 2 etc. Note that although 100 steps were allowed for each optimization, the geometry did not reach convergence for step 8 on the path. At the tail of OUT_SMFA, you can see that the energy profile on the path reached a maximum at step 7. Hence, we can presume that the geometry in SCANcon7 is in the vicinity of the transition state (saddle point) for the hydrogen abstraction reaction (at HF/cc-pVDZ). 7.6 TS search Starting from the SCANcon7 geometry above, optimization to a saddle point, lead (in 32 steps) instead to a geometry which was well towards the reactants, H + decane. Starting from the SCANuncon8 geometry above, optimization to a saddle point, lead instead to a geometry well towards the products, H2 + a decane radical. Hence, the scan process was carried out again with the SCANcon7 geometry as a starting point, but only reducing the H..H distance by 0.025 at each step. This located a better estimate of the maximum energy on the path, at a geometry labeled step4.xyz (where the H..H distance is 0.969) in subdirectory SMFA/doc/testcases/TS/. With step4.xyz as the input coordinate file, optimization to a TS converged. This geometry, CONVERGEDCOORDS, the OUT_SMFA and optout files are also contained in SMFA/doc/testcases/TS/. 77 Both the scan and TS optimization calculations were carried out using GAUSSIAN at HF/ccpVDZ with Level = 3, and dtol = 1.1. 7.7 Isodesmic reactions Two molecules, denoted MOGQOO and TAXYIA in the Cambridge Structure Database, are isomers. The Cartesian coordinates for these are in the SMFA/doc/testcases directory in subdirectories MOGQOO (the file is MOGQOO.xyz) and TAXYIA (the file is TAXYIA.xyz). The procedure of Section 5.2 in the User's Manual was followed, with Level = 1, using the Cartesian coordinate file MOGQOO.xyz. This gives an isodesmic reaction involving MOGQOO. The output files created by this utility are also contained in the MOGQOO subdirectory. This includes the OUT_SMFA file. The same procedure was carried out using TAXYIA.xyz, to give an isodesmic reaction involving this molecule. The output files created by this utility are also contained in the TAXYIA subdirectory. This includes the OUT_SMFA file. 7.8 Combining isodesmic (etc) reactions The utility of Section 5.3 (Combining isodesmic, homodesmotic etc reactions) was then carried out using MOGQOO.xyz for the first file name requested, and TAXYIA.xyz for the second file name. The output files from this procedure are contained in the subdirectory MOGQOO_minus_TAXYIA. 7.9 Electrostatic potential on the solvent accessible surface. A small protein (PDB ID 5tlr), of 557 atoms, has been chosen to illustrate this utility. Twenty structures for this protein were reported ["Spider peptide toxin HwTx-IV engineered to bind to lipid membranes has an increased inhibitory potency at human voltage-gated sodium channel hNaV1.7", Agwa, A.J., Lawrence, N., Deplazes, E., Cheneval, O., Chen, R.M., Craik, D.J., Schroeder, C.I., Henriques, S.T., (2017) Biochim. Biophys. Acta 1859: 835-844], and we have chosen the first of these, denoted 5tlr_1.xyz in SMFA/doc/testcases/solventsurface/. Following the instructions in Section 5.4, we have carried out a HF/6-31G(d,p) energy calculation for this structure with Level = 1 and dtol = 0. Once the energy calculation was completed, we continued to follow the instructions in Section 5.4, and obtained the files OUT_SMFA, SOLACCSURFACE and ESP.cube, which are also contained in SMFA/doc/testcases/solventsurface/. 78 7.10 Dipole Polarizability In Section 7.3, the structure of c-cyclotridecene was optimized at MP2/cc-pVDZ. This structure, in the file optMP2L3dtol1.1c-cyclotridecene.xyz, has been included in the directory SMFA/doc/testcases/polarizability/. Using GAUSSIAN with the keyword POLAR, the energy of this structure was evaluated at HF/cc-pVDZ with Level = 3 and dtol = 1.1. Utility 5.6 was then used to evaluate the dipole polarizability and the resultant OUT_SMFA file was included in SMFA/doc/testcases/polarizability/ as OUT_SMFA_GAU_HF. Using Q-Chem, with the $rem instruction MOPROP 2, the energy of this structure was also evaluated at HF/cc-pVDZ with Level = 3 and dtol = 1.1. Utility 5.6 was then used to evaluate the dipole polarizability and the resultant OUT_SMFA file was included in SMFA/doc/ Finally, both the GAUSSIAN and Q-Chem calculations were repeated for MP2/cc-pVDZ, and the corresponding files, OUT_SMFA_GAU_MP2 and OUT_SMFA_QCH_MP2 are included in testcases/polarizability/. 7.11 Dipole Hyperpolarizability We again use the structure of c-cyclotridecene which was optimized at MP2/cc-pVDZ in Section 7.3 (file name optMP2L3dtol1.1c-cyclotridecene.xyz). For GAUSSIAN, a frequency calculation was carried out using HF/cc-pVDZ with Level = 3 and dtol = 1.1. Using option 6) Dipole Hyperpolarizability in the Utilities submenu, results in the hyperpolrizability tensor appended to the OUT_SMFA file. This file is labeled OUT_SMFA_GAU_HF in the SMFA/doc/testcases/hyperpolarizability/ directory. 79 7.12 Internal coordinates We again use the structure of c-cyclotridecene which was optimized at MP2/cc-pVDZ in Section 7.3 (file name optMP2L3dtol1.1c-cyclotridecene.xyz), as an example. Using (for example) MacMolPlt, the structure was represented graphically as Then, with optMP2L3dtol1.1c-cyclotridecene.xyz chosen as the coordinate file, choosing 7) Internal Coordinates To To To To in the Utilities submenu, SMFA responds with find a bond length, enter the 2 atom numbers find a bond angle, enter the 3 atom numbers find a dihedral angle, enter the 4 atom numbers exit, hit RETURN In order to measure the structure around the double bond, we might enter the following atom numbers and receive the responses shown 12 13 36 12 36 12 36 12 13 36 13 37 12 13 1.354582 12 13 13 117.178690 37 -0.829765 For example, we could look at a distance across the ring: 23 35 23 35 2.294300 All distances are in Å and angles are in degrees. 80 7.13 Adding H atoms As an example, a molecule was downloaded from the Protein Database [PBS ID 2mvh] 23. The first of 20 structures was used in this example, and is contained in the file 2mvh_1_original.xyz in the directory SMFA/doc/testcases/addHatoms. This structure was input into SMFA with some simple additional inputs (value of Level and ab initio method, etc) to allow SMFA to examine the input. The resultant OUT_SMFA file is also contained in SMFA/doc/testcases/addHatoms. The OUT_SMFA file lists charged groups that it finds in the structure. For example, the first charged group in the list looks like which is clearly a protonated amine group. However, the second last charged group looks like in which a carbon atom is clearly missing a bond, for whatever reason. We can add a H atom to restore full valence to this carbon atom as follows. First we load the coordinate file, 2mvh_1_original.xyz, into a graphics program (here MacMolPlt), so we can see the structure. From OUT_SMFA, we see that the carbon atom numbered 1 above is atom 436 in the original structure. So, secondly, we expand our view of the structure to locate and see the atoms near number 436. It might look something like 81 Now, we can choose 8) Add H atoms in the Utilities submenu, and add a hydrogen atom to atom 437. SMFA responds 82 To get coordinates for a H atom to replace one that is missing, you will need to enter the atom numbers for three atoms in the structure: The H atom will be bonded by 1 Angstrom to the second atom; the first atom is taken to lie on an imaginary +x axis from the second atom; the third atom is taken to lie in an imaginary xy plane, in the +y direction from the second atom. You will then enter two spherical polar angles, theta and psi (your best guess), and the program will output the coordinates, which you can paste into a modified coordinate file. Enter the 3 atom numbers, or enter RETURN to finish We could enter (note that 437 is the second atom number) 436 437 438 SMFA responds Enter the two angles in degrees. We entered (an easy guess for this sp2 carbon) 90.0 -120.0 and SMFA responded H -26.566643 -2.608096 -18.328955 We add these atomic coordinates to the original structure to get a new coordinate file, labeled 2mvh_1_corrected.xyz (also in SMFA/doc/testcases/addHatoms). NOTE: We must change the number of atoms at the top of this file from 446 to 447. The structure in this region now looks like this where the hydrogen atom number 447 has been located in a plausible position. 83 7.14 Input checks 5kbo This is a small protein of 2991 atoms, downloaded from the RSC Protein DataBase, with ID 5kbo. The downloaded file, 5kbo.cif, contained 20 structures, and in this example, we use the first of these, denoted 5kbo_1.xyz (in SMFA/doc/testcases/inputchecks/5kbo. We completed the input and ran main menu choice 2) SMFA examines the input and prints comments or queries to OUT_SMFA If you exit SMFA and look at the OUT_SMFA file, you will see two points of interest. (i) SMFA identified two H--S bonds that are abnormally short, and supplies the atom numbers for these atoms. If you use a molecular graphics program to look at the structure in these two locations, you will see that the S--H bonds are short, but probably not unphysically short, so no action is necessary. (ii) SMFA lists many charged groups which it has identified (61 in fact). If you look at the first group in a graphics program, you will see that a H atom has been omitted from the stucture at a nitrogen atom (a --NH group is not plausible under the normal conditions wherein this structure was obtained experimentally). Hence, to get physically sensible calculations with this structure, you will need to add a hydrogen at this nitrogen, using Utility 5.8. 5tp6 This is a small protein of 2769 atoms, was downloaded from the RSC Protein DataBase, with ID 5tp6. The downloaded file, 5tp6.cif, contained 20 structures, and in this example, we use the first of these, denoted 5tp6_1.xyz (in SMFA/doc/testcases/inputchecks/5tp6. We completed the input and ran main menu choice 2) SMFA examines the input and prints comments or queries to OUT_SMFA If you exit SMFA and look at the OUT_SMFA file (also in SMFA/doc/testcases/inputchecks/5tp6), you will see that despite having a net charge of -13, there are no warnings of unusual bonding, and all the (many) charged groups are sensible. The structure is physically reasonable. 5ce4 This is a small protein of 2690 atoms, was downloaded from the RSC Protein DataBase, with ID 5ce4. The downloaded file, 5ce4.cif, was converted to "xyz" format using the obabel program, in file SMFA/doc/testcases/inputcheck/5ce4/. We completed the input and ran main menu choice 2) SMFA examines the input and prints comments or queries to OUT_SMFA 84 If you look at the resultant output in OUT_SMFA (contained in SMFA/doc/testcases/inputcheck/5ce4/), you will see that SMFA has reported a number of anomalies in the bonding, in which possibly unphysical atom-atom distances have been found. If you look at the structure in a graphics program, near the atom numbers indicated in OUT_SMFA, you will see several instances in which a carbon atom and an oxygen atom (inter alia) have been nearly superimposed in this structure. Clearly the structure is unphysical (due to reasons unknown) and should not be used for electronic structure calculations. 85 8. Citation If SMFA is used in publications, you should cite The SMFA program for quantum chemistry calculations on large molecules, R. Kobayashi, M. A. Addicoat, A. T. B. Gilbert, R. Amos and M. A. Collins*, in preparation. * Corresponding author. 86 Appendices Appendix A. Installing the code The code is available at https://github.com/mickcollins/SMFAPAC A README.md file is contained in the SMFAPAC/ directory with instructions for installing the code. For later reference, the contents of this README.md file are reproduced here. # SMFA SMFA is a general program package for performing quantum chemistry calculations on large molecules, using an energy-based fragmentation approach. The program can calculate electronic energies, energy gradients and second derivatives; perform geometry optimization; find first order saddle points (transition states); perform energy optimized scans along a user-defined path; and evaluate various molecular properties. The program can use any of the following quantum chemistry packages: GAMESS(US), GAUSSIAN, NWChem and Q-Chem. In addition, SMFA provides a number of utility programs that, inter alia, calculate vibrational frequencies and infrared spectra with isotopic substitutions, the electrostatic potential on the solvent-accessible-surface, and isodesmic and higher order near-iso-energetic reaction schemes. Calculations of the electronic energy and related properties can be carried out using a scheme that provides a computation time that is linearly dependent on the size of the molecule or, if the user has enough processing units available, in a computation time that is independent of the size of the molecule. ### Table of contents: * * * * * [Requirements](#requirements) [Installation](#installation) [SMFA Publications](#smfa-publications) [Examples](/doc/testcases) [Licensing](#licensing) ## Requirements * Fortran compiler (gfortran, intel-fc) * [cmake](https://cmake.org/) 87 * [perl](https://www.perl.org/) * [DALTON](http://daltonprogram.org/) * At least one of the following Quantum Chemistry programs: - [GAMESS](http://www.msg.ameslab.gov/gamess/) - [Gaussian](http://gaussian.com/) - [NWChem](http://www.nwchem-sw.org/) - [QChem](http://www.q-chem.com/) * System running the PBS scheduling software The SMFA_Users_Guide.pdf (Section 2.7) in SMFAPAC/doc contains information about how to get SMFA to "load" each (at least one) of these quantum chemistry packages. Several Perl modules are also required, and these can be installed with the following commands (administrator privileges required): ```shell > sudo cpan App::cpanminus > sudo cpanm Shell > sudo apt-get install libncurses5-dev > sudo cpanm Curses ``` One of the optional utility programs in SMFA requires the openbabel program (see Section 5.2 in SMFA_Users_Guide.pdf), so you will need to install openbabel if you want to use this feature. You can install openbabel at any time (the build below does not require it). ## Installation ```shell > git clone https://github.com/mickcollins/SMFAPAC > mkdir build > cd build > cmake ../ > make install ``` `make install` compiles the binaries and also moves them to the SMFAPAC/exe directory. If you need to rebuild, simply remove all files in the build directory and `make install` again. The SMFAPAC/bin directory must be on the user's path, which can be achieved by adding the following to the appropriate rc file: For ~/.cshrc: 88 ```shell > set path = ( $path /path/to/SMFAPAC/bin) ``` For ~/.bashrc: ```shell > export PATH=$PATH:/path/to/SMFAPAC/bin ``` Ensure /path/to is replaced with the actual path to the directory. ## SMFA Publications 1. The user guide can be found in doc/SMFA_Users_Guide.pdf and contains detailed instructions on how to use the package. 2. Collins, M. A. Physical Chemistry Chemical Physics 2012, 14, 7744–7751. ## Licensing ## Version 1.0rc1 89 Appendix B. The parameter dtol In SMFA, the energy of a molecule is given by E= N frag ∑ c E (F ) n (A) n n=1 ( ) ( ) +∑ ∑ cij ⎡⎣ E Fi (1) Fj(1) − E ( Fi (1) ) − E Fj(1) ⎤⎦ i j (B) (B.1) + E pert The first term (A) is the dominant contribution, arising from the energy of the fragments evaluated for a given value of the parameter Level. For a given value of Level, every functional group Gi in the molecule will be in at least one fragment with any group Gk that is separated from Gi by no more than Level bonds. As these "bonded" interactions between groups contribute strongly to the energy of the molecule, term (A) is the largest component of the rhs of (B.1). However, there may be groups that are relatively close in space to some group Gi even though they may be well separated from Gi in terms of bonded connections. These "nonbonded" interactions can be significant in energy on the scale that is important for chemistry, and must be accounted for. Term (B) and the third term Epert account for these effects. These bonded interactions are treated as follows: The molecule is fragmented at Level = 1. The fragments are pairs of adjacent groups and single groups. Each of these Level =1 fragments is then interacted with every other fragment in the Level = 1 set of fragments, except that interactions between groups that have already been accounted for in term (A) are discarded. Most of these interactions between Level = 1 fragments are very small because they are well separated in space. However, we define d: d= the minimum atom-atom distance the sum of the Van der Waals radii for the two atoms (B.2) for each pair of Level = 1 fragments. Then, if d ≤ dtol (B.3) the interaction between these fragments is evaluated as in term (B). The coefficients cij arise from the coefficients of each fragment in the Level = 1 fragmentation. Conversely, if d > dtol , then the interaction between these fragments is evaluated (from first principles) using perturbation theory. The sum of all such interactions is given by Epert. These perturbations include electrostatic interactions, dispersion and induction. The details are contained in Ref. 10. The derivatives, with respect to the atomic coordinates, of both terms A and B are obtained from the electronic structure calculations. The corresponding derivatives of Epert are only available approximately. See Ref. 14 for details. Not surprisingly then, tests [14] show that the energy derivatives 90 are more reliably accurate for larger values of dtol. Nonetheless, values of dtol as small as 1.1 return quite accurate derivatives. There are a number of practical reasons for using a modest value of dtol (say 1.1 - 1.5). First, the number of ab initio calculations in term B increases rapidly with increasing dtol. Secondly, term B will be subject to "basis set superposition" error (BSSE) if small basis sets are used. (Term A does not appear to be subject to significant BSSE, due probably to the larger sizes of the fragments and the overlaps between fragments.) Epert is not subject to BSSE. Unless high accuracy is required for gradients and hessians, dtol ≈ 1.1 - 1.5 is best. In fact, dtol = 1.1 is normally recommended. Finally, as the value of Level increases, the number of non-bonded interactions for which (B.3) holds declines, and the total nonbonded contribution to the energy declines in magnitude. There is one exception to this general recommendation. When Level = 2, each group only has "non-bonded" interactions with groups that are separated by three bonds (Level = 2 fragments only contain next-nearest neighbour interactions between groups). The capping hydrogens that are used to terminate the Level = 1 fragments are relatively close to capping hydrogens on a group that is only three bonds distant. Term B then contains spurious interactions between capping hydrogens that are often less than 2.5 Å apart. These spurious interactions can be avoided by setting dtol = 0, so that term B is avoided, and we rely only on Epert. Term A alone for Level = 2 rarely provides an adequate approximation to the energy, except for relatively straight chain molecules. However, it is useful to evaluate the energy with Level = 2 as part of the attempt to see convergence with increasing values of Level. For Level = 2, one might as well use dtol = 0. 91 Appendix C. Structure optimization methods (i) Optimisation algorithm The geometry optimisation (minimum energy) and transition state search are carried out by the program ROGEROPT within SMFA. The geometry optimisation is carried out in Cartesian coordinates. The displacement at each step in the energy minimisation is evaluated using the "rational function optimisation (RFO)" approach [Ajit Banerjee, Noah Adams, Jack Simonsand Ron Shepard, J. Phys. Chem. 1985,89, 52-57; Adam B. Birkholz1 , H. Bernhard Schlegel, Theor Chem Acc (2016) 135:84] The hessian is evaluated ab initio only at the initial geometry. This hessian is evaluated at the Hartree Fock level, using the basis set requested at input. At each subsequent geometry, an approximate "updated" hessian is calculated using the "flowchart" method described in Section 2.2 of A. B. Birkholz and H. B. Schlegel, Theor Chem Acc (2016) 135, 84. A projection operator is used to remove translational and rotational components from the updated hessian. The energy gradient at each step of the optimization is evaluated using the level of theory and basis set which were requested at input. The eigenvalues of the "augmented Hamiltonian" [see Eq.(7) of A. B. Birkholz and H. B. Schlegel, Theor Chem Acc (2016) 135, 84] are shifted up by 0.05 plus the lowest eigenvalue. The calculated displacement at each step is moderated by a "trust radius". If the norm of the calculated displacement exceeds the trust radius, the displacement is scaled to have a norm equal to the trust radius. The trust radius is initialized with a value of 0.3 Bohr. At each subsequent step, we compare the expected change in energy from the last step with the actual change. If the next calculated displacement exceeds the trust radius and the expected change in energy from the last step is within 25% of the actual change, then the trust radius is increased by 50% before adjusting the norm of the displacement. Alternatively, if the next calculated displacement exceeds the trust radius and the expected change in energy from the last step is NOT within 25% of the actual change, then the trust radius is reduced by a factor of 2 before adjusting the norm of the displacement. Convergence: At each step in the optimisation, SMFA measures the average magnitude of the elements of the gradient vector (Hartrees/Bohr), the magnitude of the largest component of that vector, the average magnitude of the elements of the next displacement vector (Bohr) and magnitude of the largest component of that vector. Each of these quantities is taken to satisfy a convergence criteria when less than 0.00015, 0.0004, 0.0012, and 0.0018, respectively. When any three of these criteria are met, the geometry is taken to be converged. These convergence criteria were defined by reference to 92 standard quantum chemistry packages. Constraints: Bond lengths, bond angles, and dihedral angles can be "constrained" during an optimisation. These are not strict constraints in SMFA. SMFA includes a penalty function, P, which is added to the total electronic energy. It is the sum of these two components which is minimised. The penalty function is given by P: ( ) ( ) 2 1 bonds 1 angles 0 2 P = ∑ pr ⎡⎣ ri − ri ⎤⎦ + ∑ pθ ⎡⎣ cos θ j − cos θ 0j ⎤⎦ 2 i 2 j + 2 2 1 dihedrals pϕ ⎡⎣ cos(ϕ k ) − cos(ϕ k0 ) ⎤⎦ + pϕ ⎡⎣sin(ϕ k ) − sin(ϕ k0 ) ⎤⎦ ∑ 2 k where we have set pr = pθ = 1 Hartree, pϕ = 0.5 Hartree. ri0 , θ 0j and ϕ k0 denote the "preferred" values of the bond lengths, angles and dihedral angles, respectively. The dihedral angles are defined as follows. Using the definitions of atom numbers 1, 2, 3, and 4, and the vectors r12, r 23, and r 34, we define the vectors A = r34 × r23 , B = r23 × r12 , C = r23 . Then cos(ϕ ) = A•B , A B sin(ϕ ) = C • ( A × B) A BC ϕ is defined as arccos[cos( ϕ )] if sin( ϕ ) > 0, and - arccos[cos( ϕ )] if sin( ϕ ) < 0. (ii) Scan algorithm 93 The scan program simply executes the optimisation algorithm for a sequence of constraints. The optimised geometry at one step along the sequence of constraints is the initial configuration at the next step. (iii) Finding a saddle point (TS) First order saddle points are found using the algorithm of J. M. Bofill, Journal of Computational Chemistry, Vol. 15, No. 1, 1-11 (1994), denoted B1994. The hessian is evaluated at the initial geometry (using Hartree Fock), and updated at each subsequent step using Eq. (10) of B1994. At each step, the molecular geometry is updated using the algorithm denoted by processes (a) to (k) in B1994 [see text below Eq. (21) in B1994]. Appendix D. Implementation of property and property gradient methods Eqns (5.5.2) and (5.5.3) are reproduced here: P= N frag ∑ c P(F ) n (A) n n=1 ( ) ( ) +∑ ∑ cij ⎡⎣ P Fi (1) Fj(1) − P ( Fi (1) ) − P Fj(1) ⎤⎦ i ∂P = ∂xα j N frag ∑c n n=1 ∂P ( Fn ) ∂xα ( (D.1) (B) (A) ) ( ) ( )⎤⎥ ⎡ ∂P Fi (1) Fj(1) ∂P ( Fi (1) ) ∂P Fj(1) +∑ ∑ cij ⎢ − − P Fj(1) ∂xα ∂xα ∂xα i j ⎢⎣ ⎥⎦ (D.2) (B) where the property P (which may be simply the energy) is calculated by the ab initio program for each fragment. ⎧ ∂P ⎫ It may be that the ab initio package outputs the value of P and/or the derivatives, ⎨ ⎬ , in some file ⎩ ∂xα ⎭ other than the usual "log" file; for example in a checkpoint file. If so, you must include instructions in the 94 ab initio input for this file to be created. If necessary, you must arrange for this file to be readable. In the instructions below, we will assume that the output is in the "log" file. The quantum chemistry output for the fragment Fn (see Eq. D.1) is in file FRAGn.log. The coefficient cn is the nth integer in file signs.out. The quantum chemistry output for the fragment Fi(1) (see Eq. D.1) is in file nb.i.0.log. The quantum chemistry output for the fragment Fi(1) Fj(1) (see Eq. D.1) is in file ab.i.j.log. The coefficient cij (see Eq. D.1) is in ab.i.j.log immediately following the characters "Isg_coeff=". ⎧ ∂P ⎫ The format in which P or ⎨ ⎬ is written in these log files varies from one quantum chemistry package ⎩ ∂xα ⎭ to the next, and unfortunately, even varies within one quantum chemistry package depending on what quantum chemistry method was used. So, be warned, finding the data you want may depend on whether it was calculated at HF or MP2 or CCSD or some other method. Authors of such programs have apparently not bothered with a consistent output style. ⎧ ∂P ⎫ Perl scripts (very amateurish ones) to extract data for P or ⎨ ⎬ can be found in the files ⎩ ∂xα ⎭ SMFAPAC/bin/ SMFA_gam.pl, SMFA_gau.pl, SMFA_nwc.pl and SMFA_qch.pl for the four quantum chemistry programs supported by SMFA. For example, within SMFA_gau.pl the subroutine extract_gau extracts quantities like the fragment atomic coordinates, the energy, the forces and hessians from the FRAGn.log, ab.i.j.log and nb.i.0.log files, using other subroutines like getcoords_gau. You will need to write scripts analogous to extract_gau and getcoords_gau for the property of interest. In the case of simple properties, P, you need only evaluate the product cn P(FRAGn.log) and add up the results to get term A in (D.1), and evaluate the product cij [ P(ab.i.j.log) - P(nb.i.0.log) - P(nb.j.0.log)] and add up the results to get term B in (D.1). Derivative properties Evaluating Eq. (D.2) requires you to know how the coordinates in the various fragments correspond to the coordinates in the whole molecule. For example, the first two atoms in FRAG1.log might be atoms 235 and 57 in the whole molecule. Moreover, the fragments will contain "capping" hydrogen atoms which are not present in the whole molecule, and the values of ∂P associated with these ∂xα caps must be assigned to atoms in the whole molecule. The positions of the caps are determined by the positions of the atoms, n1 and n2, in the bond that was broken to form the fragment: 95 xα (cap) = (1− f )* xα (n1) + f * xα (n2) (D.3) where the factor f was determined by SMFA from the covalent radii of the atoms [Elemental_Radii.xlxs from https://ccdc.cam.ac.uk]. Term A in (D.2) ⎧ ∂P(FRAGn.log) ⎫ For quantities like ⎨ ⎬ , all the additional information you will need to evaluate term A in ∂xα ⎩ ⎭ (D.2) is contained in the file frags.out. This file contains the number of FRAGn.log files and then the number of atoms from the whole molecule that are in each fragment, call it nat0(n). The identity of these atoms is then listed for each fragment; showing, for example, that atoms 1, 2, 3... in fragment k are atoms 23, 164, 15 etc in the original molecule. frags.out then lists the data you need for Eq. (D.3). For example, after the lists of atoms in each fragment, frags.out contains a section that looks like this: The capping atoms are 1 1 1 2 2 3 ......... ......... attached to 21 13 18 16 18 19 5 7 5 6 7 8 0.669117648347851 0.669117648347851 0.669117645279965 0.669117645279965 0.669117645279965 0.669117645279965 The first number in each line refers to the fragment number, FRAG1, FRAG2 etc. In this example there are 3 caps for fragment 1, 2 caps for fragment 2, etc. The second and third numbers in each line refer to the atoms in the original molecule and the last number is the factor f. The hydrogen caps are always the last atoms in the fragment. Lets take fragment 2 as an example. Say fragment 2 has nat0(2) "real atoms", and in the example above, it has two caps. Then, the gradient of the property for the nat0(2) + 1 atom in the second fragment must be assigned as follows (as you might write the code) ∂P ∂P ∂P(FRAG2.log) = + sign(2)*(1− 0.669117..)* ∂xα (5) ∂xα (5) ∂xα (nat0 ( 2 ) + 1) ∂P ∂P ∂P(FRAG2.log) = + sign(2)* 0.669117...* ∂xα (7) ∂xα (7) ∂xα (nat0 ( 2 ) + 1) (D.4) where sign(2) refers to the sign for the second fragment (see signs.out). You can now evaluate term A in Eq (D.2). An example of this whole process can be found in SMFA_gau.pl where subroutine extract_gau gets the fragment coordinates, energy, forces and hessians from the FRAGn.log files and concatenates the data to a file call FragDerivatives. The fortran program SMFA/src/Derivs_smfa then uses frags.out 96 and signs.out to allocate the energy gradients and hessians from the fragments to the atoms in the whole molecule. Term B in (D.2) Again, the coefficient cij (see Eq. D.1) is in ab.i.j.log immediately following the characters "Isg_coeff=". The information required to allocate derivative data from ab.i.j.log and nb.i.0.log is located in the file OUT_Lev1_ATOMALLOCATION. The top portion of this file might look like this The number of final L1 frags from L1L1_mac.f 133 For each of 133 fragments:the number of atoms followed by the allocation of each atom 1 13 1 230 1.000000 1 243 1.000000 1 244 1.000000 1 231 1.000000 1 242 1.000000 1 210 1.000000 1 211 1.000000 1 228 1.000000 1 229 1.000000 1 235 1.000000 2 210 0.330882 209 0.669118 2 228 0.330882 234 0.669118 2 229 0.330882 232 0.669118 2 6 1 8 1.000000 1 15 1.000000 1 16 1.000000 1 17 1.000000 1 7 1.000000 2 7 0.264706 6 0.735294 3 6 1 198 1.000000 1 205 1.000000 ...... ...... This tells us that 133 files of the type nb.i.0.log are relevant to Eq.(D.2). Line 5 1 13 tells us that the Level = 1 fragment number 1, that gives rise to nb.1.0.log, has 13 atoms. The first ten lines below that (beginning with 1) tell us the atom numbers in the molecule corresponding to the first ten atoms associated with nb.i.0.log. Hence there are contributions to the property derivatives from nb.1.0.log such as that from the first atom: 97 ∂P ∂P ∂P(nb.1.0.log) , for all values of j, = − c1 j * ∂xα (230) ∂xα (230) ∂xα (1) and so on. The next three lines above (beginning with 2) identify hydrogen caps in the fragment and the atoms in the molecule that determine their positions. Hence, the 11th atom associated with nb.i.0.log. contributes to the derivative of the property with respect to two atoms in the original molecule: ∂P ∂P ∂P(nb.1.0.log) = − c1 j * 0.330882 * ∂xα (210) ∂xα (210) ∂xα (11) ∂P ∂P ∂P(nb.1.0.log) = − c1 j * 0.669118 * ∂xα (209) ∂xα (209) ∂xα (11) , for all values of j. So it continues for the other two caps. Then the next line tells us that the second Level = 1 fragment, that gives rise to nb.2.0.log, has 6 atoms, including one cap. And so it goes on. This data describes the atom allocations for all relevant nb.i.0.log and nb.j.0.log files and so also describes the atom allocations for all ab.i.j.log files, as the order of the atoms in ab.i.j.log is that of nb.i.0.log followed by nb.j.0.log. As an example, the subroutine extract_gau in SMFA_gau.pl creates files called abforces and abhessians which contain the energy gradients and hessians from all the ab.i.j.log, nb.i.0.log and nb.j.0.log files. The fortran program ABNBderivatives.f (with the executable stored as ABNBderivatives) reads this data and allocates the derivative data as indicated above. This completes the description of terms A and B in both (D.1) and (D.2). Higher order derivatives of properties (eg the hessian of the energy) can be constructed by considering the higher derivatives of D.2, and using the data in frags.out, signs.out and OUT_Lev1_ATOMALLOCATION. 98 Appendix E. Use of perturbation theory The energy, Epert, first referred to in Eq.(5.5.1) has been described in great detail in the papers listed in the References. It is composed of electrostatic, induction and dispersion interactions that are not accounted for in the first two terms in Eq. (5.5.1). Based on the value of the parameter dtol, these are interactions between parts of the molecule that are well separated from each other in terms of both bonded connectivity and distance. The approach to approximating these interactions has evolved over time. Two different approaches are adopted by SMFA: (i) If the molecule has formally charged groups, or if the user adopts the use of embedded charges to describe polar solvent molecules, then we use Method1; (ii) in the absence of such charges, we use Method2. The interaction of parts of the molecule at long distances is comprised of electrostatic interactions, the associated induction effect, and dispersion. Method1 If there are formal charges in the molecule, or if polar solvent molecules are present, then induction makes a significant contribution to the molecular energy, even when interactions take place over long distances. In this case, the charge distribution of all formally charged groups and all solvent molecules are evaluated. For GAUSSIAN and Q-Chem this is achieved using the natural population analysis (NPA) approach [A. E. Reed and F. Weinhold, J. Chem. Phys. 78, 4066–4073 (1983); A. E. Reed, R. B. Weinstock, and F. Weinhold, J. Chem. Phys. 83, 73 (1985)]. For NWChem, NPA charges are not available, and charges on the atoms in these groups are evaluated using NWChem's "ESP" approach. The calculation of energies for all fragments (bonded and nonbonded) are carried out in the presence of "embedded" or "background" charges on those atoms (in charged groups or solvents) which are not involved in the fragment. GAMESS(US) does not permit the use of embedded charges, so Method 1 is not employed for GAMESS(US). The use of embedded charges accounts quite well for both the electrostatic interaction of these charged groups (or solvent molecules) with the rest of the molecule, and accounts well for induction which is large in such systems. Electrostatic interactions also occur between formally neutral groups at a distance. These interactions are evaluated using distributed multipole moments on all the atoms, as previously described [10] for GAMESS, GAUSSIAN and Q-Chem. For NWChem, distributed charges only are employed, obtained from NWChem's ESP approach. For GAMESS(US), the distributed multipole moments are only available at the SCF level of ab initio theory. Hence, even though the other terms in Eq. (5.5.1) might be evaluated 99 using MP2 or CCSD or other correlated method, the long range electrostatics are evaluated with SCFbased charge distributions for these proograms. Similarly, the electron density at post-SCF levels is not made available by Q-Chem, so the long range electrostatics are evaluated with SCF-based charge distributions only for Q-Chem. GAUSSIAN provides the correlated electron density for MP2 and coupled cluster methods as well as for SCF. Hence, the best available density is used for the long range electrostatics when GAUSSIAN is employed. Finally, if the ab initio method employed accounts for dynamic electron correlation at a distance, then there is a dispersion interaction between such groups. This effect is normally small or relatively small molecules, but can be very significant in cases of high mass density (eg in crystals such as diamond [2]) or folded proteins. These dispersion interactions between groups at a distance are evaluated as previously described.[10] The DALTON program is necessary to provide the ab initio calculation of the imaginary frequency polarizability needed to evaluate these interactions. Method2 If there are no formally charged groups and no use of embedded charges for solvent molecules, or if GAMESS is used, then Method2 is adopted. The long range electrostatic interactions for all well-separated groups are evaluated using the method outlined in Method1. Induction (a small effect if there are no significant charge distributions) is accounted for perturbatively using the static polarizability of each group (evaluated ab initio), as previously described.[10]. DALTON is also used to evaluate these polarizabilities when either GAMESS or NWChem are otherwise employed. Long range dispersion interactions are evaluated as previously described.[10] 100 Appendix F. Additional implementation details Hybrid Fragmentation Scheme The systematic molecular fragmentation by annihilation (SMFA) algorithm has been modified to reduce the computational time. The original SMFA algorithm [M. A. Collins, Phys. Chem. Chem. Phys. 14, 7744–7751 (2012)] is employed in conjunction with a method which was originally developed to update a fragmentation as the molecular structure changes [M. A. Collins, J. Phys. Chem. A, 120, 9281−9291 (2016)]. The original SMFA algorithm is slow whenever the molecular geometry involves dense connections (chemical bonds or hydrogen bonds) between functional groups. In most molecules, each functional group is connected to two other groups (as in a chain) with occasional branches, in which some groups are connected to three or four other groups. By contrast, in crystalline or liquid water, for example, almost all functional groups are connected to four other groups. In the latter case, the original SMFA algorithm is slow. Similarly, many protein structures are densely connected due to a high number of hydrogen bonds. While executing the SMFA algorithm for highly connected structures, very many large fragments persist over long cycles of the repetitive algorithm, in which groups are sequentially eliminated. This greatly slows down the reduction of the set of fragments to the final (relatively small) set. To significantly speed up the fragmentation of a given structure, we simply "eliminate" bonds (pretend they don't exist) that make some functional groups highly connected. [Aside: If a bond exists between two functional groups that both have more than two connections, then that bond is "eliminated"]. The resultant geometry, with reduced bonding, is then fragmented (relatively rapidly) using the SMFA algorithm. The "eliminated" bonds are then re-established as follows. We use the fact that no final fragment can contain groups that are separated by more than Level bonds. Consequently, the role of a functional group in the set of fragments cannot depend on the presence or absence of a 101 bond that is more than Level bonded connections away from that group. To re-establish a "neglected" bond, we simply add one new fragment to the set and subtract one fragment from the set. The added fragment contains all groups that are within Level bonds from the neglected bond, with the current set of connections plus the "neglected" bond. The fragment subtracted from the set contains the same groups and connections, except for the "neglected" bond. The set of connections between groups is updated to contain the previously neglected bond. If there was more than one neglected bond, then the procedure above is simply repeated until all bonds have been "restored" to the structure. This addition and subtraction approach is described in more detail in [M. A. Collins, J. Phys. Chem. A, 120, 9281−9291 (2016)]. The SMFA algorithm is then reapplied to the full set of fragments (the original SMFA fragments plus the additions and subtractions). Cancellations between new and old fragments leads to the same set of final fragments as would have been obtained if the original SMFA procedure had been applied without any adjustments to the bonding. However, this composite procedure is much faster because very large fragments are not retained for long periods during the SMFA procedure itself. 102 Appendix G. Quantum Chemistry Foibles (a) GAUSSIAN post CCSD/QCISD electrostatic calculations GAUSSIAN only calculates the electron density for some post-SCF methods (eg MP2, CCSD). It does not allow calculation of the post-SCF density for CCSD(T), for example, so only the SCF density is available. SMFA uses the density to evaluate the charge distribution in a molecule and hence to calculate the "long range electrostatic component of" the energy, as reported in the OUT_SMFA file. Therefore, if you do a CCSD(T) calculation, the OUT_SMFA file reports a CCSD(T) value for the ab initio energy under the heading "This energy is composed of an ab initio component of" but only reports a SCF value for the long range electrostatic component. The best available approximation to the correct CCSD(T) energy would be to perform both CCSD and CCSD(T) calculations, and to combined the CCSD(T) ab initio component with the values of the "long range electrostatic component" and "induction energy" (if given) from the CCSD calculation. However, the improvement in accuracy may be minimal. (b) Post Hartree Fock electrostatic calculations with Q-Chem and GAMESS The electron density is only available at the SCF level for Q-Chem. SMFA uses the density to evaluate the charge distribution in a molecule and hence to calculate, via a distributed multipole analysis, the "long range electrostatic component of" the energy, as reported in the OUT_SMFA file. GAMESS also uses the SCF density to evaluate the distributed multipoles. Therefore, the long range electrostatic interaction energy (and the induction energy, if appropriate) is only evaluated at the Hartree Fock level for Q-Chem and GAMESS. (c) Electrostatic calculations with NWChem For NWChem, as noted elsewhere in this manual, the charge distribution in a molecule is only evaluated as a sets of point charges on each atom (via a natural population analysis). Hence, the electrostatic interaction energy (and the induction energy, if appropriate) is only evaluated via an interaction of these charges. (d) Point charges and induction 103 When a molecule contains formal charges, or when a very polar solvent is present, the induction energy may be significant. SMFA accounts for this induction effect for GAUSSIAN, NWChem and QChem, by carrying out the quantum chemistry calculations of the fragments in the presence of embedded charges [14]. These charges are evaluated using the natural population analysis charges on each functional group in the molecule which are themselves evaluated in the presence of the point charges on all other groups (iterated three times). GAMESS does not allow for embedded (partial) charges in quantum chemistry calculations, so in this case induction is always estimated using perturbation theory [10]. For systems containing numbers of formally charged groups, carrying out the electronic structure calculations in the presence of embedded charges has been found to provide a more reliable estimate of the electronic energy. (e) Dalton Polarizabilities In some situations, the static polarizability of each functional group must be calculated for use in the estimation (by perturbation theory) of the long range induction and dispersion energies. For GAMESS and NWChem, these calculations are actually carried out using the DALTON program. Unfortunately, if the function group is a doublet (spin multiplicity = 2), DALTON crashes for MP2 calculations. Hence, for doublets, we have made the approximation of replacing the MP2 polarizability by the Hartree Fock polarizability (for groups with spin multiplicity = 2, only). (f) Loading Quantum Chemistry Packages The somewhat cumbersome instructions in Section 2.7 for "loading" or otherwise making accessible the quantum chemistry packages is due to the fact that we found GAMESS(US) and NWChem do not run correctly on our system if the DALTON module is loaded at the same time (for reasons unknown). 104 References (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15) (16) (17) (18) (19) (20) (21) Netzloff, H. M.; Collins, M. A. Ab initio energies of non-conducting crystals by systematic fragmentation. J. Chem. Phys. 2007, 127, 134113. Collins, M. A. Ab initio lattice dynamics of nonconducting crystals by systematic fragmentation. Journal of Chemical Physics 2011, 134, 164110. D’Arcy, J. H.; Jordan, M. J. T.; Frankcombe, T. J.; Collins, M. A. H2 Adsorption in a Porous Crystal: Accurate First-Principles Quantum Simulation. J. Phys. Chem. A 2015, 119, 12166−12181. Frankcombe, T. J.; Collins, M. A. Potential energy surfaces for gas-surface reactions. Physical Chemistry Chemical Physics 2011, 13, 8379. Frankcombe, T. J.; Collins, M. A. Growing Fragmented Potentials for Gas-Surface Reactions: The Reaction between Hydrogen Atoms and Hydrogen-Terminated Silicon (111). J. Phys. Chem. C 2012, 116, 7793. Frankcombe, T. J.; Collins, M. A.; Zhang, D. H. Modified Shepard interpolation of gassurface potential energy surfaces with strict plane group symmetry and translational periodicity. J. Chem Phys. 2012, 137, 144701. Collins, M. A. Can Systematic Molecular Fragmentation Be Applied to Direct Ab Initio Molecular Dynamics. J. Phys. Chem. A 2016, 120, 9281. Deev, V.; Collins, M. A. Approximate ab initio energies by systematic molecular fragmentation. J. Chem. Phys. 2005, 122, 154102. Collins, M. A.; Deev, V. A. Accuracy and efficiency of electronic energies from systematic molecular fragmentation. J. Chem. Phys. 2006, 125, 104104. Addicoat, M. A.; Collins, M. A. Accurate treatment of non-bonded interactions within systematic molecular fragmentation. J. Chem. Phys. 2009, 131, 104103. Collins, M. A. Systematic fragmentation of large molecules by annihilation. Physical Chemistry Chemical Physics 2012, 14, 7744. Pruitt, S. R.; Addicoat, M. A.; Collins, M. A.; Gordon, M. S. The fragment molecular orbital and systematic molecular fragmentation methods applied to water clusters. Physical Chemistry Chemical Physics 2012, 14, 7752. Reid, D. M.; Collins, M. A. Molecular electrostatic potentials by systematic molecular fragmentation. J. Chem. Phys. 2013, 139, 184117. Collins, M. A. Molecular forces, geometries and frequencies by systematic molecular fragmentation including embedded charges. J. Chem Phys. 2014, 141, 094108. Reid, D. M.; Collins, M. A. Calculating nuclear magnetic resonance shieldings using systematic molecular fragmentation by annihilation. Phys. Chem. Chem. Phys. 2015, 17, 5314. Collins, M. A.; Cvitkovic, M. W.; Bettens, R. P. A. The Combined Fragmentation and Systematic Molecular Fragmentation Methods. Acc. Chem. Res. 2014, 47, 2776. Collins, M. A.; Bettens, R. P. A. Energy-Based Molecular Fragmentation Methods. Chem. Rev. 2015, 115, 5607. Stone, A. J. The Theory of Intermolecular Forces; Clarendon: Oxford, 1996. Stone, A. J. Distributed Multipole Analysis: Stability for Large Basis Sets. J. Chem. Theory Comput. 2005, 1, 1128. Hehre, W. J.; Ditchfield, R.; Radom, L.; Pople, J. A. Molecular orbital theory of the electronic structure of organic compounds. V. Molecular theory of bond separation. J. Am. Chem. Soc. 1970, 92, 4796. George, P.; Trachtman, M.; Bock, C. W.; Brett, A. M. An alternative approach to the problem of assessing stabilization energies in cyclic conjugated hydrocarbons. Theoret. 105 (22) (23) Chim. Acta 1975, 38, 121. Japeli, B.; Pristovsek, P.; Majerle, A.; Jerala, R. Structural origin of endotoxin neutralization and antimicrobial activity of a lactoferrin-based peptide. J.Biol.Chem. 2005, 280, 16955. Gill, R. L.; Castaing, J. P.; Hsin, J.; Tan, I. S.; Wang, X.; Huang, K. C.; Tian, F.; Ramamurthi, K. S. Structural basis for the geometry-driven localization of a small protein. Proc. Natl. Acad. Sci. USA 2015, 112, E1908. 106
Source Exif Data:
File Type : PDF File Type Extension : pdf MIME Type : application/pdf Linearized : No Page Count : 106 PDF Version : 1.4 Title : SMFA User's Guide Author : Mick Collins Subject : Producer : Mac OS X 10.9.5 Quartz PDFContext Creator : Word Create Date : 2018:11:06 02:02:49Z Modify Date : 2018:11:06 02:02:49Z Apple Keywords :EXIF Metadata provided by EXIF.tools