Reaction Free Energy Profiles from QM/MM simulations: Methyl Transfer reaction in MTR1

Erika McCarthy1, Şölen Ekesan1, and Darrin M. York1
1Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854, USA

Learning objectives

Relevant literature

Outline

1. 1-dimensional methyl transfer

2. 2-dimensional strings

Additional exercises

  • 1.6 Demonstrating the impact of umbrella window spacing
  • 2.3 Visualizing the path progression
  • 2.4 Comparing the PMF at the semi-empirical and ab initio level
  • 1. 1-dimensional methyl transfer

    The necessary inputs and subsequent outputs for parts 1.1 through 1.6 are available in:
    /expanse/projects/qstore/amber_ws/tutorials/QMMM_DFTB3_MTR1/1D/inputs
    and
    /expanse/projects/qstore/amber_ws/tutorials/QMMM_DFTB3_MTR1/1D/ouputs.

    The inputs should be copied to your working directory. If you unable to produce an output for a given step, copy it from the output directory in order to proceed. There will be times when a calculation has been performed for you and you will do analysis. In these cases, where indicated, you will copy the outputs to your working directory and analyze them.

    To use AMBER programs, you must:
    module use /expanse/projects/qstore/amber_ws/packages/modulefiles/
    module load workshop/software

    You will also periodically need to use xmgrace to plot your results. This module conflicts with AMBER on Expanse, so it is recommended to open another Expanse window (be sure to use ssh -Y to enable the gui) and type:
    module load cpu/0.15.4 intel/19.1.1.217 grace/5.1.25
    to do the plotting tasks and avoid loading and unloading modules.

    The motivation for using umbrella sampling is that, without restraints, a system undergoing dynamics will very rarely cross a reaction barrier. Even if it does, it will be very sparsely sampled and take much longer than we can afford to sample. Therefore, we apply a series of harmonic biasing potentials, also referred to as an umbrella potential, to force the system over a free energy barrier in the space of the selected reaction coordinate. In this way, we can thoroughly sample the breaking and formation of bonds, or other processes like a conformational change. In order to construct a potential of mean force, we must track the value of the reaction coordinate throughout our simulations so that we can later remove the impact of the bias and compute what the distribution of sampling would look like had we not applied the restraint. This tutorial will show you how to perform and analyze QM/MM simulations of a methyl transfer reaction in the MTR1 ribozyme using 1 reaction coordinate.

    The active site is shown in Figure 1, and includes the methyl donor ligand (O6 methylated guanine), the nucleophillic target A63, and C10 protonated at the N3 position.

    Figure 1
    Figure 1. Active site of the MTR1 ribozyme. Atoms included in the QM region are represented as CPK. Distances in are those involved in the methyl transfer reaction coordinate with the associated atoms labeled. The labeled atoms involved in the hydrogen bond between C10 and O6mG will be addressed in part 2.

    First, we must decide which atoms will be treated quantum mechanically (aka the QM region). The rest of the atoms will be treated with the molecular mechanical force field. In this case, we are simulating methyl transfer from O6 methylated guanine (O6mG) to the N1 of A63, so we want to include both of these residues in the QM region. The positively charged C10 residue is directly hydrogen bonding to the leaving group, so we will include it as well. In part 2 this residue will also be involved in the reaction, so we will proactively include it for part 1. For A63 and C10, we will only include the nucleobases since the backbones are not directly involved in the reaction. Additionally, U45 is important for holding the ligand in the active site, but since it is not directly involved in the reaction it will remain in the MM region. Based on this selection, the total charge of the QM region is +1 because C10 is in a non-standard protonation state.

    Based on this QM region selection, one would need to make some modifications to the parameter file such as removing connection atom charges and redistributing MM charges so that the QM region within each residue is an integer. One would also need to perform hydrogen mass repartitioning in order to run QM/MM simulations stably with a 2 fs time step. These modifications are outside the scope of this exercise and have already been performed to produce qmmm.parm7 that is provided to you in the template directory.

    The inputs for this section are located in 1D/inputs. They should be copied to your working directory.

    [user@cluster] mkdir 1D
    [user@cluster] cd 1D
    [user@cluster] cp -r /expanse/projects/qstore/amber_ws/tutorials/QMMM_DFTB3_MTR1/1D/inputs/* ./

    1.1 Preparing the inputs

    You have been provided a parameter file (template/qmmm.parm7), and a structure file (start/reactant.rst7) that was previously equilibrated and represent the reactant state of the ribozyme. Additionally, you have been provided a template mdin file (template/short.mdin). This mdin file is valid for QM/MM simulations in general, and is not specific to umbrella sampling.

  • Take a look at short.mdin:
  • title
    &cntrl
    ! IO =======================================
          irest = 0       ! 0 = start, 1 = restart
            ntx = 1       ! 1 = start, 5 = restart
           ntxo = 1       ! read/write rst as formatted file
          iwrap = 1       ! wrap crds to unit cell
         ioutfm = 1       ! write mdcrd as netcdf
           imin = 0
          ntmin = 1
           ntpr = 20
           ntwr = 20
           ntwx = 0
           ntwf = 0
    ! DYNAMICS =================================
         nstlim = 250
             dt = 0.002   ! ps/step
            ntb = 1       ! 1=NVT periodic, 2=NPT periodic, 0=no box
    ! TEMPERATURE ==============================
          temp0 = 298     ! target temp
       gamma_ln = 5.0     ! Langevin collision freq
            ntt = 3       ! thermostat (3=Langevin)
    ! PRESSURE  ================================
            ntp = 0       ! 0=no scaling, 1=isotropic, 2=anisotropic
    ! SHAKE ====================================
            ntc = 2       ! 1=no shake, 2=HX constrained, 3=all constrained
    noshakemask = ":69|@272-282,290-291,1975-1988"    ! do not shake these
            ntf = 1       ! 1=cpt all bond E, 2=ignore HX bond E, 3=ignore all bond E  
    ! MISC =====================================
            cut = 10.0
          ifqnt = 1
             ig = -1
         nmropt = 1
    /
    
    
    &ewald
      dsum_tol = 1.e-6
    /
    
    &qmmm
        qm_theory   = 'DFTB3'
            qmmask  = ':69|@272-282,290-291,1975-1988'
          qmcharge  = 1
              spin  = 1
           qmshake  = 0
          qm_ewald  = 1
       qmmm_switch  = 1
           scfconv  = 1.e-10
         verbosity  = 0
      tight_p_conv  = 1
      diag_routine  = 0
       pseudo_diag  = 1
      dftb_maxiter  = 100
    /
    
    
    &wt type = 'DUMPFREQ', istep1 = 10, /
    &wt type='END' /
    DISANG=img.disang
    DUMPAVE=img.dumpave
    LISTOUT=POUT
    LISTIN=POUT

    As highlighted in cyan, we will be using a 2 fs timestep, but this can only be done safely when hydrogen mass repartitioning has been applied to the parameter file. The initial simulations used to generate the windows for umbrella sampling will only be performed for 0.5 ps.

    As highlighted in yellow, we will be using the DFTB3 semi-empirical Hamiltonian, and the total charge of the QM region is +1 due to the protonation of cytosine. When determining the total charge of a nucleic acid QM region, be sure to consider any negatively charged phosphates that may be included. The QM region is defined by the mask ':69|@272-282,290-291,1975-1988' which includes the entire ligand (:69), the nucleobase of C10 (@272-282,290-291) and the nucleobase of A63 (@1975-1988).

    The noshakemask (pink ) contains the same mask as the QM region, thus we will not shake QM atoms as these bonds involving hydrogen may fluctuate outside of equilibrium MM values when bonds are made/broken. SHAKE constraints are inherently dependent on the MM force field, thus MM constraints should not be combined with our QM description.

    Since we want to apply nmr restraints, nmropt (green) is set to 1, and we will output restraint values to the dumpave file every 10 steps. The green highlighted img.disang and img.dumpave are place holders that will be replaced to indicate the appropriate umbrella window number in order to apply restraints.

    Let's review how nmr restraints work in AMBER. When nmropt=1 is set in the mdin file, restraints will be read from the specified DISANG file. These are different than coordinate restraints set when using ntr=1, which effectively freezes the atoms in place. Using nmr restraints allows the atom position to fluctuate subject to a specified biasing potential relating it to another atom or atoms. The values of the restrained properties for a given step are output to the specified DUMPAVE file with a frequency of istep1.

    Next we need to select the reaction coordinates. We choose the reaction coordinate to be the difference in distance between the atoms where the bond will break, and between the atoms where the bond will form. This ensures the reaction coordinate value to increase going from reactant to product making it more intuitive. In MTR1, describing methyl transfer between O6mG:O6 and A63:N1 will yield the reaction coordinate (RC): RC = |RO6mG:O6 - RCm| - | RA63:N1 - RCm |. We will evaluate the reaction going from the reactant with RC at -2.5 , to product with RC at 2.5 . You have been provided template/img.disang containing the restraint definitions.

  • Take a look at img.disang:
  • 
    #methylation: |GN-C5|-|A62-C5|
    &rst iat= 2200, 2189, 2200, 1984,
    r1=-999, r2=RC1, r3=RC1, r4=999, rk2=100, rk3=100, rstwt=1.0,-1.0 /  
    
    #Max distances for H-bonds
    
    ## 1 ##
    &rst iat= 1403, 2207,
    r1=-999, r2=-999, r3=3.00, r4=999,rk2=0, rk3=100 /
    
    ## 2 ##
    &rst iat= 1405, 2192,
    r1=-999, r2=-999, r3=2.20, r4=999,rk2=0, rk3=100 /
    
    ## 3 ##
    &rst iat= 1407, 2206,
    r1=-999, r2=-999, r3=2.00, r4=999,rk2=0, rk3=100 /
    
    ## 4 ##
    &rst iat= 2205, 282,
    r1=-999, r2=-999, r3=2.20, r4=999,rk2=0, rk3=100 /
    
    ## 5 ##
    &rst iat= 280, 2189,
    r1=-999, r2=-999, r3=2.20, r4=999,rk2=0, rk3=100 /
    
    ## 6 ##
    &rst iat= 2190, 1982,
    r2=-999, r2=-999, r3=3.00, r4=999,rk2=0, rk3=100 /
    
    ## 7 ##
    &rst iat= 291, 2193,
    r1=-999, r2=-999, r3=2.20, r4=999,rk2=0, rk3=100 /
         

    For now, we need to understand more about the restraints used in this example. The first restraint (highlighted yellow) defines the reaction coordinate. The line "&rst iat= 2200, 2189, 2200, 1984," indicates that we are restraining the difference in distance between the methyl carbon (2200) to ligand oxygen (2189) and the methyl carbon (2200) to the N1 of adenine (1984). These distances are shown in Figure 1. The presence of "rstwt=1.0,-1.0" indicates this should be a difference of distances. In theory, the two distances could be seperate reaction coordinates, but combining them reduces the dimensionality while still describing the same event. The values of r1, r2, r3, and r4 define the shape of the biasing potential. The r2 and r3 values define the flat region of the potential. If r2 is less than r3, then there is no penalty for being within those bounds. When applying an umbrella potential, we set r2=r3 to sample directly on the desired value. The img.disang file is a template, where RC1 will ultimately be replaced for each umbrella window. If more reaction coordinates were present, the placeholder RC2 and so on would be used. The r1 and r4 values define the bounds of the linear response region, ie the penalty will progressively increase as the restrained property approaches r1 and r4. By setting r1=-999 and r4=999 we have effectively created a parabola with infinite bounds. The values rk2 and rk3 are the weights of the restraints. These effectively define the steepness of the parabola, thus controlling the penalty for going outside the r2 to r3 region. For umbrella sampling we want to set this to a high number to force the sampling of high energy regions. The higher the free energy at an umbrella center position, the more the observed value (reported in the dumpave file) will attempt to move away from this center and fight against the biasing potential. It is this concept that helps us construct a PMF.

    The rest of the restraints do not define reaction coordinates, and they will be held constant in all of the umbrella windows. These are distance restraints on the active site hydrogen bonds to the ligand to ensure it stays in the pocket. As highlighted in green, these are different from the harmonic potential used for the reaction coordinate because r2=-999 and rk2=0. This defines a half-harmonic biasing potential where any distance less than 3.0 feels no penalty. If the values were reversed (ie. r3=999 and rk2=100), then the two atoms would be pushed away from each other. Now that we have a comprehensive understand of the nmr restraints, we will focus on setting up an umbrella sampling simulation.

    Now we will generate the equil directory based on the inputs provided. This will be where we sequentially generate the umbrella windows starting from the reactant structure. It is important to slowly increment the reaction coordinate starting from a neighboring structure because each time you change the restraint, you are effectively changing the force field and risk large spikes in energy that could cause unintended bonds to break. We will use the ndfes program to write the inputs for this step.

  • Navigate to the start directory.
  • You have been provided a reactant and product.disang file where the reaction coordinate is set to -2.5 and 2.5 respectively. These will define the end points, and a total of 32 umbrella windows will be generated to linearly interpolate through them.

  • Make sure the workshop/software module is loaded.
  • Run the following command:
  • [user@cluster] ndfes-path-prepguess.py --disang ../template/img.disang --mdin ../template/short.mdin --min reactant.disang --min product.disang --nsim 32 --odir ../equil --pad 2 --dry-run

    The --min flags indicate the minima we want to use as control points (reactant and product). The option --disang indicates the template disang file with a placeholder, --mdin indicates the template mdin file with a placeholder, --nsim determines the number of umbrella windows, and --pad means the file names will be padded with a leading zero. The --dry-run flag will just print the output without creating the simulation directory called equil. The output should look like this:

     1    -2.500   ../equil/img01.disang
     2    -2.339   ../equil/img02.disang
     3    -2.177   ../equil/img03.disang
     4    -2.016   ../equil/img04.disang
     5    -1.855   ../equil/img05.disang
     6    -1.694   ../equil/img06.disang
     7    -1.532   ../equil/img07.disang
     8    -1.371   ../equil/img08.disang
     9    -1.210   ../equil/img09.disang
    10    -1.048   ../equil/img10.disang
    11    -0.887   ../equil/img11.disang
    12    -0.726   ../equil/img12.disang
    13    -0.565   ../equil/img13.disang
    14    -0.403   ../equil/img14.disang
    15    -0.242   ../equil/img15.disang
    16    -0.081   ../equil/img16.disang
    17     0.081   ../equil/img17.disang
    18     0.242   ../equil/img18.disang
    19     0.403   ../equil/img19.disang
    20     0.565   ../equil/img20.disang
    21     0.726   ../equil/img21.disang
    22     0.887   ../equil/img22.disang
    23     1.048   ../equil/img23.disang
    24     1.210   ../equil/img24.disang
    25     1.371   ../equil/img25.disang
    26     1.532   ../equil/img26.disang
    27     1.694   ../equil/img27.disang
    28     1.855   ../equil/img28.disang
    29     2.016   ../equil/img29.disang
    30     2.177   ../equil/img30.disang
    31     2.339   ../equil/img31.disang
    32     2.500   ../equil/img32.disang  
    dmax     0.161 

    dmax is the spacing between umbrella windows. In general, if the spacing is too larger, consider adding more windows. For our purposes this is sufficiently small.

  • Run the command again without the --dry-run flag to create the equil directory:
  • [user@cluster] ndfes-path-prepguess.py --disang ../template/img.disang --mdin ../template/short.mdin --min reactant.disang --min product.disang --nsim 32 --odir ../equil --pad 2  

    1.2 Generating the umbrella windows

  • List the contents of the equil directory:
  • [user@cluster] cd ../equil
    [user@cluster] ls 
    eq01fwd.sh    img04.disang  img07.mdin    img11.disang  img14.mdin    img18.disang  img21.mdin    img25.disang  img28.mdin    img32.disang
    img01.disang  img04.mdin    img08.disang  img11.mdin    img15.disang  img18.mdin    img22.disang  img25.mdin    img29.disang  img32.mdin
    img01.mdin    img05.disang  img08.mdin    img12.disang  img15.mdin    img19.disang  img22.mdin    img26.disang  img29.mdin    init01.rst7  
    img02.disang  img05.mdin    img09.disang  img12.mdin    img16.disang  img19.mdin    img23.disang  img26.mdin    img30.disang
    img02.mdin    img06.disang  img09.mdin    img13.disang  img16.mdin    img20.disang  img23.mdin    img27.disang  img30.mdin
    img03.disang  img06.mdin    img10.disang  img13.mdin    img17.disang  img20.mdin    img24.disang  img27.mdin    img31.disang
    img03.mdin    img07.disang  img10.mdin    img14.disang  img17.mdin    img21.disang  img24.mdin    img28.disang  img31.mdin

    You will see that the eq01fwd.sh script was generated for you in the ouputs directory.

  • Take a look at eq01fwd.sh:
  • #!/bin/bash
    set -e
    set -u
    
    #
    # You can create a slurm script and run these
    # commands in the following way:
    #
    # export LAUNCH="srun sander.MPI"
    # export PARM="path/to/parm7"
    # bash eq01fwd.sh
    #
    
    if [ "${LAUNCH}" == "" ]; then
        echo 'bash variable LAUNCH is undefined. Defaulting to: export LAUNCH="sander"'
        export LAUNCH="sander"
    fi
    
    if [ "${PARM}" == "" ]; then
        echo 'bash variable PARM is undefined. Please: export PARM="/path/to/parm7"'
        exit 1
    else
       if [ ! -e "${PARM}" ]; then
           echo "File not found: ${PARM}"
           exit 1
       fi
    fi
    
    if [ ! -e "init01.rst7" ]; then
        echo "File not found: init01.rst7"
        exit 1
    fi
    
    ${LAUNCH} -O -p ${PARM} -i img01.mdin -o img01.mdout -c init01.rst7 -r img01.rst7 -x img01.nc -inf img01.mdinfo  
    ${LAUNCH} -O -p ${PARM} -i img02.mdin -o img02.mdout -c img01.rst7 -r img02.rst7 -x img02.nc -inf img02.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img03.mdin -o img03.mdout -c img02.rst7 -r img03.rst7 -x img03.nc -inf img03.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img04.mdin -o img04.mdout -c img03.rst7 -r img04.rst7 -x img04.nc -inf img04.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img05.mdin -o img05.mdout -c img04.rst7 -r img05.rst7 -x img05.nc -inf img05.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img06.mdin -o img06.mdout -c img05.rst7 -r img06.rst7 -x img06.nc -inf img06.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img07.mdin -o img07.mdout -c img06.rst7 -r img07.rst7 -x img07.nc -inf img07.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img08.mdin -o img08.mdout -c img07.rst7 -r img08.rst7 -x img08.nc -inf img08.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img09.mdin -o img09.mdout -c img08.rst7 -r img09.rst7 -x img09.nc -inf img09.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img10.mdin -o img10.mdout -c img09.rst7 -r img10.rst7 -x img10.nc -inf img10.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img11.mdin -o img11.mdout -c img10.rst7 -r img11.rst7 -x img11.nc -inf img11.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img12.mdin -o img12.mdout -c img11.rst7 -r img12.rst7 -x img12.nc -inf img12.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img13.mdin -o img13.mdout -c img12.rst7 -r img13.rst7 -x img13.nc -inf img13.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img14.mdin -o img14.mdout -c img13.rst7 -r img14.rst7 -x img14.nc -inf img14.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img15.mdin -o img15.mdout -c img14.rst7 -r img15.rst7 -x img15.nc -inf img15.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img16.mdin -o img16.mdout -c img15.rst7 -r img16.rst7 -x img16.nc -inf img16.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img17.mdin -o img17.mdout -c img16.rst7 -r img17.rst7 -x img17.nc -inf img17.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img18.mdin -o img18.mdout -c img17.rst7 -r img18.rst7 -x img18.nc -inf img18.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img19.mdin -o img19.mdout -c img18.rst7 -r img19.rst7 -x img19.nc -inf img19.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img20.mdin -o img20.mdout -c img19.rst7 -r img20.rst7 -x img20.nc -inf img20.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img21.mdin -o img21.mdout -c img20.rst7 -r img21.rst7 -x img21.nc -inf img21.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img22.mdin -o img22.mdout -c img21.rst7 -r img22.rst7 -x img22.nc -inf img22.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img23.mdin -o img23.mdout -c img22.rst7 -r img23.rst7 -x img23.nc -inf img23.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img24.mdin -o img24.mdout -c img23.rst7 -r img24.rst7 -x img24.nc -inf img24.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img25.mdin -o img25.mdout -c img24.rst7 -r img25.rst7 -x img25.nc -inf img25.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img26.mdin -o img26.mdout -c img25.rst7 -r img26.rst7 -x img26.nc -inf img26.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img27.mdin -o img27.mdout -c img26.rst7 -r img27.rst7 -x img27.nc -inf img27.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img28.mdin -o img28.mdout -c img27.rst7 -r img28.rst7 -x img28.nc -inf img28.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img29.mdin -o img29.mdout -c img28.rst7 -r img29.rst7 -x img29.nc -inf img29.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img30.mdin -o img30.mdout -c img29.rst7 -r img30.rst7 -x img30.nc -inf img30.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img31.mdin -o img31.mdout -c img30.rst7 -r img31.rst7 -x img31.nc -inf img31.mdinfo
    ${LAUNCH} -O -p ${PARM} -i img32.mdin -o img32.mdout -c img31.rst7 -r img32.rst7 -x img32.nc -inf img32.mdinfo

    This script would start from img01 and iteratively generate the umbrella windows using the output of the previous window as the starting structure. It is important to increment the starting structure slowly to avoid errors. The serial nature of this process makes it too slow to compute during the course of this workshop, so these simulations have been conducted for you.

    1.3 Running the umbrella sampling

    After generating the windows, it it best practice to exclude the first ~2 ps of production sampling from analysis to allow for proper equilibration time. The equilibration region can be checked using ndfes-CheckEquil.py. However, under the time constraints of the exercise, you will not be able to perform long enough sampling to gather equilibrium statistics, so you will only perform enough sampling to generate a PMF.

    Now we will perform umbrella sampling on all of the umbrella windows in parallel. For the sake of of this exercise, the windows have been divided into 8 groups of 4 in directories called t1 through t8. Normally, one would run all of the windows together, but this will allow each participant to perform 3 ps of sampling in ~8 minutes. Trial directories should be chosen such that each trial is simulated by roughly an even number of people. This tutorial will focus on t1 as an example, but the instructions are the same for the other sets just replacing the directory name.

  • Choose a trial directory to work with.
  • List the contents of the t1 directory:
  • [user@cluster] ls t1
    analyze_1-4.sh  img01.mdin    img02.mdin    img03.mdin    img04.mdin   init02.rst7  init04.rst7    template.groupfile  
    img01.disang    img02.disang  img03.disang  img04.disang  init01.rst7  init03.rst7  run_1-4.slurm
    

    The init{01..04} restart files are copies of the final img{01..04} restart files generated in the equil directory. In the mdin files, nstlim has been increased to 1500 so that with a 2 fs timestep we will perform 3 ps of sampling. It is typically desirable to run longer simulations when more time is available; however, this will be enough sampling to generate and analyze a PMF as a demonstration.

    You have also been provided a slurm script called run_1-4.slurm.

  • Take a look at run_1-4.slurm:
  • #!/bin/bash
    #SBATCH --job-name="sim.1-4"
    #SBATCH --output="%a.slurmout"
    #SBATCH --error="%a.slurmerr"
    #SBATCH --partition=shared
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=32
    #SBATCH --mem=64GB
    #SBATCH --cpus-per-task=1
    #SBATCH --requeue
    #SBATCH --export=ALL
    #SBATCH -t 0-00:30:00
    #SBATCH --account=gue998
    #SBATCH --reservation=amber24
    
    module load workshop/amber24/default
    
    export LAUNCH="srun --mpi=pmi2 -K1 -N1 -n32 -c1 --exclusive sander.MPI"  
    #export PARM="../template/qmmm.parm7"
    
    set -e
    set -u
    
    
    $LAUNCH -ng 4 -groupfile template.groupfile
    wait
    sleep 1

    Each windows will be allotted 8 cores. A single compute node on expanse has 128 cores, which is conveniently divisible by 32, so a full run with 32 windows running in parallel could scale to an entire compute node. Now that the jobs are being run in parallel, we will provide a groupfile as input to sander.MPI.

  • Take a look at the groupfile:
  • -O -p ../template/qmmm.parm7 -c init01.rst7 -i img01.mdin -o img01.mdout -r img01.rst7 -x img01.nc -inf img01.mdinfo
    -O -p ../template/qmmm.parm7 -c init02.rst7 -i img02.mdin -o img02.mdout -r img02.rst7 -x img02.nc -inf img02.mdinfo
    -O -p ../template/qmmm.parm7 -c init03.rst7 -i img03.mdin -o img03.mdout -r img03.rst7 -x img03.nc -inf img03.mdinfo  
    -O -p ../template/qmmm.parm7 -c init04.rst7 -i img04.mdin -o img04.mdout -r img04.rst7 -x img04.nc -inf img04.mdinfo

    This is similar to executing multiple backgrounded (&) sander.MPI processes, but it also gives us the capability to run replica exchange if desired. Since you will only be running a portion of the umbrella windows, HREMD will not be performed.

  • Navigate to your trial directory.
  • Submit the run script:
  • [user@cluster] cd t1
    [user@cluster] sbatch run_1-4.slurm  

    1.4 Generating the PMF

  • While the simulations are running, take a look at the analyze_1-4.sh script:
  • #!/bin/bash
    
    mkdir analysis
    mkdir analysis/dumpaves
    
    cd analysis
    
    for i in $(seq 1 4); do
    
      base=$(printf "img%02i" ${i})
      ndfes-PrepareAmberData.py -d ../${base}.disang -i ../${base}.dumpave -o dumpaves/${base}.dumpave -r 1 >> metafile  
    
    done
    
    ndfes_omp --mbar -w 0.15 --nboot 0 -c metafile.chk metafile
    
    ndfes-path_omp --chk metafile.chk --ipath metafile --npathpts 4 --nsplpts 20 --opath path

    This script will create an analysis directory and use the ndfes program to generate the potential of mean force for the segment of the reaction space spanned by your 4 umbrella windows. Each simulation generates a dumpave file that tracks the values of the restrained properties, including the reaction coordinate. For each dumpave file, the program ndfes-PrepareAmberData.py will extract just the data related to the reaction coordinate and output this to a new dumpave file in analysis/dumpaves. It will also generate a file called metafile that will indicate the index of the Hamiltonian (default is zero), the temperature (default is 298), the path to the dumpave file, the umbrella center, and the force constant.

    Next, the program ndfes_omp is used to solve the MBAR equations with a bin width equal to 0.15. Setting the --nboot flag equal to zero means that we will perform no bootstrap resamples. If you wish to put error bars on your data points, nboot can be set to a nonzero number like 50, but this analysis will take longer with more resamples. The output of the program is an xml file called metafile.chk. This contains the value of the free energy in each sampled bin.

    Finally, we use the ndfes-path_omp program to print the PMF information to a file called path. In this example, we are merely asking it to print the free energy along a path that is defined by the 4 control points in the metafile (ie. where we placed the umbrella windows). To create a smooth curve, we will use 20 spline points to represent our path. When using interpolation, the default setting is to use radial basis functions (rbf). Additional options will be discussed in upcoming exercises with more complex reaction paths.

  • When the simulations are complete, run the analysis script and take a look at the ouput.
  • Consider having two terminals open where one uses the workshop software modules and one uses xmgrace.
  • [user@cluster] bash analyze_1-4.sh
    [user@cluster] ls analysis
    dumpaves  metafile  metafile.chk  path  path.rbf.0.dat  path.xml

  • Plot the path using xmgrace (preferably in another terminal):
  • [user@cluster] module load cpu/0.15.4 intel/19.1.1.217 grace/5.1.25  
    [user@cluster] cd analysis
    [user@cluster] xmgrace -block path.rbf.0.dat -bxy 3:4  

    Figure 2
    Figure 2. PMF as a function of reaction coordinate value from simulation of 4 umbrella windows.

    The y-axis is the free energy in kcal/mol, and the x-axis is the reaction coordinate value. Without the context of the other umbrella windows, this does not give us much information. Now we will look at the PMF as a whole.

    1.5 Generating the combined PMF

    In this section we will combine all of the umbrella sampling trials to construct a combined PMF. You have been provided the output of all of the trials that must be copied to the working directory.

  • Copy the remaining trial directories to your working directory.
  • Create a directory called combined_analysis.
  • List the contents of your working directory. It should be structured as follows:
  • [user@cluster] cp -r /expanse/projects/qstore/amber_ws/tutorials/QMMM_DFTB3_MTR1/1D/ouputs/t{2..8} ./
    [user@cluster] ls
    combined_analysis  t1  t2  t3  t4  t5  t6  t7  t8  

    The structures files have been removed from the output directories for the sake of file size, but these would have been generated by the simulations and are not required for analysis. We want to combine the metafiles from all of the trial directories.

  • Run the following command to generate metafile.all:
  • [user@cluster] cd combined_analysis
    [user@cluster] ndfes-CombineMetafiles.py -o metafile.all ../t{1..8}/analysis/metafile 

  • Run the following command to generate metafile.all.chk:
  • [user@cluster] ndfes_omp --mbar -w 0.15 --nboot 0 -c metafile.all.chk metafile.all  

    This will generate metafile.all.chk. Finally, we will generate the path file for the entire PMF.

  • Run the following command to generate the path file:
  • [user@cluster] ndfes-path_omp --chk metafile.all.chk --ipath metafile.all --npathpts 32 --nsplpts 400 --opath path_all

    The ndfes commands are also given in combined_analysis.sh, but for the sake of understanding the role of each program they have been run one at a time.

  • Plot the PMF with xmgrace:

    [user@cluster] xmgrace -block path_all.rbf.0.dat -bxy 3:4  

    Figure 3
    Figure 3. PMF as a function of reaction coordinate value from simulation of all 32 umbrella windows.

    Now we can see the full PMF. From this we see that the free energy barrier for this reaction is ~30 kcal/mol, which is quite high. In 1 dimension it is very clear what the path is from reactants to products, and we can plot the free energy as a function of a single coordinate. However, if more reaction coordinates are involved, the path from reactant to product is less clear. In the following sections you will see how the use of more reaction coordinates can better model this reaction.

    Additional exercise

    1.6 Demonstrating the impact of umbrella window spacing

    Only complete this section if you have remaining time in the session and are on pace to finish the required materials.

    In this section you will demonstrate the importance of the spacing between umbrella windows. You have been provided the directory test_spacing containing metafile.11.

  • Take a look at metafile.11:
  • 0 298.00 ../t1/analysis/dumpaves/img01.dumpave    -2.500000    1.00000000000000e+02
    0 298.00 ../t1/analysis/dumpaves/img04.dumpave    -2.016129    1.00000000000000e+02
    0 298.00 ../t2/analysis/dumpaves/img07.dumpave    -1.532258    1.00000000000000e+02
    0 298.00 ../t3/analysis/dumpaves/img10.dumpave    -1.048387    1.00000000000000e+02
    0 298.00 ../t4/analysis/dumpaves/img13.dumpave    -0.564516    1.00000000000000e+02
    0 298.00 ../t4/analysis/dumpaves/img16.dumpave    -0.080645    1.00000000000000e+02
    0 298.00 ../t5/analysis/dumpaves/img19.dumpave     0.403226    1.00000000000000e+02
    0 298.00 ../t6/analysis/dumpaves/img22.dumpave     0.887097    1.00000000000000e+02
    0 298.00 ../t7/analysis/dumpaves/img25.dumpave     1.370968    1.00000000000000e+02
    0 298.00 ../t7/analysis/dumpaves/img28.dumpave     1.854839    1.00000000000000e+02
    0 298.00 ../t8/analysis/dumpaves/img31.dumpave     2.338710    1.00000000000000e+02
    

    This is is the same as metafile.all used previously for combined analysis except it only contains 11 of the umbrella windows. Make note of the spacing between the reaction coordinate values in the fourth column of metafile.11 compared to in metafile.all. They are now spaced by ~0.5 . First we will attempt to use this metafile in the same way we did previously.

  • Run the following ndfes commands to generate the chk and path files:
  • [user@cluster] ndfes_omp --mbar -w 0.15 --nboot 0 -c metafile.11.0.15.chk metafile.11
    [user@cluster] ndfes-path_omp --chk metafile.11.0.15.chk --ipath metafile.11 --npathpts 11 --nsplpts 400 --opath path_11_spl_0.15

  • Plot the PMF in xmgrace:
  •  [user@cluster] xmgrace -block path_11_spl_0.15.rbf.0.dat -bxy 3:4

    Figure 4
    Figure 4. PMF as a function of reaction coordinate value from simulation of 11 umbrella windows analyzed with a 0.15. The free energy is interpolated with 400 spline points. bin width.

    The x-axis is the reaction coordinate value and the y-axis is the free energy in kcal/mol. Something has gone horribly wrong and the free energy has shot up to 5000 in 3 places. When we ran the ndfes_omp command to create the chk file, we used -w 0.15, meaning the bin width when solving the MBAR/UWHAM equations was 0.15 . However, our windows are now spaced ~0.5 apart. In our disang file we set the force constants to be 100 kcal/mol 2. Given our harmonic biasing potential, the distribution of observed reaction coordinates should be approximately normally distributed with a standard deviation of σ=(1/2βK)1/2 where β is the one over the Boltzmann constant times temperature and K is the force constant set in the disang file. Setting rk2 and rk3=100 gives an expected standard deviation in the reaction coordinate of ~0.054 . In a normal distribution, about 99.7% of the samples will be within 3 standard deviations of the mean, which here is ~0.16 . If the windows are ~0.5 away from each other and the sampling spans ~0.16 around the mean, it is highly likely that there will be bins between windows that contain zero samples.

    When a bin contains no samples, ndfes reports 5000 for the energy, which effectively indicates it is infinitely high because a calculation could not be made. When we ran ndfes-path_omp we set --nsplpts to 400 in an attempt to create a smooth spline interpolation through the data, but this meant interpolating through empty bins. Now let's try running the command again without interpolating. This will just return the free energy at the 11 path points.

  • Run the following ndfes command:
  • [user@cluster] ndfes-path_omp --chk metafile.11.0.15.chk --ipath metafile.11 --npathpts 11 --opath path_11_0.5 

  • Plot the result in xmgrace:
  • [user@cluster] xmgrace -block path_11_0.5.rbf.0.dat -bxy 3:4

    Figure 5
    Figure 5. PMF as a function of reaction coordinate value from simulation of 11 umbrella windows analyzed with a 0.15 bin width without spline interpolation.

    Now the energy is more reasonable, but the PMF is disjointed. Perhaps is we increase the bin width, there will be no empty bins and we will be able to smoothly interpolate the PMF.

  • Run the ndfes commands again, but increase the bin width from 0.15 to 0.3 :
  • [user@cluster] ndfes_omp --mbar -w 0.3 --nboot 0 -c metafile.11.0.3.chk metafile.11
    [user@cluster] ndfes-path_omp --chk metafile.11.0.3.chk --ipath metafile.11 --npathpts 11 --nsplpts 400 --opath path_11_spl_0.3

  • Plot the result in xmgrace:
  • [user@cluster]  xmgrace -block path_11_spl_0.3.rbf.0.dat -bxy 3:4 

    Figure 6
    Figure 6. PMF as a function of reaction coordinate value from simulation of 11 umbrella windows analyzed with a 0.3 bin width. The free energy is interpolated with 400 spline points.

    Recall that with 32 windows, the PMF looked like this:

    Figure 7
    Figure 7. PMF as a function of reaction coordinate value from simulation of all 32 umbrella windows.

    As you can see, we were able to make an interpolation, but the PMF is very noisy. All of the bins contain samples, but for many it is not enough to reliably interpolate. In addition, increasing the bin size creates a more course gained view of the free energy landscape. In the end, we simply need more sampling, which is achieved by using more umbrella windows.

    2. 2-dimensional strings

    The necessary inputs and subsequent outputs for parts 2.1 through 2.3 are available in:
    /expanse/projects/qstore/amber_ws/tutorials/QMMM_DFTB3_MTR1/2D

    There are 3 sub-directories we will look at: linear, PT, and MT. Each contains input and output. The inputs should be copied to your working directory. If you are unable to produce an output for a given step, copy it from the output directory in order to proceed. There will be times when a calculation has been performed for you and you will do analysis. In these cases, where indicated, you will copy the outputs to your working directory and analyze them.

    To use AMBER programs, you must:
    module use /expanse/projects/qstore/amber_ws/packages/modulefiles/
    module load workshop/software

    You will also periodically need to use xmgrace to plot your results. This module conflicts with AMBER on Expanse, so it is recommended to open another Expanse window (be sure to use ssh -Y to enable the gui) and type:
    module load cpu/0.15.4 intel/19.1.1.217 grace/5.1.25
    to do the plotting tasks and avoid loading and unloading modules.

    Given that the barrier to methyl transfer was so high, we will now include an additional coordinate for proton transfer. This evaluates the potential impact of general acid catalysis on the mechanism and free energy profile. However, with this new degree of freedom, it is unclear what the minimum free energy path is to get from reactants to products, so we need a method to navigate the multidimensional free energy landscape.

    In this section you will analyze data from a series of surface accelerated string method (SASM) simulations that sample the reaction space with two reaction coordinates: RC1 = |RC10:N3-H| - |RmG:N1 - RH| and RC2 = |RO6mG:O6 - RCm| - | RA63:N1 - RCm |. The process of generating the inputs and equilibrating the windows for two dimensions would be analogous to the 1D example, except another reaction coordinate is specified in the img.disang file, and adequate window spacing must be maintained in both dimensions. The set-up for these simulations has been done for you, and you will focus on analysis of the free energy surface.

    The inputs for the sting simulations and resulting outputs are located in the 2D directory.

    The tutorial is structured as follows:

    2.1 Introduction to running SASM simulations

    The surface accelerated string method is an iterative procedure starting from an initial guess of the path that seeks to convergence upon the minimum free energy path. This method converges rapidly by running a set of umbrella sampling simulations on a given path, and after each iteration optimizing the path on the collective free energy surface from all of the sampling iterations performed up to that point. Every other string iteration, it performs "exploratory" simulations where the umbrella centers are placed just beyond the point on the minimum free energy path to thoroughly sample the vicinity of the path to avoid falling into local minima and create a rigorous free energy surface.

    Under the time constraints of this exercise, the simulation data has been provided for you, and you will perform analysis. First, let's take a look at how the data was generated. There are 3 simulation directories provided called PT, MT, and linear. These correspond to 3 initial guesses of the path from which the pathfinding was initiated. PT is a stepwise pathway where the proton is transferred followed by the methyl transfer in a perfectly stepwise fashion. MT follows the opposite pathway where the methyl transfer occurs first and the proton transfer occurs second. The linear pathway is a perfectly concerted mechanism where the path is a straight line from reactants to products. At the offset, it is unclear which pathway is most energetically favorable. The tutorial will use the linear path as the example.

    Figure 8
    Figure 8. Initial guesses of the minimum free energy path from reactant (lower left corner) to product (upper right corner).

  • Chose one of the paths to work with for the exercise and copy the inputs to your working directory.
  • [user@cluster] mkdir linear
    [user@cluster] cd linear  
    [user@cluster] cp -r /expanse/projects/qstore/amber_ws/tutorials/QMMM_DFTB3_MTR1/2D/linear/input/* ./

    In the simulation directory, there is a script called SASM.slurm that was used to generate the data provided to you. The script was set up to run 51 iterations of the string method, each simulation for 4 ps. We will briefly take a look at key lines in the file that are fundamental to using the SASM. First,

    MAKEANA="ndfes-path-analyzesims.py --curit=${cit} -d ${DISANG} ${NEQIT} ${TEMP} --maxeq=0.25 --skipg"  

    Once you have run a set of simulations, this command acts a wrapper for ndfes-CheckEquil.py, ndfes-PrepareAmberData.py, and ndfes-CombineMetafiles.py. It will determine the equilibration region of the simulations up to a fraction of maxeq, and analyze the dumpave files in that region. The resulting metafile will contain data from the current string and previous string iterations, excluding NEQIT iterations. Next, the combined metafile is used to generate the chk file:

    MAKECHK="ndfes_omp --mbar -w 0.15 --nboot 0 -c metafile.all.chk metafile.all"  

    This command remains the same from the 1D example we did previously. Finally, the next string iteration directory is generated using the following command:

    MAKESIMS="ndfes-path_omp --curit=${cit} -d ${DISANG} -i ${MDIN} ${NEQIT} ${TEMP} --sasm --maxit=300 --minsize=10 --akima --npathpts=100 --wavg=4 --wavg-niter=1"  

    This ndfes command optimizes the path on the existing free energy surface in order to determine where the next set of umbrella centers should be placed. The optimization is done by performing up to maxit so called "synthetic iterations" of the string method using npathpts. This optimization essentially decouples the minimum free energy path from the simulated umbrella centers so that the path can be more rigorously represented, and the simulations are performed to improve the quality of sampling. The minsize parameter indicates that a bin should have at least 10 samples to be included in the analysis. The parameter wavg is the order of the cardinal B-spline used to approximate the free energy surface as a weighted average of nearby bin values. The wavg-niter parameter applies an iterative correction to the wavg histogram values such that the interpolated values match the values at the bin centers. Increasing this value will decrease artifacts from averaging, but it may lead to noisier data. These values can be adjusted based on the system and desired smoothness of the free energy surface. Based on the optimized path, the MAKESIMS command will determine the next set of umbrella centers, and it will set up the next run directory based on the template input file and disang file provided.

    2.2 Analyzing a set of strings

    Now that you understand how the data was generated, you will see how the string has progressed and the resulting free energy surface. The output of the string simulations is located in the outputs directory for the respective initial guesses. You will perform the analysis on the outputs of the string, so these should be copied to your working directory.

  • For instance, if you are working with the linear initial guess, copy those outputs to your working directory:
  • [user@cluster] cp -r /expanse/projects/qstore/amber_ws/tutorials/QMMM_DFTB3_MTR1/2D/linear/output ./
    [user@cluster] cd output

    The outputs contain iterations zero (init) through 20, and each iteration contains a directory called analysis. For the init, it002, it005, it010, and it020 directories, the analysis directory has been moved to analysis-out so that you can generate analysis on your own without writing over files, and if you encounter issues, the intended results are still available for you to look at. For the sake of file size, structure files have been removed, but would have been generated by the simulations. For each of these directories you will run the following series of commands. In each case, replace the --curit flag and change the cd command to the directory you are analyzing. When running Example2d.py, change the title to match the iteration, and use the flag --inset-bright instead of --inset-tleft if you are analyzing the MT initial guess of the path.

  • Check the equilibration, and create the new dumpave files and metafile:
  • [user@cluster] ndfes-path-analyzesims.py --curit=0 -d template/img.disang --neqit=4 --temp=298 --maxeq=0.25 --skipg 

  • Go to the analysis directory
  • [user@cluster] cd init/analysis  

  • Create the chk file:
  • [user@cluster] ndfes_omp --mbar -w 0.15 --nboot 0 -c metafile.all.chk metafile.all  

  • Optimize the path on the free energy surface:
  • [user@cluster] ndfes-path_omp --ipath metafile.current --chk metafile.all.chk --neqit=4 --temp 298 --maxit=300 --minsize=10 --akima --npathpts=100 --wavg=4 --wavg-niter=1 --opath path  

  • Create an image of the free energy surface and current estimate of the path:
  • [user@cluster] python ../../Example2d.py --ipath path.wavg.0.dat --title 'init' --inset-tleft --zerobypath0 --wavg=4 --wavg-niter=1 --minene=-25 --maxene=25 --minsize=10 metafile.all.chk  

    The ndfes-path-analyzesims.py and ndfes_omp commands are the same as in the SASM run script. For each window, the ndfes-path-analyzesims.py script will print some information about the equilibration of the window. More information can be found by running:

    ndfes-CheckEquil.py -h
         Ninp:  The number of input samples
         Nout:  The number of output samples
         Teq:   The percentage of samples excluded as equilibration
         i0:    The 1-based index of the first frame to write (the "start" value 
                when using cpptraj)
         s:     The stride through the data (the "offset" value when using cpptraj)  
         g:     The statistical inefficiency of the correlated samples
         Wf:    The mean value of the bias potential from the first half of 
                statistically independent samples after excluding the first 
                i0-1 samples as equilibration
         dWf:   The standard error of Wf
         Wl:    The mean value of the bias potential from the last half of 
                statistically independent samples after excluding the first 
                i0-1 samples as equilibration
         dWl:   The standard error of Wl

    Here we are modifying the ndfes-path_omp command to simply print the minimum free energy path without creating a new simulation directory. Finally, the Example2d.py script will create a figure of the free energy surface with the current estimate of the minimum free energy path and free energy profile. The Example2d.py script as well as a 3d version are available with the ndfes package in the examples directory if you wish to make figures like this for your own projects in the future. If you are unable to obtain the desired output, the files are available in a directory called analysis-out in each of the unanalyzed iteration directories.

  • Transfer the png files to your local machine
  • Take a look at the progression of the string by opening metafile.all.chk.wavg.0.path.png for the init, it002, it005, it010, and it020 directories.
  • For the linear guess, the progression looks something like this:

    Figure 9. Free energy surfaces starting from a linear initial guess with optimized minimum free energy path (black) after iterations 0, 2, 5, 10, and 20. The inset shows the corresponding PMF.

    Now let's compare the path from the different starting guessed. Here is an example of the result from it020 of the strings initiated from the 3 different pathways:

    Linear pathway

    Stepwise (PT) pathway

    Stepwise (MT) pathway

    Figure 10
    Figure 10. Free energy surfaces starting from a linear and two different step-wise initial guesses with optimized minimum free energy path from 20 string iterations (black). The inset shows the corresponding PMF.

    We see that both the linear and stepwise initial guess where proton transfer (RC1) precedes methyl transfer (RC2) both converge to the minimum free energy path, but the stepwise initial guess where methyl transfer precedes proton transfer remains in that local minimum. This highlights that while the SASM is robust, like all string methods one should make educated, and potentially mulitple, initial guesses.

    Additional exercises

    Only complete sections 2.3 and 2.4 if you have completed sections 1.1-1.5, 2.1, and 2.2 and have remaining time in the session.

    2.3 Visualizing the path progression

    Finally, let's take a look at the convergence of the string over all of the iterations of string data provided in the outputs for the pathway you have chosen (linear, PT, or MT). The Example2d.py script has a feature for creating movies of the string and free energy surface progression. Each string directory contains a script called Make2dMovie.sh that runs the Example2d.py script in order to create a series of png files that are combined into a movie using ffmpeg. This exercise was performed on 50 string iterations to confirm convergence of the path. A run of this script has already been performed for you, and the output was written to the fullmovie directory:

    [user@cluster] ls fullmovie
    movie.sims.mp4  sims.00006.png  sims.00012.png  sims.00018.png  sims.00024.png  sims.00030.png  sims.00036.png  sims.00042.png  sims.00048.png
    sims.00001.png  sims.00007.png  sims.00013.png  sims.00019.png  sims.00025.png  sims.00031.png  sims.00037.png  sims.00043.png  sims.00049.png
    sims.00002.png  sims.00008.png  sims.00014.png  sims.00020.png  sims.00026.png  sims.00032.png  sims.00038.png  sims.00044.png  sims.00050.png
    sims.00003.png  sims.00009.png  sims.00015.png  sims.00021.png  sims.00027.png  sims.00033.png  sims.00039.png  sims.00045.png  sims.00051.png
    sims.00004.png  sims.00010.png  sims.00016.png  sims.00022.png  sims.00028.png  sims.00034.png  sims.00040.png  sims.00046.png  sims.00052.png
    sims.00005.png  sims.00011.png  sims.00017.png  sims.00023.png  sims.00029.png  sims.00035.png  sims.00041.png  sims.00047.png  sims.00053.png 

  • Transfer the movie.sims.mp4 to your local computer and open the file in google chrome or a supported media player:
  • [user@local] google-chrome fullmovie/movie.sims.mp4  

    Here is the movie depicting the string progression from the linear initial guess:

    In the movie, you will see small circles along the path that represent the control points defining the path. The x's indicate where the umbrella centers have been placed for simulations. You will notice that there is alternation between simulations (x's) being placed explicitly on the path for refinement and off the path for exploration. This allows for rapid convergence as well as rigorous sampling on both sides of the path. Since we used the neqit=4 flag when running ndfes, string iterations 0-4 are eventually phased out once at least 5 previous iterations of sampling available to analyze. The reasoning is that the initial guess is very poor (the string evolves dramatically), so the short amount of sampling of these early iterations likely do not well-represent an equilibrium ensemble. Feel free to look at the movies generated for the other 2 paths to compare.

    Visually, there is very little evolution of the minimum free energy path past iteration 10 or so. When the simulation data was generated, the MAKESIMS command in SASM.slurm outputs two convergence tests to analysis.CIT.log, where CIT is the current iteration. The first test is called the slope test that checks whether the slope of the reaction coordinates over at least 5 iterations is less than the threshold value. The RMSD test checks whether the RMSD of the overall path over at least 5 iterations in less the the threshold value. These are thresholds that have been established based on various test cases, but one should still visually inspect the path.

  • Check the convergence tests using the following commands:
  • [user@cluster] grep -r "Slopes appear to be converged" *log | head -n5
    analysis.008.log:Slopes appear to be converged
    analysis.009.log:Slopes appear to be converged
    analysis.010.log:Slopes appear to be converged
    analysis.011.log:Slopes appear to be converged
    analysis.012.log:Slopes appear to be converged
    
    [user@cluster] grep -r "Path is effectively CONVERGED" *log
    analysis.010.log:Min slope in RMSD    1.687e-03 Path is effectively CONVERGED
    analysis.011.log:Min slope in RMSD    6.127e-04 Path is effectively CONVERGED
    analysis.012.log:Min slope in RMSD    2.864e-04 Path is effectively CONVERGED
    analysis.013.log:Min slope in RMSD    2.337e-04 Path is effectively CONVERGED
    analysis.014.log:Min slope in RMSD   -9.001e-05 Path is effectively CONVERGED  
    

    Based on this criteria, it appears that both tests were passed by iteration 10 for the linear initial guess. This is consistent with our visual inspection of the movie. Based on the string simulations performed in this tutorial, it is clear that the minimum free energy path for the MTR1 reaction involves a stepwise mechanism where a general acid protonates the bound ligand to initiate methyl transfer to the target adenine. This free energy barrier (~20 kcal/mol) is about 10 kcal/mol lower than the 1 dimensional case that we computed previously for just methyl transfer, and the 2 dimensional path where methyl transfer precedes proton transfer.

    2.4 Comparing the PMF at the semi-empirical and ab initio level

    The input and subsequent outputs for part 2.4 are located in /expanse/projects/qstore/amber_ws/tutorials/QMMM_DFTB3_MTR1/2D/compare_PMF/inputs and /expanse/projects/qstore/amber_ws/tutorials/QMMM_DFTB3_MTR1/2D/compare_PMF/outputs. The inputs should be copied to your working directory in a new folder called compare_PMF.

    [user@cluster] mkdir compare_PMF
    [user@cluster] cd compare_PMF
    [user@cluster] cp -r /expanse/projects/qstore/amber_ws/tutorials/QMMM_DFTB3_MTR1/2D/compare_PMF/inputs/* ./

    In this tutorial you have used an approximate semi-empirical Hamiltonian, so now let's get a sense of how well it performed by comparing the PMF to the ab initio, PBE0 level PMF. This PMF was also generated using the string method. You have been provided a file called DFTB3.dat and PBE0.dat in the compare_PMF directory. DFTB3.dat is path file that we previously obtained when analyzing the DFTB3 strings that converged to the minimum free energy path.

  • Plot the DFTB3 and gwTP corrected PBE0 level PMF's using xmgrace:
  • xmgrace -block DFTB3.dat -bxy 2:5 -block PBE0.dat -bxy 2:5 

  • Align the PMFs such that the free energy is 0 at x=0 for both curves. This can be done in the Evaluate expression window under Data -> Transformations.
  • Figure 11
    Figure 11. PMF as a function of reaction progress at the DFTB3 (black) and PBE0 (red) level of theory. The evaluate expression window demonstrates how to align the zero of energy for direct comparison.

    Not only are the rate determining free energy barriers different, but the mechanistic interpretation is different when the higher level of theory is used. At the DFTB3 level (black) there is a stable intermediate state in which the proton transfer has occurred. This would suggest a proton would reside on the ligand a majority of the time. This is contrary to experimental activity-pH profiles of this type of base pair. At the PBE0 level (red), the intermediate is slightly higher in energy than the reactant state, suggesting the proton would likely be in rapid equilibrium between the cytosine residue and the ligand. This has implications for how one would interpret experimental observables such as pH dependence and highlights the importance of balancing the speed of the simulations with the accuracy of Hamiltonian.

    References