¹ Laboratory for Biomolecular Simulation Research, Institute for Quantitative Biomedicine and Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, NJ 08854, USA
Learning Objectives
Prepare a receptor-ligand network for AFE calculations
Calculate network-wide free energy using edgembar
Introduction
This tutorial will demonstrate how to calculate the relative binding free energies for a set of four small molecules bound to the TYK2 protein using the DDBoost package [
Ganguly 2022]. Relative binding free energies (RBFEs) are utilized by drug discovery companies to rank proposed ligands and design selectivity to circumvent off-target effects [
Cournia 2021]. RBFEs leverage the closed thermodynamic cycle, where \begin{equation}\label{E:eqn1}\Delta\Delta\mathit{G}_{\mathrm{Bind}}\end{equation} between ligand 1 and ligand 2 can be found by transforming ligand 1 to ligand 2 through an alchemical pathway in each of the receptor-ligand complex and aqueous phases (Fig. 1) [
York 2023]. The green arrows represent the abslute binding free energy (ABFE, \(\Delta\mathit{G}_{\mathrm{Bind}}\)). Though measurable, these quanties can be difficult to calculate in practice due to the need to sample the confomational changes in the receptor assosiated with annihilating the ligand. The red arrows represent the alchemical tranformation from on ligand to another in the same environment, which are more practical on the basis that less conformational sampling is needed to capture the effect of a small perturbation.
Figure 1. The thermodynamic cycle of a relative binding free energy calculation
For RBFE, pmemd employees a dual topology format, where each molecule has individual coordinates. The atoms found in both ligands forms the 'common core' set of atom. These atoms are constrained to the same position as it's pair in the second topology. Meanwhile, the atoms to be perturbed throughout the alchemical pathway form the 'soft-core' region. The soft-core region is modelled using a softcore potential during the alchemical transformation, where each endstate has its own Hamiltonian. These potentials involve separation shifted scaling and non-linear mixing of the endpoint potentials [
Tsai 2023]
In this tutorial, we employ the smoothstep functions, particularly of order 2, to scale interactions. \[S_{2}(x)=6x^5-15x^4+10x^3\] The smoothstep functions notably have vanishing end point derivative properties: \[\left( \frac{d^kS_P(x)}{dx^k} \right) _{x=0} = \left(\frac{d^kS_P(x)}{dx^k}\right) _{x=1} = 0\] Additionally, the smoothstep functions obey the symmetry condition :\[S_P(1-x)=1-S_P(x)\]
The smoothstep function is used to weight the contributions of each endstate to the potential energy: \[V\left(\lambda\right)=\left(1-S_p\left(\lambda\right)\right)V_0+\left(1-S_p\left(1-\lambda\right)\right)V_1\]
When performing RBFE calculatoins for a set of different ligands, it is possible to use transformations between ligands to create a thermodynamic graph with a set of closed cycles. In principle the net free energy of these cycles should be 0; however in practice, this is not the case. Let us consider a set of four ligands for the TYK2 system: ligands 31 and 42 from the European journal of medicine article [
Liang 2013 1] (ejm_31, ejm_42) and ligands 27 and 28 from the journal of medicinal chemistry article [
Liang 2013 2] (jmc_27, jmc28). In this context we can form a graph with 4 nodes, each corresponding to a ligand, and 6 edges corresponding to the transitions from one ligand to another (Fig 2.)
Figure 2. The thermodynamic graph of TYK2 ligands. The softcore region is portrayed in red.
Using the workflow introduced in the ASFE tutorial, let us prepare the network and analyze data from the simulations to determine the extent to which the cycles fail to close.
Prepare for Simulations
First, create a directory called RBFETutorial. Inside this directory, create a directory called initial. Copy the mol2, frcmod, and lib files to the inital directory. Rename each file to follow the naming convention of ligand_name_0 followed by the file extension. Next, copy the pdb contaning coordinate of the ligand in complex with the protein to the initial directory, but do not change then name. Return to the RBFETutorial and copy the input file used in the ASFE tutorial. Before using the FE-Workflow tool in the DDBoost package to prepare our system for the necessary simulations, implement the following changes to the input file :
The full input file can also be found at /expanse/projects/qstore/amber_ws/tutorials/AFE_tyk2/. Now run the following command:
setup_fe
The workflow will now spend some time preparing the system and generating all of the input files needed for analysis in the following steps:
The workflow reads and parses the input file. The variables defined here are used in all other steps. The ticalc in the input file is set to rbfe, as such the workflow will prepare everything for a RBFE calculation.
The workflow generates an initial setup. A directory is created using the name of the system varialbe. Within that directory, a second directory called setup is made. The mol2, frcmod, lib, and the pdb files in the directory defined by the path_to_input variable, will be copied to this direcotry.
The workflow will now perform atom mapping to determine the common atoms between the two ligands involved in an edge. The mapmethod variable can affect the way the atoms are mapped onto each other. In mapmethod=2, the type of atom is considered, as well as the bond order, during the mapping process. mapnetwork=true is used so that the common-core regions will be reduced such that all edges in the network will have the same substructure used as the common-core. This can be useful in that now the thermodynamic cycles for a network in a single phase, complex or aqueous, should also close.
The workflow will perform all of the steps needed to prepare the files needed to run simulations in both the complex and aqueous phases. Durring this process, the workflow will loop over all edges in the translist variable. Because each edge will involve the transition from a real-state ligand at λ = 0.0 to a different real-state ligand at λ = 1.0, the twostate variable must be set to true. The workflow will now map the coordinates for the first molecule in the edge onto the second molecule and vice versa so that each end-state will have real coordinates for the real ligand and mapped coordinates for the dummy ligand. The worflow will then build system parameters using the forcefields specified by the pff and lff variables, solvate both the ligands and the ligand-protein complexes, add sodium and chloride ions to the concentration defined by the ionconc variable (in addition to counter ions for charged systems). If the hmr variable is set to true, parmed is used to perform hydrogen mass repartitioning on the hydrogen atoms on the solute and protein. Once all edges have been solvated in both the aqueous and complex phases. Each edge is re-solvated using a larger box and waters removed until each system has a uniform number of water molecule across all edges in each phase.
The workflow generates folders for each edge containing the parameters, initial coordinates for both the λ = 0.0 and λ = 1.0 states, input files, and slurm submission scripts. Variables with-in the input file, such as gti_add_sc are used to set variables used by AMBER.
Launch Alchemical Simulations
Each edge in the translist variable will now have a named directory in the /path/to/RBFETutorial/Tyk2/unified/run directory. Each edge directory will contain a directory named "com", for the ligand-protein complex phase, and "aq", for the aqueous phase. Each of these directories will contain its own submission script. In the aqueous phase, the process is largely similar to the ASFE calculation aqueous phase. The key differnece is that there are now 2 endpoints. Each endpoint is equilibrated on its own, and then intermediate points are grown-in from the nearest end-point. The complex phase follows a simular equilibration process; however, the protein is also restrained during the initial stages. The complex phase will follow the following general steps :
This will launch a series of simulations :
The workflow will now spend some time preparing the system and generating all of the input files needed for analysis in the following steps:
Beginning with the real end-states, (λ=0,1), the system undergoes energy minimization with and then with-out solute restraints.
Next, the real end-state system is heated while restraining the both the ligands and the protein under constant volume condtions.
The real end-state system is then equilibrated first with a restrained solute and protein under constant pressure conditions
The force constant of the restraint on the solute and protein is prgressively lowered and and the system is equilibrated under constant pressure conditions.
The restraints on the solute are then removed and the system is reheated under constant volume conditions and equilibrated under constant pressure conditions.
The alchemical states are now "grown-in" by performing energy minimization on the nearest real end-state structure with the λ-dependent Hamiltonian and the equilibration process is repeated for all λ windows.
Hamiltonian replica exchange is performed between λ windows. Replica exchange attempts will occur based on the number of MD steps previously defined by the nstlimti variable in the input file. The numexchgti variable will determine the number of exchange attempts made.
Download Data
SDSC free energy workshop users can download the html file for a set of production simulations, where 4 independent trials were run on each of the aqueous and complex phases for 6 ns, of which, the last 5 ns were used for analysis,
here.
Analyze Simulations
Let us look at the resulting edgembar output (Fig. 1) :
Figure 1. An Example of Graph.html
We can observe the thermodynamic graph and some plots. If we wish to look at the free energy for each "edge", let us migrate to the Edges tab (Fig. 2) :
Figure 2. A Summary of the Alchemical Free Energy for each Transition
Here, we can see a variety of different free energy data. The columns are as follows:
"Edge" refers to the alchemical transition between nodes (molecules).
"Expt" is the experimental RBFE based on the binding energies provided in Expt.dat
"CFE-Expt" is the difference between the experimental free energy and the constrained free energy to be discussed later in this list
"UFE" referse to the unconstrained free energy, that is, the RBFE for the edge as determined using MBAR
"dUFE" is the standard error of the unconstrained free energy.
"CFE" referse to the constrained free energy. Here, the RBFEs for the entire network is calculated under the constraint that the free energy for a closed thermodynamic cycle is zero [
Giese 2021]
"dCFE" is the standard error of the constrained free energy.
"TI", "TI3n", and "TI3c" are the RBFEs as calculated using thermodynamic integration with a linear spline function, a natural cubic spline function, and a clamped cubic spline function respectively.
"Shift" is the difference in the UFE and CFE values
"OFC2" is twice the force constant used during network-wide free energy analysis to enforce cycle closure.
"AveCC" is the average cycle closure error for cycles that include the edge
"MaxCC" is the maximum cycle closure error for cycles that include the edge
"ErrMsgs" is the number of possible errors detected by edgembar when analyzing an edge. This information is inlcuded in the html page for each individual edge. These can be accessed by clicking on the hyperlink to the edge in the Graph.html page
"Outliers" is the number of independent trials for which the free energy for either the aqueous or complex phase is a statistical outlier.
To get more information on the cycle closure errors of the unconstrained RBFEs, we can migrate to the Cycles tab (Fig. 3) :
Figure 3. A Summary of the Cycle Closure Errors for Each Thermodynamic Cycle
Here, we can observe the cycle closure errors for each thermodynamic cycle. As previously mentioned, the net free energy for a thermodynamic cycle does not sum to zero in practice.
Edgembar is able to constrain the cycle closure to zero (CFEs) by analyzing all transitions simultaneous. During this process, the a global objective function weights and sums of the MBAR objective function objective function applying a restraint to penelize non-zero cycle closure conditions. This topic is covered in greater detail in [
Giese 2021]
The constrained free energy is ultimately used to rank the molecules. This can be done by opening the Nodes tab (Fig. 4) and sorting the CFEs relative to the reference node by clicking on the column header :
Figure 4. A Summary of the Binding Free Energies Relative to the Reference Node
References
AMBER Free Energy Tools: A New Framework for the Design of Optimized Alchemical Transformation Pathways H.-C. Tsai, T.-S. Lee, A. Ganguly, T. J. Giese, M. C. C. J. C. Ebert, P. Labute, K. M. Mertz Jr., D. M. York J. Chem. Theory Comput. (2023), 19, 640-658
10.1021/acs.jctc.2c00725
Lead identification of novel and selective TYK2 inhibitors J. Liang, V. Tsui, A. Van Abbema, L. Bao, K. Barrett, M. Beresis, L. Berezhkovskiy, W. S. Blair, C. Chang, J. Driscoll, C. Eigenbrot, N. Ghilardi, P. Gibbons, J. Halladay, A. Johnson, P. Bir Kohli, Y. Lai, M. Liimatta, P. Mantik, K. Menghrajani, J. Murray, A. Sambrone, Y. Xiao, S. Shia, Y. Shin, J. Smith, S. Sohn, M. Stanley, M. Ultsch, B. Zhang, L. C. Wu, S. Magnuson Eur. J. Med. Chem. (2013) 67, 175-187 (
10.1016/j.ejmech.2013.03.070)
Lead Optimization of a 4-Aminopyridine Benzamide Scaffold To Identify Potent, Selective, and Orally Bioavailable TYK2 Inhibitors J. Liang, A. Van Abbema, M. Balaz, K. Barrett, L. Berezhkovsky, W. Blair, C. Chang, D. Delarosa, J. DeVoss, J. Driscoll, C. Eigenbrot, N. Ghilardi, P. Gibbons, J. Halladay, A. Johnson, P. Bir Kohli, Y. Lai, Y. Liu, J. Lyssikatos, P. Mantik, K. Menghrajani, J. Murray, I. Peng, A. Sambrone, W. Yang, J. Young, B. Zhang, B. Zhu, S. Magnuson J. Med. Chem. (2013) 56, 4521-4536 (
10.1021/jm400266t)