Dynamite: a simple way to gain insight into protein motions
Laboratory of Molecular Biophysics, Department of Biochemistry, University of Oxford, Oxford OX1 3QU, England
*Correspondence e-mail: email@example.com
A public web-based facility to infer, analyse and graphically represent the likely modes of a protein motion, starting from a static structure, is presented. This facility is based on the use of CONCOORD to generate an ensemble of feasible protein structures that are subsequently analysed by principal component analysis to identify probable concerted motions. The user is returned the ensemble of feasible structures, together with associated analyses, including animations and graphical representations of both the principal component of the ensemble covariance and indicators of strongly correlated pairwise atomic motions. Whilst users are warned that completely reliable inferences about protein motion may be beyond even substantially more rigorous tools for exploring configurational space, it is hoped that the service will allow a much wider community to benefit from the insights that simple dynamic data may offer.
Keywords: Dynamite; protein motion.
A full understanding of the structure–function relationship for a protein requires insight into dynamic properties as well as static structure. The principal function of certain proteins, including transporters and motor proteins, can only be understood on the basis of molecular motions (e.g. the coupling of proton transport to ATP synthesis in the F0F1 ATPase; Abrahams et al., 1994). However, even for classes of protein for which mechanical action is not directly involved in function, molecular motions are still important. For example, the Src kinase regulatory mechanism that couples phosphorylated C-terminal tail binding to the maintenance of an inactive conformation has been suggested to relate to a tight coupling of the motions of SH2 and SH3 domains in the auto-inhibited form that is mitigated upon Src kinase tail dephosphorylation (Young et al., 2001). This hypothesis, suggested by analysis of a molecular-dynamics simulation, was used to design mutagenesis experiments that probed and confirmed a proposed route of intramolecular communication.
Many software tools already exist to allow users to predict and analyse the likely motions of a protein to provide the type of results mentioned above [e.g. GROMACS (Lindahl et al., 2001), CHARMM (Brooks et al., 1983) and CONCOORD (de Groot et al., 1997)]. For the most part, however, they require a high degree of technical competence in computational dynamics and visualization software. Whilst this level of competence could be achieved by most researchers given enough time, we have undertaken to deskill the process in order to allow a much larger population to use dynamics data in their research. We have produced Dynamite, a publicly accessible web-based server to generate and analyse likely modes of protein motion. This server-based approach complements the Interactive Essential Dynamics program (IED) provided by the McCammon group (Mongan & McCammon, 2003). IED is installed locally by a user and provides a graphical user interface to control the calculation of essential dynamics, using GROMACS to perform a molecular-dynamics simulation and subsequent analysis of the trajectory. As well as offering a different user interface, Dynamite performs a different set of analyses to those performed by IED and supports molecular-graphics routes to interpret the output.
1.1. Design goals for Dynamite
Dynamite has been designed to be simple to use, so that users should be able to access the functionality of the underlying programs without installing them on their local machines. This simplifies the user experience and removes much of the design cost associated with making a program compatible with multiple platforms. Dynamite has been written to be usable by anyone in the field of protein research, even if they have no dynamics experience, and to this end requires only an input coordinates file and offers selection of few parameters beyond specification of the type of analyses and representations that are requested. Ruggedness was another important goal. When analyses of the type provided by Dynamite are conducted manually, it is often found that minor problems have to be resolved during the process. For example, the implementation of the server has to handle difficult cases, which may have non-standard residues, may have atoms missing from the coordinate set or may simply be too large for computer memory. The base level for this ruggedness is that it should be able to trap and recognize errors, so that it can report back to the user/system manager the cause of problems. A secondary goal has been that Dynamite should be able to provide a reasonable response upon encountering common problems. For example, where inference of the essential dynamics for all atoms of a protein would involve analysis of a matrix of a size too large to be accommodated in computer memory, the expert layer of the server will automatically choose to restrict analysis to a subset of atoms, e.g. Cα atoms.
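The memory argument can be made concrete: a dense covariance matrix over N atoms has (3N)² entries, so at 8 bytes per value it needs 72N² bytes (about 7.2 GB for 10 000 atoms). The following is a minimal sketch of how an expert layer might fall back to smaller atom selections. The tier order, the function names and the memory budget are illustrative assumptions and are not taken from the Dynamite source.

```python
def covariance_matrix_bytes(n_atoms, bytes_per_value=8):
    """Memory needed for a dense (3N x 3N) covariance matrix."""
    dim = 3 * n_atoms
    return dim * dim * bytes_per_value


def choose_atom_subset(atom_names, memory_budget_bytes):
    """Return (label, selection) for the largest tier whose matrix fits.

    `atom_names` is a list of PDB-style atom names, one per atom.
    The three tiers below are hypothetical; a real server would also
    need to distinguish C-alpha atoms from, e.g., calcium ions.
    """
    backbone = {"N", "CA", "C", "O"}
    tiers = [
        ("all atoms", list(atom_names)),
        ("backbone", [a for a in atom_names if a in backbone]),
        ("C-alpha", [a for a in atom_names if a == "CA"]),
    ]
    for label, selection in tiers:
        if selection and covariance_matrix_bytes(len(selection)) <= memory_budget_bytes:
            return label, selection
    raise MemoryError("structure too large even for a C-alpha-only analysis")


# Toy structure: 100 residues, five atoms each.
toy_atoms = ["N", "CA", "C", "O", "CB"] * 100
```

With the toy structure above, a budget of 1 000 000 bytes selects only the Cα atoms, whereas a budget of 20 000 000 bytes allows all 500 atoms to be analysed.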
2. Materials and methods
2.1. Overall scripting structure
The program performs the following functions: (i) input of initial protein structure and options, (ii) pre-screening of data for anomalies, (iii) generation of an ensemble of feasible structures from the starting structure, (iv) analyses of the ensemble, (v) generation of movies illustrating the results of the analyses and (vi) the return of results to the user.
To achieve these functions, Dynamite orchestrates a number of existing protein-dynamics packages, including CONCOORD, GROMACS and VMD (Humphrey et al., 1996). Dynamite incorporates an expert/administrative layer that shepherds individual jobs through the correct sequence of routines to generate the requested analyses, deals with any that have stalled or in some other way failed and notifies users of completion. In addition, Dynamite assembles a number of newly written programs that provide specific analyses not available from existing packages. Dynamite has been written in Python, an object-oriented programming language that lends itself well to the control of scripted programs and is able to interact directly with the Python interface of VMD. The overall process has been broken down into three stages: (i) the generation of an ensemble of probable protein configurations, (ii) the application of various analytical tools to identify concerted protein movements and (iii) the illustration of both global concerted motions and specific pairwise correlated atomic displacements.
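The role of the administrative layer can be illustrated with a small sketch: stages are run in order, any failure is trapped, and the failing stage and reason are recorded so that they can be reported. The stage names and the job dictionary are hypothetical and stand in for calls to external programs; this is not the Dynamite implementation.

```python
def run_pipeline(stages, job):
    """Run (name, function) stages in order; stop at the first failure."""
    completed = []
    for name, stage in stages:
        try:
            job = stage(job)
        except Exception as exc:  # trap the error so it can be reported
            return {"status": "failed", "stage": name,
                    "reason": str(exc), "completed": completed}
        completed.append(name)
    return {"status": "ok", "result": job, "completed": completed}


# Hypothetical stages standing in for real program invocations.
def prescreen(job):
    if not job.get("atoms"):
        raise ValueError("no atoms in input coordinates")
    return dict(job, screened=True)


def generate_ensemble(job):
    return dict(job, ensemble_size=500)


demo_stages = [("prescreen", prescreen), ("ensemble", generate_ensemble)]
good_report = run_pipeline(demo_stages, {"atoms": ["CA"]})
bad_report = run_pipeline(demo_stages, {"atoms": []})
```

Here `good_report` records both stages as completed, while `bad_report` names `prescreen` as the failing stage together with the reason, which is the minimum information needed to notify a user or system manager.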
2.2. Ensemble generation
The probable modes of motion available to a protein can be inferred from an analysis of a comprehensive ensemble of configurations that a protein is able to explore. One way to generate such an ensemble is by molecular dynamics (MD, reviewed recently by Moraitakis et al., 2003). In this case, each frame of the ensemble is related to its predecessor through a time relationship; i.e. the positions and velocities of atoms in frame N + 1 are derived from their values in frame N by application of Newton's laws, with forces drawn from an empirical model of interatomic interactions. Initial coordinates are derived from the parent structure and initial velocities are randomly assigned from a Maxwellian distribution consistent with the simulated temperature. The resulting trajectory effectively represents a movie of the movement of the protein. Whilst this approach has the advantage of being a simulation of a physical situation, it is computationally intensive; it can take many months to generate trajectories corresponding to simulated periods of a few tens of nanoseconds. This length of simulation is, moreover, generally not long enough to explore the whole of the configuration space available to that protein, so that only a high-frequency subset of possible protein motions can be inferred.
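The step from frame N to frame N + 1 can be sketched with the velocity Verlet scheme, one common integrator for Newton's equations. The example below is a toy, assuming a single particle in one dimension on a harmonic spring rather than a protein with an empirical force field; all names and parameter values are illustrative.

```python
def harmonic_force(x, k=1.0):
    """Force from a harmonic potential 0.5*k*x**2 (toy force field)."""
    return -k * x


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the toy system."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


def velocity_verlet(x, v, force, mass=1.0, dt=0.01, n_steps=1000):
    """Propagate (x, v); each frame is derived from its predecessor."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


toy_trajectory = velocity_verlet(1.0, 0.0, harmonic_force)
```

The total energy along `toy_trajectory` stays close to its initial value of 0.5, which is the usual sanity check for such an integrator. A real simulation repeats this update for every atom with a far more expensive force evaluation, which is why long trajectories are costly.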
An interesting alternative is to use non-Newtonian methods of ensemble generation. CONCOORD takes this approach and we have adopted it as a fast way to generate ensembles that explore configuration space more fully. This approach works in two phases. In the first phase, CONCOORD derives from the parent structure a table of constraints for the various interatomic distances (covalent, ionic etc.) that define a configuration of that protein. The permitted domain for each interatomic distance depends on an empirical model of the significance of different categories of interaction. Thus, if a covalent bond is recognized in the parent structure then a constraint is introduced that will reproduce this bond in all members of the derived ensemble to within a small tolerance. Similarly, a charge–charge electrostatic interaction in the parent structure will be reproduced in all ensemble members, although in this case a slightly greater tolerance is permitted. In the second phase, CONCOORD generates an ensemble of structures (typically 500) that preserve the interatomic distances of the parent structure. For each structure in the ensemble, CONCOORD performs the following. Firstly, the atoms from the reference structure are randomly positioned in a box. Where pairs of atoms do not satisfy their specified distance constraint they are caused to move with respect to each other. This process is repeated until all constraints are satisfied. The ensemble generated in this way has some of the character of a dynamics trajectory, with the temporal order of frames scrambled. The obvious disadvantages of this method are (i) that the ensemble does not describe the physical evolution of the protein with time and (ii) that each frame in the ensemble does not necessarily correspond to a physically accessible configuration. 
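The second phase can be illustrated with a simple iterative correction scheme: atoms start at random positions and any pair that violates its distance bounds is moved along the line joining the two atoms until the nearest bound is met. This is a sketch of the general idea only. It is not CONCOORD's actual algorithm, and the constraint format `(i, j, lower, upper)`, the box size and the sweep limit are assumptions.

```python
import math
import random


def distance(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def violated(coords, constraints, eps=1e-9):
    """Return the (i, j, lower, upper) constraints that are not satisfied."""
    bad = []
    for i, j, lower, upper in constraints:
        d = distance(coords[i], coords[j])
        if d < lower - eps or d > upper + eps:
            bad.append((i, j, lower, upper))
    return bad


def satisfy_constraints(n_atoms, constraints, box=10.0, max_sweeps=10000, rng=None):
    """Return coordinates meeting all bounds, or None if the limit is hit."""
    rng = rng or random.Random()
    coords = [[rng.uniform(0.0, box) for _ in range(3)] for _ in range(n_atoms)]
    for _ in range(max_sweeps):
        bad = violated(coords, constraints)
        if not bad:
            return coords
        for i, j, lower, upper in bad:
            d = distance(coords[i], coords[j])
            if d == 0.0:
                # Coincident atoms have no direction; nudge one at random.
                coords[j] = [c + rng.uniform(-0.1, 0.1) for c in coords[j]]
                continue
            target = min(max(d, lower), upper)   # nearest permitted distance
            shift = 0.5 * (d - target) / d       # each atom moves half way
            for k in range(3):
                delta = shift * (coords[j][k] - coords[i][k])
                coords[i][k] += delta
                coords[j][k] -= delta
    return None


# Two atoms whose separation must lie between 1.0 and 1.5 (arbitrary units).
pair_constraints = [(0, 1, 1.0, 1.5)]
pair_structure = satisfy_constraints(2, pair_constraints, rng=random.Random(0))
```

Calling the function repeatedly with different random seeds yields independent structures, which is why the frames of such an ensemble carry no temporal order.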
Nevertheless, previous studies have found that ensembles generated by this method can closely resemble those derived from MD, while requiring several orders of magnitude less time to generate. Validation of this kind is presented by de Groot et al. (1997). In addition, we have tested whether the Dynamite-scripted implementation of CONCOORD reproduces the results of an MD simulation by comparing the results of Dynamite and GROMACS analyses of the protein MDM2. This work is presented below.
2.3. Covariance analysis and `correlation webs'
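One of the simplest analyses performed addresses the question of which atoms appear to undergo concerted motions. This is achieved by an analysis of the covariance matrix derived from the ensemble of structures. If we list the coordinates of each atom for a frame and iterate two variables (i, j) over those coordinates for each frame t of the ensemble then the covariance is given by

$$C_{i,j} = \left\langle \left( x_i(t) - \langle x_i \rangle \right) \left( x_j(t) - \langle x_j \rangle \right) \right\rangle_t ,$$

where $x_i(t)$ is the value of coordinate i in frame t and angle brackets denote an average over all frames of the ensemble.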
Note that the frames are enumerated by the symbol t because it would be traditional to perform this analysis over an MD trajectory in which each frame is associated with a particular time. The fact that the frames are not temporally related does not affect the analysis and t remains as a convenient label for the frames. This matrix is illustrated in Fig. 1.
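As an illustration, the covariance matrix can be computed directly from an ensemble stored as a list of frames, each frame being a flat list of 3N coordinates. This is a plain-Python sketch that assumes the frames have already been superimposed on a common reference; the function names are illustrative.

```python
def mean_structure(ensemble):
    """Average each coordinate over all frames of the ensemble."""
    n_frames = len(ensemble)
    n_coords = len(ensemble[0])
    return [sum(frame[i] for frame in ensemble) / n_frames
            for i in range(n_coords)]


def covariance_matrix(ensemble):
    """Covariance of coordinate deviations, averaged over frames."""
    n_frames = len(ensemble)
    mean = mean_structure(ensemble)
    n = len(mean)
    cov = [[0.0] * n for _ in range(n)]
    for frame in ensemble:
        dev = [frame[i] - mean[i] for i in range(n)]
        for i in range(n):
            for j in range(i, n):          # upper triangle only
                cov[i][j] += dev[i] * dev[j]
    for i in range(n):
        for j in range(i, n):
            cov[i][j] /= n_frames
            cov[j][i] = cov[i][j]          # matrix is symmetric
    return cov


# One atom, two frames, displaced along x only.
toy_ensemble = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
toy_cov = covariance_matrix(toy_ensemble)
```

For the toy ensemble the only non-zero element is the x-x variance, which equals 1.0. The order of the frames does not enter the calculation, in line with the remark that t is only a label.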
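This matrix is consolidated to form a per-atom normalized covariance (PANC) matrix by summing the x, y and z covariance components for each atom and normalizing. One form consistent with this description, for atoms a and b, is

$$\mathrm{PANC}_{a,b} = \frac{\sum_{k \in \{x,y,z\}} C_{a_k, b_k}}{\sqrt{\left( \sum_{k} C_{a_k, a_k} \right) \left( \sum_{k} C_{b_k, b_k} \right)}} ,$$

where $C_{a_k, b_k}$ is the element of the covariance matrix $C_{i,j}$ that couples coordinate k of atom a to coordinate k of atom b.
The renormalization of the matrix is such that the self-covariance of an atom is 1. Significant PANC values substantially away from the leading diagonal of the matrix identify parts of the protein that are not close in the primary structure but that do in fact tend to reconfigure in a concerted fashion.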
Young et al. (2001) suggest an interesting way to visualize these data. Quite simply, a line is drawn on a three-dimensional representation of the molecule to connect any two atoms i and j whose PANC value exceeds a threshold. We have arbitrarily chosen a threshold of 0.7. This approach yields an image in which correlated regions are linked by a web, as if their motion were constrained by a network of elastic rods. This information can give a strong feeling for the movement of the molecule. In the reference cited, good use is made of comparing the webs formed in this way from simulations of different functional states of a protein in order to compare the resulting dynamic properties of the protein.
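The consolidation and thresholding steps can be sketched as follows. The normalization used here divides by the geometric mean of the two atoms' summed self-covariances, which is one choice that gives a self-covariance of 1; it is an assumption rather than a statement of the formula used by the server. The function names and the `min_separation` parameter are likewise illustrative.

```python
import math


def panc_matrix(cov):
    """Collapse a 3N x 3N covariance matrix to an N x N normalized matrix."""
    n_atoms = len(cov) // 3

    def summed(a, b):
        # Sum of the xx, yy and zz covariance components for atoms a and b.
        return sum(cov[3 * a + k][3 * b + k] for k in range(3))

    panc = [[0.0] * n_atoms for _ in range(n_atoms)]
    for a in range(n_atoms):
        for b in range(n_atoms):
            norm = math.sqrt(summed(a, a) * summed(b, b))
            panc[a][b] = summed(a, b) / norm if norm > 0.0 else 0.0
    return panc


def correlation_web(panc, threshold=0.7, min_separation=1):
    """List atom pairs (a, b), a < b, whose PANC value exceeds the threshold."""
    n = len(panc)
    return [(a, b)
            for a in range(n)
            for b in range(a + min_separation, n)
            if panc[a][b] > threshold]


# Two atoms that move identically: every 3 x 3 block is the identity.
rigid_cov = [[1.0 if i % 3 == j % 3 else 0.0 for j in range(6)] for i in range(6)]
rigid_panc = panc_matrix(rigid_cov)
rigid_web = correlation_web(rigid_panc)
```

For the rigidly coupled pair the PANC value is 1 and a single web line joins the two atoms. Raising `min_separation` would suppress lines between atoms that are adjacent in the sequence, which can make long-range correlations easier to see.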
2.4. Principal components analysis and porcupine plots
Principal components analysis (Garcia, 1992) takes a data set and reduces its dimensionality. Applied to the current situation, we derive a matrix $\Lambda$ by diagonalizing $C_{i,j}$ with a transformation matrix $T$, so that $\Lambda = T^{\mathrm{T}} C T$. The columns of $T$ are then the eigenvectors $v_i$ of the motion, with the first column being the most significant motion, and the diagonal elements of $\Lambda$ are the eigenvalues of the decomposition. A cone drawn from the atomic coordinate of an atom, with height and direction derived from the components of $v_1$ that relate to that atom, gives a graphical representation of the motion held in $v_1$, the first eigenvector. We refer to this representation as a porcupine plot (Tai et al., 2002). Examples are shown in Fig. 2. In these examples, the size of the cones has been scaled to make them visible when the whole molecule is imaged.
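Since only the first eigenvector is needed for a porcupine plot, it can be obtained without a full diagonalization. The sketch below uses power iteration, which is a substitute technique chosen here for brevity and not necessarily what the server does, and then converts the eigenvector into one quill (base point and tip) per atom. The function names, the starting vector and the `scale` parameter are illustrative assumptions.

```python
import math


def mat_vec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def first_eigenvector(cov, max_iter=10000, tol=1e-12):
    """Dominant eigenpair of a symmetric positive semi-definite matrix."""
    n = len(cov)
    vec = [1.0 + 0.1 * i for i in range(n)]       # arbitrary start vector
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(max_iter):
        nxt = mat_vec(cov, vec)
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            raise ValueError("matrix maps the trial vector to zero")
        nxt = [v / norm for v in nxt]
        change = max(abs(a - b) for a, b in zip(nxt, vec))
        vec = nxt
        if change < tol:
            break
    eigenvalue = sum(v * w for v, w in zip(vec, mat_vec(cov, vec)))
    return eigenvalue, vec


def porcupine_quills(coords, eigvec, scale=1.0):
    """One (base, tip) pair per atom from flat coordinate and eigenvector lists."""
    quills = []
    for a in range(len(coords) // 3):
        base = tuple(coords[3 * a:3 * a + 3])
        tip = tuple(c + scale * e
                    for c, e in zip(base, eigvec[3 * a:3 * a + 3]))
        quills.append((base, tip))
    return quills


toy_value, toy_vector = first_eigenvector([[2.0, 1.0], [1.0, 2.0]])
```

For the 2 x 2 toy matrix the dominant eigenvalue is 3 with equal components in the eigenvector. In a molecular-graphics program each quill would be drawn as a cone from `base` to `tip`, with `scale` chosen so that the cones are visible at the scale of the whole molecule.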
Equivalent information can be illustrated by generating a movie of N frames to illustrate the motion represented by an eigenvector. We take frame 1 as the average structure of the molecule within the ensemble. Frame N has the atoms displaced by a vector defined by components of v1. The intermediate frames are produced by simple interpolation between these two extremes. There is a danger to this interpolation, in that the intermediate frames are almost certainly non-physical. We stress that they offer only an impression of the character of a probable preferred mode of protein motion.
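The interpolation can be sketched in a few lines. The `amplitude` parameter, which sets how far the final frame is displaced along the eigenvector, is an assumption introduced for illustration; the text does not specify how the displacement is scaled.

```python
def eigenvector_movie(mean_coords, eigvec, amplitude, n_frames):
    """Frames interpolated linearly from the mean structure to the
    structure displaced by amplitude * eigvec."""
    if n_frames < 2:
        raise ValueError("a movie needs at least two frames")
    frames = []
    for f in range(n_frames):
        fraction = f / (n_frames - 1)      # 0.0 for frame 1, 1.0 for frame N
        frames.append([m + fraction * amplitude * v
                       for m, v in zip(mean_coords, eigvec)])
    return frames


toy_movie = eigenvector_movie([0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                              amplitude=2.0, n_frames=3)
```

Because every atom moves along a straight line, bond lengths and angles are not preserved in the intermediate frames, which is the origin of the caution expressed above.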
2.5. Molecular-dynamics simulation
For the purpose of comparison, we have generated MD trajectories for the protein MDM2 both alone and in complex with a peptide derived from p53. GROMACS was used to simulate several runs up to 10 ns in length based on the GROMOS96 43a force field. In line with common practice, in each case there was a brief energy-minimization phase followed by a dynamics run with atomic positions restrained before the production dynamics run. The trajectories generated in this way were analysed and compared with analyses of the Dynamite/CONCOORD ensemble.
Dynamite has been tested with a number of proteins. We present here the Dynamite analysis of MDM2, an oncoprotein that binds to the p53 transcription factor (Momand et al., 1998). Binding of MDM2 causes p53-directed gene transcription to be inhibited since the p53/MDM2 complex has lower affinity for DNA and is subject to nuclear export. MDM2, moreover, has E3-ubiquitin ligase activity that enables it to target p53 to the 20S proteasome for degradation. The transforming quality of MDM2 derives from its ability, when overexpressed, to hyperdestabilize p53, since p53 plays an essential role at the end point of several checkpoint pathways that control progress through the eukaryotic cell cycle. The mdm2 gene is amplified or overexpressed in several human cancers (Vargas et al., 2003), enabling the cancerous cells to overcome p53-mediated checkpoint surveillance that would otherwise result in cell-cycle arrest or apoptosis. Taken together, these facts suggest that the interaction between p53 and MDM2 is a suitable target for structure-based inhibitor design as a potential route to anticancer therapy (Lane, 1999). To date, structures have been reported for the p53-interaction domain of MDM2 in complex with a p53-derived peptide (Kussie et al., 1996) and with a family of small-molecule inhibitors (Vassilev et al., 2004). Although we have been able to reproduce the former crystals, we have not been able to crystallize the apo enzyme so as to enable high-throughput ligand-binding studies by crystal soaking. For this reason, we wished to test whether the absence of a ligand from the p53-binding domain might be responsible for introducing additional flexibility into the p53-binding domain, explaining the apparent difficulty in crystallization.
The peptide-bound structure of MDM2 (PDB code 1ycr) was used to produce two parent structures: one that retained the peptide (termed MDM2/p53) and one from which the peptide had been deleted (termed MDM2/apo). These structures were submitted to the Dynamite server and the resulting CONCOORD-generated ensemble and the graphical representations of the corresponding analyses are given below.
The ensemble of CONCOORD-derived structures was returned as a text file. The main reason to return this text file is to allow the user to perform their own analyses on the data and to verify that Dynamite has returned a sensible result. For the purposes of this paper we have manually converted this data into a graphical image, shown in Fig. 3.
3.2. Movie along the first eigenvector
Fig. 4 shows two frames from the first eigenvector movie of MDM2/apo. When watching the movie it is clear that there is an opening and closing motion of the site where p53 had been bound. The user can elect to receive both the MPEG movie and a text file that lists the coordinates of each atom for each frame of the movie. This latter option allows the comparison of several movies if the user has some basic visualization skills. For example, Fig. 5 shows frames from a movie that was prepared by hand from the two PDB-format trajectories superimposed and aligned frame by frame. This figure emphasizes the increased amplitude of closing motion of the p53-binding site of MDM2 when p53 is removed. Clearly, if the user wished to quantify the magnitude of this motion they could do so by measuring relevant interatomic distances at the start and end points of the motion.
Biologically, the conclusion from these movies is that when the peptide is removed from the MDM2/p53 the structure shows an increased capacity to undergo breathing motions that narrow the peptide-binding site. This result in turn suggests that there may be a large degree of induced fit of MDM2 to p53.
3.3. Porcupine plots
Fig. 2(a) shows a porcupine plot for eigenvector 1 of MDM2/apo. Examination of the `quills' for the regions that border the peptide-binding cleft indicates a closing motion, necessarily consistent with the impression offered by the movie representation. A similar plot for MDM2/p53 (not shown from this viewpoint) does not show such a clear rearrangement, consistent with a reduced amplitude for the breathing motion in the peptide-bound complex. Fig. 2(b) shows an alternative viewpoint of MDM2/p53, generated by a VMD script from data returned by the Dynamite server. This figure illustrates a pair of `vortices', one at each end of the protein, an appearance that is characteristic of a motion whereby two approximately rigid lobes flex about a connecting hinge.
3.4. Covariance web
Fig. 6 shows a covariance web for MDM2/p53. As would be expected, a mesh of lines interconnects the atoms within secondary-structural elements, consistent with locally correlated motions of the atoms within these elements, which therefore behave as approximately rigid bodies. In addition to such observations that are consistent with chemical intuition, less readily predicted higher order correlations also appear to occur. For example, movement of the peptide appears to correlate most closely with movements of the bottom-left region of MDM2, as seen from the orientation of Fig. 6. This effect is more apparent in a rotating view of the covariance web than from any single oriented view and correspondingly it is a movie of a rotating picture of the web that is returned by Dynamite. We are exploring the use of VRML as an alternative means of returning this analysis while allowing the user to select a preferred viewing orientation, but also intend to improve the interface with molecular-graphics packages such as VMD and CCP4MG (Potterton et al., 2002).
Interestingly, the apparently `tighter' interaction of p53 with the `lower' surface of MDM2 suggests that this region might be the more appropriate part of MDM2 to target in the design of inhibitors of the p53–MDM2 interaction.
Real validation of the Dynamite protocol would require comparison of Dynamite results with an objectively true picture of protein motions. Although no experimental technique provides such a picture, techniques such as NMR offer some insights into the magnitude and character of motions that occur in a protein in solution. Interestingly, a comparison of the NMR spectra of the p53-binding domain of MDM2 alone and in complex with a p53-derived peptide is consistent with the Dynamite prediction that the MDM2/apo structure might undergo substantial changes in structure and dynamics upon binding to the p53-peptide (Schon et al., 2002).
An alternative approach to validating Dynamite is to compare its results with those derived from a method of exploring configurational space that is more physically rigorous than CONCOORD, namely MD simulation. A full comparison of predicted modes of motion from two different techniques of generating ensembles is complicated and highlights the significant problems associated with focusing upon a single eigenvector as we have with Dynamite (van Aalten et al., 1997). Although in the following discussion several favourable comparisons were discovered between the predicted modes of motion as inferred from MD and CONCOORD ensembles, attention is drawn in the output of the server to the fact that such a simple treatment can potentially be misleading.
Fig. 7 presents the reconfiguration of MDM2 that is predicted by MD to occur upon ligand release. This structural change is consistent with the preferred mode of motion of MDM2/apo that is predicted by CONCOORD (Fig. 2a).
For both MDM2/p53 and MDM2/apo, breathing motions are observed around the peptide-binding cleft. Of the two states, MDM2/apo demonstrates a substantially greater tendency to reconfigure in this way. Whilst the porcupine plots (Fig. 8) of the first eigenvector of the MD ensemble appear more complex than those from the Dynamite analysis (Fig. 2), they are not inconsistent with the conclusions drawn from the Dynamite plots.
The similarity between the motions predicted by MD and CONCOORD is further supported by a comparison of covariance webs of ensembles generated by the two methods. Fig. 9 shows a comparison of covariance webs of MDM2/apo as evaluated from MD-based and CONCOORD-based ensembles. The webs from different ensemble-generating techniques are qualitatively similar (yellow lines predominate). The covariance threshold applied to the CONCOORD analysis is 0.7, while that applied to the MD analysis is 0.39, reflecting the fact that fewer strongly correlated motions are found in the MD simulation.
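Where a numerical companion to the visual comparison is wanted, two webs can be compared as sets of atom pairs. The sketch below is an illustration only and is not part of the analysis reported here; the Jaccard index is a choice made for this example, and the function name is hypothetical.

```python
def compare_webs(web_a, web_b):
    """Compare two covariance webs given as lists of atom-index pairs."""
    a = {tuple(sorted(edge)) for edge in web_a}   # ignore pair orientation
    b = {tuple(sorted(edge)) for edge in web_b}
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical webs from two ensemble-generation methods.
web_comparison = compare_webs([(0, 1), (2, 3), (5, 4)],
                              [(1, 0), (4, 5), (6, 7)])
```

Each web would first be built with its own threshold, as in the comparison above where different thresholds are applied to the CONCOORD and MD analyses.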
Qualitatively, it appears that the longer the MD simulation is performed, the more closely the results resemble those deduced from CONCOORD analysis. By comparison of Figs. 10(a), 10(b) and 10(c), it can be seen that the covariance web inferred from a 10 ns simulation resembles the CONCOORD covariance web more closely than the web inferred from a 5 ns simulation does. It is tempting to extrapolate from this that CONCOORD is achieving its ambition of more efficiently sampling configurational space while retaining physical plausibility.
Dynamite is a free web-based service to provide data and analyses of probable protein motions. It needs only an input PDB file to work and functions without the user needing experience of molecular dynamics. In this paper, MDM2 has been used to illustrate the kinds of results that can be provided.
It is important to understand that these analyses are not rigorous for a number of reasons discussed in the text above, not least that only the first eigenvector of motion is returned. We hope, however, that Dynamite will broaden workers' appreciation of the impact dynamics have on the function of their subject proteins, suggesting further analyses and experiments.
The results for MDM2 presented above suggest that MDM2 binds to p53 by induced fit and that removal of p53 peptide from the binding cleft on MDM2 leads to a partial collapse of the cleft. Additionally, we find that the peptide moves in concert with one half of MDM2 more than the other. This result suggests that targeting the surface of MDM2 in this area may be particularly effective in disrupting the binding of p53.
Principal components analysis visualized with `porcupine' plots indicates that a large part of the bulk motion of MDM2 can be described by a bilobal flexing of the molecule.
We have provided some validation of the non-Newtonian procedure employed in generating the ensembles used for the analysis. We observe that the key features of the dynamics predicted by Dynamite (based on CONCOORD) are also seen in a molecular-dynamics analysis run on GROMACS.
The extent to which a physically meaningful ensemble of structures can be generated depends on both the algorithm used to generate candidate members and the force field employed to screen them. In order to explore alternative approaches to both of these aspects, we intend to implement an internal coordinate mechanism (ICM) module as an alternative to CONCOORD for this phase of protein-motion prediction. As well as providing carefully calibrated potentials for use in screening candidate ensemble members, ICM works with a lower-dimensional description of protein structure to effectively remove the need for high-frequency sampling of a Newtonian trajectory (Abagyan et al., 1994).
Further development will include the extension of the expert layer to recognize and cater for the analysis of non-trivial input coordinate sets (e.g. multiple conformations, tightly bound metal ions, nucleic acids and non-proteinaceous ligands). We also intend to return data through means that will simply allow the user to control representation and orientation. At least one way will be through refining the interface between Dynamite output and laboratory visualization tools such as VMD and CCP4MG.
Dynamite took only a few hours to produce the ensemble, analysis and movies presented in this paper. We hope that its ease of use and relatively short turnaround time will enlarge the population of protein scientists that can use dynamics data to guide their investigations. Dynamite is accessible through the URL https://dynamite.biop.ox.ac.uk/dynamite.
We thank the authors of CONCOORD, especially Bert de Groot, the authors of GROMACS, especially Berk Hess, the curator of DSSP, Gert Vriend, and the authors of VMD for permission to implement their software in the Dynamite server. We also thank the group of Mark Sansom, especially Oliver Beckstein and Kaihsu Tai, for helpful discussions about simulation techniques. In the cell-cycle group we thank Jane Endicott, Nicole Schüler, Jan Gruber and Jean Francois Trempe for experimental studies of MDM2, which will form important correlates of the computational approach. CPB is supported by the BBSRC through a grant to the Oxford Centre for Molecular Sciences, which has also funded the computational facility that will host Dynamite calculations.
Aalten, D. M. van, de Groot, B. L., Findlay, J. B. C., Berendsen, H. J. & Amadei, A. (1997). J. Comput. Chem. 18, 169–181.
Abagyan, R., Totrov, M. & Kuznetsov, D. (1994). J. Comput. Chem. 15, 488–506.
Abrahams, J. P., Leslie, A. G., Lutter, R. & Walker, J. E. (1994). Nature (London), 370, 621–628.
Brooks, B. R., Bruccoleri, R. E., Olafson, B. D., States, D. J., Swaminathan, S. & Karplus, M. (1983). J. Comput. Chem. 4, 187–217.
Garcia, A. E. (1992). Phys. Rev. Lett. 68, 2696–2699.
Groot, B. L. de, van Aalten, D. M., Scheek, R. M., Amadei, A., Vriend, G. & Berendsen, H. J. (1997). Proteins, 29, 240–251.
Humphrey, W., Dalke, A. & Schulten, K. (1996). J. Mol. Graph. 14, 33–38.
Kussie, P. H., Gorina, S., Marechal, V., Elenbaas, B., Moreau, J., Levine, A. J. & Pavletich, N. P. (1996). Science, 274, 948–953.
Lane, D. P. (1999). Br. J. Cancer, 80 (Suppl. 1), 1–5.
Lindahl, E., Hess, B. & van der Spoel, D. (2001). J. Mol. Model. 7, 306–317.
Momand, J., Jung, D., Wilczynski, S. & Niland, J. (1998). Nucleic Acids Res. 26, 3453–3459.
Mongan, J. & McCammon, J. (2003). Interactive Essential Dynamics, v. 1.8.2. University of San Diego, CA, USA.
Moraitakis, G., Purkiss, A. G. & Goodfellow, J. M. (2003). Rep. Prog. Phys. 66, 383–406.
Potterton, E., McNicholas, S., Krissinel, E., Cowtan, K. & Noble, M. (2002). Acta Cryst. D58, 1955–1957.
Schon, O., Friedler, A., Bycroft, M., Freund, S. M. & Fersht, A. R. (2002). J. Mol. Biol. 323, 491–501.
Tai, K., Shen, T., Henchman, R. H., Bourne, Y., Marchot, P. & McCammon, A. (2002). J. Am. Chem. Soc. 124, 6153–6161.
Vargas, D. A., Takahashi, S. & Ronai, Z. (2003). Adv. Cancer Res. 89, 1–34.
Vassilev, L. T., Vu, B. T., Graves, B., Carvajal, D., Podlaski, F., Filipovic, Z., Kong, N., Kammlott, U., Lukacs, C., Klein, C., Fotouhi, N. & Liu, E. A. (2004). Science, 303, 844–848.
Young, M. A., Gonfloni, S., Superti-Furga, G., Roux, B. & Kuriyan, J. (2001). Cell, 105, 115–126.
© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.