The Uppsala Electron-Density Server
The Uppsala Electron Density Server (EDS; http://eds.bmc.uu.se/ ) is a web-based facility that provides access to electron-density maps and statistics concerning the fit of crystal structures and their maps. Maps are available for ∼87% of the crystallographic Protein Data Bank (PDB) entries for which structure factors have been deposited and for which straightforward map calculations succeed in reproducing the published R value to within five percentage points. Here, an account is provided of the methods that are used to generate the information contained in the server. Some of the problems that are encountered in the map-generation process as well as some spin-offs of the project are also discussed.
For experts and non-experts alike, macromolecular electron-density maps are the best representation of the crystallographic experiments that underpin the atomic models that are published and deposited. This is because models are just one crystallographer's subjective interpretation of the data and the maps (Brändén & Jones, 1990), reflecting that particular crystallographer's skills, experience and prejudices, and possibly also mistakes. Density maps, on the other hand, may reveal features that have not been interpreted as well as features for which an alternative interpretation may exist. Further, availability of an electron-density map enables users of a structure to assess the validity of claims made in the paper (active-site make-up, presence and conformation of bound ligands, nature of interactions etc.), and to carry out a proper assessment of the quality of the model (validation). For all these reasons, deposition of both model coordinates and experimental structure factors is mandatory according to the IUCr guidelines, which have been adopted by most journals that publish macromolecular crystal structures. However, even given the availability of model and data, only scientists with some knowledge of crystallography and with access to appropriate software are able to calculate electron-density maps. Therefore, we have undertaken to calculate such maps (in a uniform fashion) for all crystal structures in the Protein Data Bank (PDB) for which structure factors are available and to make the resulting maps (and statistics concerning the fit of model and data) available to the entire community of structure consumers through a server, the Uppsala Electron Density Server (EDS).
In this paper, we first review the history of the debate in the macromolecular crystallographic community concerning model and data deposition and briefly describe the current state of affairs. We then provide a detailed account of the methods that are used in the calculation of the EDS maps and the problems encountered in this process. We further describe the kinds of files and information that are made available through the EDS server. Finally, a brief overview of the current state of the server is provided and possible future developments as well as some spin-offs of the project are discussed.
Deposition of crystallographic data (coordinates and structure factors) has been hotly discussed since the late 1980s when the IUCr formulated a policy requiring deposition of such data. The first wave of discussions concerned the issue of whether or not deposition of coordinates should be made mandatory in conjunction with publication (Barinaga, 1989; Maddox, 1989; Koetzle, 1989). Eventually, most journals accepted the IUCr guidelines of the time (with Nature dragging its feet until 1996; Editorial, 1996). The IUCr guidelines allowed for a one-year delay on the release of coordinates and a four-year delay on the release of structure factors. In 1996, several groups of authors urged the crystallographic community to deposit structure factors for all their structures (Baker et al., 1996; Jones et al., 1996). Two years later, another round of discussions revolved around the issue of the allowed release delays (Wlodawer et al., 1998; Editorial, 1998a, 1998b), with a number of journals eventually deciding to require immediate coordinate release upon publication (Campbell, 1998; Bloom, 1998). The IUCr, too, changed its guidelines after internal discussions (Baker & Saenger, 1999) and currently recommends deposition of coordinates and structure factors in the PDB, with release of coordinates upon publication, and of structure factors no more than six months after publication (Commission on Biological Macromolecules, 2000).
The mandatory deposition of structure factors is the next important issue that needs to be addressed by the community and the journals (but not necessarily the last issue: perhaps we should consider deposition of unmerged intensities or even raw diffraction images in the future). Fortunately, the community nowadays supports the notion of structure-factor deposition as judged by the record-high fraction in the year 2003 of structures deposited with the PDB for which structure factors were deposited as well (Kleywegt et al., 2004). In 1995, only a third of all deposited crystal structures were accompanied by structure factors and in the period 1997–2002 this fraction hovered around two-thirds. However, in the year 2003 suddenly four out of every five crystal structure depositions included structure factors: a remarkable improvement and hopefully the beginning of a drive towards close to 100% structure-factor deposition. There are nevertheless considerable differences between different journals, with Acta Crystallographica, EMBO Journal and Protein Science reaching impressive structure-factor deposition levels of 90% or more, whereas Nature, Science and (disappointingly) Biochemistry are the only three journals for which fewer than two-thirds of the structures were accompanied by structure factors in the year 2003 (Kleywegt et al., 2004).
The arguments against deposition (in particular of coordinates) and, later, in favour of extended release delays have been relatively few and most of them either no longer apply or can be addressed by postponing publication (at the risk of being scooped by competitors). With the sophistication of present-day refinement and model-building programs, as well as the speed of modern computers, the argument that time is needed to improve the accuracy of models has lost most of its validity. Delayed release of coordinates or data in order to file patent applications, to exploit the structure for ligand-design purposes or to reap more scientific rewards (follow-up studies, e.g. of mutants and complexes) can be accomplished by delaying publication. The fear that others (bioinformaticians, theoreticians, competitors) might exploit a structure more quickly than the scientist who determined it should encourage that scientist to broaden his expertise or to seek collaborations with experts in related areas. In some cases, low-resolution structures cannot be represented reliably by an all-atom model; in such (exceptional) cases, the IUCr guidelines provide for the deposition of a `Cα-only' model (but accompanied by structure factors all the same). Unfortunately, this exception has been interpreted rather liberally at times. In a 1997 study of `Cα-only' models (Kleywegt, 1997), fully 70% of all such models in the PDB at the time had been determined at better than 3.0 Å resolution (with 12% at 2.0 Å or better).
With respect to the deposition of structure factors specifically, some people have argued that they are superfluous since the refined temperature factors already provide an indication of the reliability of individual atoms. However, the arguments against this are many and sound. Firstly, temperature factors are not experimental data but model parameters that in addition are difficult to compare between different structures. Secondly, temperature factors are notorious for their role as `error sinks'; they tend to account for much more than simply thermal vibration (e.g. unresolved disorder, partial occupancy, dynamic disorder, refinement artefacts such as inappropriate constraints or restraints, as well as possible errors in atom types, conformations etc.). This makes it essentially impossible to determine which factor(s) cause high temperature factors. Finally, temperature factors will never reveal any features in the density that have not been modelled or that could have been interpreted differently.
The arguments in favour of deposition (and immediate release) of coordinates and structure factors have been many. They can be clustered into a number of categories.
The process of calculating the electron-density maps for EDS involves downloading the coordinate and structure-factor files from the PDB (Berman et al., 2000), conversion of the CIF-format reflection files to CCP4 (MTZ) format, modification of the coordinate files to make them suitable for processing with REFMAC (Murshudov et al., 1997), calculation of structure factors and map coefficients with REFMAC, calculation of σA-weighted (Read, 1990) and Fcalc maps with CCP4 (Collaborative Computational Project, Number 4, 1994) programs, calculation of real-space R values and other residue-based statistics with MAPMAN (Kleywegt & Jones, 1996a) and the generation of files that can be downloaded by EDS users. Every now and then, this process is carried out from scratch for all entries. For this we use a Linux-based computer cluster with seven nodes, which allows the calculations to be performed in ∼3 d. In addition, the server is updated automatically every weekend, when new and updated coordinate and structure-factor files are downloaded from the PDB and processed. The update process is carried out by a C-shell script, whereas the map calculations for individual entries are performed by a Perl script that carries out the following steps.
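The real-space R value (RSR) that MAPMAN reports for each residue compares observed and calculated density on the grid points around that residue (Jones et al., 1991). The following is a minimal Python sketch of the formula only, assuming that the two maps are already on a common scale and that the grid points belonging to the residue have already been selected; how MAPMAN defines the residue mask and scales the maps is not reproduced here, and the function name is hypothetical.

```python
from typing import Sequence


def real_space_r(rho_obs: Sequence[float], rho_calc: Sequence[float]) -> float:
    """Real-space R value for one residue.

    RSR = sum|rho_obs - rho_calc| / sum|rho_obs + rho_calc|, summed over the
    grid points assigned to the residue. Assumes both lists hold density
    values at the same grid points and that the maps are on a common scale.
    """
    if len(rho_obs) != len(rho_calc) or not rho_obs:
        raise ValueError("need two equally sized, non-empty lists of grid values")
    numerator = sum(abs(o - c) for o, c in zip(rho_obs, rho_calc))
    denominator = sum(abs(o + c) for o, c in zip(rho_obs, rho_calc))
    return numerator / denominator
```

A perfect fit gives 0, and the value grows as observed and calculated density diverge; for non-negative densities it cannot exceed 1.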
Despite our best efforts, there are still a large number of PDB entries for which coordinates and structure factors are available, but for which we are unable to calculate structure factors such that the reported R value is reproduced (to within five percentage points). These failures may be caused by problems with the coordinate files, problems with the structure-factor files, or limitations of the software that we use. Below, we describe the three categories of problems that we encounter and some of the most common causes of these problems.
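The five-percentage-point criterion can be illustrated with the conventional crystallographic R value, R = Σ||Fobs| − k|Fcalc|| / Σ|Fobs|. The sketch below is a simplified stand-in, not the EDS procedure: it applies a single least-squares scale factor k, whereas REFMAC also models bulk solvent and anisotropic scaling, so its R values will generally differ. All function names are hypothetical.

```python
from typing import Sequence


def linear_scale(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Least-squares scale k minimising sum(|Fobs| - k|Fcalc|)^2."""
    numerator = sum(fo * fc for fo, fc in zip(f_obs, f_calc))
    denominator = sum(fc * fc for fc in f_calc)
    return numerator / denominator


def r_value(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Conventional R value for structure-factor amplitudes (all >= 0)."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two equally sized, non-empty amplitude lists")
    k = linear_scale(f_obs, f_calc)
    residual = sum(abs(fo - k * fc) for fo, fc in zip(f_obs, f_calc))
    return residual / sum(f_obs)


def reproduces(r_calc: float, r_reported: float, tolerance: float = 0.05) -> bool:
    """True if the recalculated R lies within `tolerance` (0.05 = five
    percentage points) of the R value reported in the PDB entry."""
    return abs(r_calc - r_reported) <= tolerance
```

Note that the least-squares scale minimises the squared residual rather than R itself, so this R can be marginally higher than the best achievable with a single scale factor.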
The electron-density server can be accessed through the URL http://eds.bmc.uu.se/ .
An entry can be accessed directly by providing its PDB code. Alternatively, a rudimentary keyword search can be carried out. In addition, some database centres provide links to EDS, e.g. the RCSB PDB site (Berman et al., 2000) and the IMB Jena Image Library of Biological Macromolecules (Reichert et al., 2000), so the search facilities available at these centres can also be used to locate a particular EDS entry. Information on how to link to specific EDS entries is provided on the EDS website. For each EDS entry we provide the following information (Fig. 1).
As of the end of November 2003, EDS comprised 23 267 PDB entries, of which 19 864 were crystal structures. For 10 751 of these (54%), structure factors were available. For 104 (1%) of these entries we were unable to calculate maps, whereas for 9394 (87%) of them the R value calculated by us agrees within five percentage points with the one reported in the PDB entry. These numbers imply first and foremost that for almost half of all deposited crystal structures no experimental data have been deposited. Unless a major effort is made now by the responsible scientists, we have to fear that these data will be lost to science forever. To help crystallographers identify for which of their entries (if any) structure factors have yet to be deposited, we provide an easy-to-use web-based form (http://eds.bmc.uu.se/eds/eds_sos.html ). The second conclusion is that for a sizeable number of entries (more than 1200 at present) straightforward calculations using the deposited coordinates and structure factors are not sufficient to reproduce the published R values to within a reasonable margin (five percentage points). Although there are certainly cases where our software is simply not sufficiently advanced, in many of the cases where we or the original depositors have been able to pinpoint the source of the problem, it has involved errors (often of a book-keeping nature) that were introduced at the deposition stage. To track down the source of the problems in the remaining cases, help from the depositors is invaluable. Authors who find that their entries are not represented in EDS may, as a first step, want to download their own files from the PDB and attempt to reproduce their published R values. If these attempts fail, it should not be too difficult for them to track down the problem by looking for discrepancies between the files that were actually used during structure refinement and those that were downloaded from the PDB.
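The percentages quoted above follow directly from the counts. Assuming that every entry with structure factors falls into exactly one of three classes (no map, R reproduced, R not reproduced), the last line shows where the figure of more than 1200 entries comes from.

```latex
\begin{aligned}
\text{crystal structures with structure factors} &: 10\,751 / 19\,864 \approx 0.54 \\
\text{entries for which no map could be calculated} &: 104 / 10\,751 \approx 0.010 \\
\text{entries with } R \text{ reproduced within } 0.05 &: 9394 / 10\,751 \approx 0.874 \\
\text{maps calculated, } R \text{ not reproduced} &: 10\,751 - 104 - 9394 = 1253
\end{aligned}
```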
Many of the problems are trivial and easily correctable by the authors (but usually not by anyone else!) and may be due to typographical errors, swapping of indices or cell constants, or mixing up of related files. These problems have gone undetected until now because the EDS project is probably the first in which a systematic effort is made to calculate electron-density maps for all of the more than 10 000 crystal structures for which structure factors are available. As a community we need to make an effort to fix problems in the existing database entries and we need to do it sooner rather than later, while the original files still exist (on media that can still be read with modern equipment) and while the people who did the work are still around. For the future, problems can be prevented only by making map calculations an integral part of the data-deposition process. To this end, we have been working with the MSD group to make EDS-style calculations part of the deposition process at their site. Nowadays, when a crystallographer deposits a model and structure factors at the EBI site, the EDS calculations are carried out automatically, summary statistics are presented and the resulting files are made available to the depositor.
Because EDS now contains more than 9000 electron-density maps, all calculated in a consistent fashion, we have a large set of statistics pertaining to maps waiting to be `mined'. For example, we investigated which factors determine the unit-cell r.m.s. density level (`σ-level'). Our initial assumption was that it would be correlated with the solvent content of the parent crystal, but when this statistic was calculated for a set of entries, the correlation turned out to be poor. We should not have been surprised to find that the strongest correlation was in fact with the occupancy-weighted average temperature factor (averaged over all non-water entities so as to yield a single value for every entry). The inverse correlation (Fig. 4) suggests that it could be advantageous to use (dynamically and automatically) variable contouring levels during model inspection and rebuilding, where the locally averaged occupancy-weighted temperature factor determines the appropriate contouring level.
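As a sketch of how an occupancy-weighted average temperature factor might be computed from a PDB-format coordinate file, the following Python reads the fixed-column ATOM and HETATM records (residue name in columns 18–20, occupancy in 55–60, temperature factor in 61–66) and averages B over all non-water atoms, weighting each atom by its occupancy. Averaging per atom rather than per entity, and the set of residue names treated as water, are assumptions made here for illustration; the function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# Residue names treated as water (an assumption; files may use other names).
WATER_NAMES = {"HOH", "WAT", "H2O", "DOD"}


def parse_atom_line(line: str) -> Optional[Tuple[str, float, float]]:
    """Return (residue name, occupancy, B) for an ATOM/HETATM record, else None.

    Uses the fixed PDB columns: resName 18-20, occupancy 55-60,
    tempFactor 61-66 (1-based, inclusive), i.e. Python slices below.
    """
    if not line.startswith(("ATOM", "HETATM")):
        return None
    res_name = line[17:20].strip()
    occupancy = float(line[54:60])
    b_factor = float(line[60:66])
    return res_name, occupancy, b_factor


def occupancy_weighted_mean_b(atoms: Iterable[Tuple[str, float, float]]) -> float:
    """Occupancy-weighted mean B over non-water atoms: sum(q * B) / sum(q)."""
    weighted_sum = 0.0
    total_occupancy = 0.0
    for res_name, occupancy, b_factor in atoms:
        if res_name in WATER_NAMES:
            continue  # waters are excluded from the average
        weighted_sum += occupancy * b_factor
        total_occupancy += occupancy
    if total_occupancy == 0.0:
        raise ValueError("no non-water atoms with non-zero occupancy")
    return weighted_sum / total_occupancy
```

Weighting by occupancy means that, for example, two half-occupied alternate conformations together contribute as much as one fully occupied atom.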
In addition to the overall statistics, EDS also provides detailed statistics concerning the real-space fit of all residues in more than 9000 crystal structures, allowing comparative and retrospective studies, e.g. concerning the fit of ligands, the reliability of water molecules etc. To date, however, we have only used these statistics to identify some cases of poorly fitting molecules to use as educational examples in lectures.
The residue-specific real-space fit statistics (such as those in Table 1) are an additional valuable by-product of the EDS project. They will make it possible to use residue- and resolution-specific RSR-cutoff values in validation procedures (e.g. OOPS2; Kleywegt & Jones, 1996b). More importantly, they can be employed in automatic rebuilding programs and protocols such as ARP/wARP (Perrakis et al., 1999), for instance by using heuristic rules such as `remove or rebuild all residues for which Z(RSR) > 2'. A prototype program, ELAL (`ELectronic ALwyn'), that applies such heuristics has been written (GJK, unpublished results) and will be integrated into a future version of ARP/wARP (Cohen et al., 2004).
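The heuristic `remove or rebuild all residues for which Z(RSR) > 2' can be sketched as below. Here Z(RSR) is taken to be the residue's RSR expressed in standard deviations from the mean RSR of its residue type; the per-type means and standard deviations would come from statistics such as those in Table 1 and would ideally also be binned by resolution. The data structures and function names are hypothetical, and this is not the ELAL implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class RsrStats:
    """Reference RSR distribution for one residue type (hypothetical container)."""
    mean: float
    sd: float


def rsr_z_score(rsr: float, stats: RsrStats) -> float:
    """Number of standard deviations by which a residue's RSR exceeds the mean."""
    return (rsr - stats.mean) / stats.sd


def flag_residues(
    residues: Sequence[Tuple[str, str, float]],
    stats_by_type: Dict[str, RsrStats],
    z_cutoff: float = 2.0,
) -> List[Tuple[str, float]]:
    """Return (residue id, Z) for residues whose Z(RSR) exceeds z_cutoff.

    `residues` holds (residue id, residue type, RSR) triples; residue types
    without reference statistics are skipped rather than guessed.
    """
    flagged = []
    for res_id, res_type, rsr in residues:
        stats = stats_by_type.get(res_type)
        if stats is None:
            continue
        z = rsr_z_score(rsr, stats)
        if z > z_cutoff:
            flagged.append((res_id, z))
    return flagged
```

Keying the statistics by residue type accounts for the fact that, for instance, long flexible side chains tend to fit their density less well than small rigid ones; a (type, resolution bin) key would be a natural extension.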
The availability of a large number of maps also makes it possible to study the phenomenon of register errors. This type of model-building error occurs when one or more residues are skipped or inserted during model building, so that the sequence and the model fall out of step (Kleywegt et al., 1996; Jones & Kjeldgaard, 1997). At present it is impossible to determine how common such errors are in deposited models, since at least two models of the same molecule are needed to detect any register shifts and the density is needed for both to determine whether either of them is in error. More importantly, there are no (combinations of) diagnostics that are known to be specifically suited to detecting such errors in models prior to their deposition (especially in the absence of comparison models). We have therefore undertaken a study of sets of crystal structures of the same molecule between which register shifts (not necessarily errors) occur and will try to correlate these to coordinate-based and map-based validation statistics (H. Hansson & GJK, unpublished results).
Finally, the work on EDS has probably made a small contribution to improving the quantity, quality and integrity of the structural database as a whole. We have identified (and sometimes resolved) a number of problems with structure-factor files that had not come to light previously. Further, quite a few crystallographers have used the web form that we provide to identify which of their models did not have structure factors deposited (in some cases leading to the deposition of several dozen old data sets by a single crystallographer). Others have examined their deposited models and structure-factor files and identified and corrected mistakes that were made during the original deposition process.
As for future developments, we are constantly working on improving the methods that are used to calculate the maps, and on trying to identify causes of failed map calculations. With the help of the crystallographic community we hope to improve the 87% success rate. At a later stage, we will also attempt to incorporate maps that are provided by the crystallographers themselves (experimental maps, NCS-averaged maps etc.). We are also working on a simple server with which crystallographers will be able to execute the EDS-style calculations prior to deposition. Finally, in the long term the community will probably have to face the issue of whether the structural database should be static or dynamic. As methodology improves, it seems likely that re-refinement of older models (either on a case-by-case basis, or as one large-scale project) might provide better models and, hopefully, increase our understanding of the chemistry and biology of the molecules under study.
At present, structure factors are available for only 54% of all crystal structures deposited in the PDB. Unless the community makes a serious effort now, we must assume that the remaining data is lost to science forever. Fortunately, structure-factor deposition has become vastly more common in recent years, with 78% of all crystal structures deposited in the year 2003 being accompanied by the corresponding data set.
Electron-density maps are the best representation of the crystallographic experiment (both for experts and non-experts). However, their computation requires crystallographic expertise and access to the proper software. Therefore, we have calculated maps for more than 9000 macromolecular crystal structures in the PDB and make these available through an internet-based server, the Uppsala Electron Density Server (EDS). In doing so, a `dead' collection of structure-factor files has been transformed into a publicly accessible collection of thousands of maps that can be inspected, scrutinized and admired by experts and non-experts alike.
We thank Johan Hattne (Uppsala University) for his help in implementing the interactive map viewing in EDS, Mike Hartshorn (Astex Technology) and Tom Oldfield (EBI) for developing and modifying the AstexViewer, Jawahar Swaminathan (EBI) for implementing the EDS software at EBI and for many corrections to legacy structure-factor files in the PDB archive, and Kim Henrick (EBI) for useful discussions and the EBI–EDS collaboration. The AstexViewer software is used in EDS by permission and includes code developed by Astex Technology Limited, UK. In its initial stages, EDS was supported in part by EU contract No. CT96-0189. The Swedish Natural Science Research Council (NFR, now VR) and the Wallenberg Foundation (through the Linnaeus Centre for Bioinformatics in Uppsala) have supported later developments. Current support is provided by EU contract No. QLRT-2001-00015. GJK is a Royal Swedish Academy of Sciences (KVA) Research Fellow, supported through a grant from the Knut and Alice Wallenberg Foundation. He is supported by KVA, the Swedish Structural Biology Network (SBNet), Uppsala University and its Linnaeus Centre for Bioinformatics.
Baker, E. N., Blundell, T. L., Vijayan, M., Dodson, E., Dodson, G., Gilliland, G. L. & Sussman, J. L. (1996). Nature (London), 379, 202–202.
Baker, E. N. & Saenger, W. (1999). Acta Cryst. D55, 2–3.
Barinaga, M. (1989). Science, 245, 1179–1181.
Berman, H., Henrick, K. & Nakamura, H. (2003). Nature Struct. Biol. 10, 980.
Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N. & Bourne, P. E. (2000). Nucleic Acids Res. 28, 235–242.
Bloom, F. E. (1998). Science, 281, 175.
Boutselakis, H. et al. (2003). Nucleic Acids Res. 31, 458–462.
Brändén, C.-I. & Jones, T. A. (1990). Nature (London), 343, 687–689.
Brünger, A. T., Adams, P. D., Clore, G. M., DeLano, W. L., Gros, P., Grosse-Kunstleve, R. W., Jiang, J.-S., Kuszewski, J., Nilges, M., Pannu, N. S., Read, R. J., Rice, L. M., Simonson, T. & Warren, G. L. (1998). Acta Cryst. D54, 905–921.
Campbell, P. (1998). Nature (London), 394, 105.
Cohen, S. X., Morris, R. J., Fernandez, F. J., Ben Jelloul, M., Kakaris, M., Parthasarathy, V., Lamzin, V. S., Kleywegt, G. J. & Perrakis, A. (2004). Acta Cryst. D60, 2222–2229.
Collaborative Computational Project, Number 4 (1994). Acta Cryst. D50, 760–763.
Commission on Biological Macromolecules (2000). Acta Cryst. D56, 2.
Editorial (1996). Nature (London), 379, 191.
Editorial (1998a). Nature (London), 391, 617.
Editorial (1998b). Nature Struct. Biol. 5, 407–408.
Guex, N. & Peitsch, M. C. (1997). Electrophoresis, 18, 2714–2723.
Hartshorn, M. J. (2002). J. Comput. Aided Mol. Design. 16, 871–881.
Hooft, R. W. W., Vriend, G., Sander, C. & Abola, E. E. (1996). Nature (London), 381, 272.
Jiang, J., Abola, E. & Sussman, J. L. (1999). Acta Cryst. D55, 4.
Jones, T. A. & Kjeldgaard, M. (1997). Methods Enzymol. 277, 173–208.
Jones, T. A., Kleywegt, G. J. & Brünger, A. T. (1996). Nature (London), 381, 18–19.
Jones, T. A., Zou, J. Y., Cowan, S. W. & Kjeldgaard, M. (1991). Acta Cryst. A47, 110–119.
Kleywegt, G. J. (1997). J. Mol. Biol. 273, 371–376.
Kleywegt, G. J. (2000). Acta Cryst. D56, 249–265.
Kleywegt, G. J., Bergfors, T., Senn, H., Le Motte, P., Gsell, B., Shudo, K. & Jones, T. A. (1994). Structure, 2, 1241–1258.
Kleywegt, G. J., Harris, M. R. & Jones, T. A. (2004). To be submitted.
Kleywegt, G. J., Hoier, H. & Jones, T. A. (1996). Acta Cryst. D52, 858–863.
Kleywegt, G. J. & Jones, T. A. (1996a). Acta Cryst. D52, 826–828.
Kleywegt, G. J. & Jones, T. A. (1996b). Acta Cryst. D52, 829–832.
Kleywegt, G. J. & Jones, T. A. (1996c). Structure, 4, 1395–1400.
Koetzle, T. F. (1989). Nature (London), 342, 114.
Laskowski, R. A., Hutchinson, E. G., Michie, A. D., Wallace, A. C., Jones, M. L. & Thornton, J. M. (1997). Trends Biochem. Sci. 22, 488–490.
Laskowski, R. A., MacArthur, M. W., Moss, D. S. & Thornton, J. M. (1993). J. Appl. Cryst. 26, 283–291.
Maddox, J. (1989). Nature (London), 341, 277.
Murshudov, G. N., Vagin, A. A. & Dodson, E. J. (1997). Acta Cryst. D53, 240–255.
Perrakis, A., Morris, R. & Lamzin, V. S. (1999). Nature Struct. Biol. 6, 458–463.
Read, R. J. (1990). Acta Cryst. A46, 900–912.
Reichert, J., Jabs, A., Slickers, P. & Suhnel, J. (2000). Nucleic Acids Res. 28, 246–249.
Vriend, G. (1990). J. Mol. Graph. 8, 52–56.
Wlodawer, A., Davies, D., Petsko, G., Rossmann, M., Olson, A. & Sussman, J. L. (1998). Science, 279, 306–307.
© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.