Data for structural and crystallization communications

Welcome. This page gives a list of recommended items for inclusion in structural communications in Acta Crystallographica Section F.

The primary purpose of this page is to solicit the opinions and suggestions of members of the Acta D and F Editorial Boards on what information should be required of authors in articles describing macromolecular structures. The page is modelled on the list of requested items published by the journal.

An expanded page with details of the individual mmCIF data items is also available, and board members are urged to consult it frequently. It often provides a fuller description of what is requested than is possible on this page. It also includes crystallographically realistic examples that are (1) rendered in full mmCIF and (2) shown in the proposed tabular form into which the Editorial Office will convert the mmCIF upon receipt. We urge you to look at these pages and welcome your comments and suggestions on them, in addition to those related to data required for publication.

A caution: the two lists, the one on this page and the one with details of the individual mmCIF data items, began with a one-to-one correspondence in data descriptors and in data order but have, despite our best efforts, drifted away from that correspondence here and there. It is our intention to repair that drift soon, but we would rather not further delay soliciting your opinions and suggestions.

In the listings below, to initiate the discussion, we have rendered some items in a red typeface. We offer these as principal candidates for data that should be considered mandatory; that is, these data or their equivalents should be presented in the article either (1) within the standard experimental tables that will be auto-generated from the author's deposited mmCIF (in most cases this is the preferred location) or (2) within the main text of the article.

Note that the standards we have discussed and adopted (two so far) are accessible via [info] buttons placed at relevant locations. An enzyme-naming convention is also included. Please use this exercise to suggest areas where additional consensus standards may be needed. One we have considered is a standard for an unbiased indication of resolution, and I hope we can achieve consensus on this concern quickly. I also note the popularity of giving average values of atomic displacement parameters (overall, main chain and side chain) in lieu of r.m.s.d.s from target values.

Once we have a reasonable consensus on the required data, it is our understanding that a version of the RCSB deposition GUI will be made available that will prompt depositors who identify themselves as would-be authors for the required data at the time of deposition. Happily, much of the data that we propose be designated as required can be taken automatically from the depositor's log and output files if they employ the PDB_EXTRACT tool that RCSB provides. The mmCIF file created at deposition can then be provided to the Editorial Office at submission time for conversion to tabular form.

Our list of requirements will present a significant challenge for authors who want to prepare a fully compliant mmCIF file for transfer to the journal. Authors are not required to provide all required data in mmCIF format; they may supply missing data in text or additional tables at submission time if they choose. Success in collecting this level of detail at the time of deposition will depend heavily on (1) the cooperation of software developers in outputting the desired information in a dictionary-compliant manner and (2) authors taking advantage of tools like PDB_EXTRACT to harvest this information automatically from their program output files.

For now, it is envisaged that, as an aid to editors, the review document produced by the online submission system will indicate which of the required data are absent from the mmCIF. Where they are absent from the mmCIF (and thus from the standard experimental table), reviewers should ensure that they are discussed within the appropriate experimental section of the article text.
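
As an illustration of the kind of check described above, the following is a minimal sketch of how absent items might be flagged. It is not the submission system's actual code. The item names are real PDBx/mmCIF data names, but the choice of which ones count as required is purely hypothetical, and the reader handles only single-line `name value` pairs (loops and multi-line values are ignored).

```python
# Hypothetical selection of "required" items; the real list is still under discussion.
REQUIRED_ITEMS = [
    "_symmetry.space_group_name_H-M",
    "_cell.length_a",
    "_reflns.d_resolution_high",
    "_reflns.percent_possible_obs",
    "_refine.ls_R_factor_R_work",
    "_refine.ls_R_factor_R_free",
]


def read_simple_items(cif_text):
    """Collect single-line 'name value' pairs from mmCIF text.

    Simplification: loop_ blocks and multi-line (semicolon) values are skipped.
    """
    items = {}
    for line in cif_text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue
        parts = line.split(None, 1)  # data name, then the rest of the line
        if len(parts) == 2:
            items[parts[0]] = parts[1].strip().strip("'\"")
    return items


def missing_items(cif_text, required=REQUIRED_ITEMS):
    """Return required items that are absent or given as '?' or '.'."""
    items = read_simple_items(cif_text)
    return [name for name in required if items.get(name, "?") in ("?", ".")]
```

A report built from `missing_items` would list the data that reviewers should then expect to find discussed in the experimental section of the article text.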

As an aside, if you explore the other pages, you will see that there are a number of mmCIF items ("*.details" items) that accept text instead of specific data. Some of these appear unavoidable, but we are open to suggestions for improvement. The use of text items reaches its nadir in Section 1.2 on protein preparation. RCSB has prepared an expansion of the mmCIF for preparation that appears complete and is a rich trove of items we could use, but we are advised that at present the relevant community is largely unwilling to deposit using these items. RCSB will not be able to do any translation of information in "*.details" items into specific data items, and at the journal these data will simply be tabulated or, if of significant length, placed in supplementary material. It is our fondest hope that in the near future, as automation and LIMS advance further into everyday use, we will be able to replace the current Section 1.2 with one that accepts specific data in specific mmCIF items. We realize, too, that we must be ever vigilant for changing practices that require changes in our use of mmCIF and in our required data, for change they will, of that we can be certain.

With regard to imposition of data requirements on submitting authors, it is understood that editors are to exercise common sense and appropriate judgement in cases where required data should in fact be omitted (for example, because they do not make sense in the context of a particular design of experiment).



1. Sample information

1.1. Macromolecule and source information

Molecular definition
     Structure name [info]
     Component molecules
     Additional molecular identifiers
     Biological functional unit (BFU) or
         macromolecular assembly, numbers and types of chains
     Mass of BFU (Da)
Macromolecule sequence and chemical configuration [info]
     Sequence database reference code
     Polymers (one-letter code sequence) or Polymer sequence as list of residues
     Ligand, cofactor, ions, solvent
     Mutations
     Post-translational modifications
     Formula weight of entity (Da)
Source organism
     Scientific name
     Strain
     Details
Source gene
     Scientific name
     Strain
     Details

1.2. Macromolecule production [info]

For each macromolecular entity
     PCR protocol (required if recombinant, otherwise absent)
     Cloning protocol (required if recombinant, otherwise absent)
     Expression protocol (required if recombinant, otherwise absent)
     Purification protocol
     Additional details

1.3. Crystallization [info]

Crystallization specifics
     Crystallization method
     Temperature (K)
     Additional details
     Apparatus
     Atmosphere
     Pressure (kPa)
     Crystal growth time
     Seeding
     Volumes and pHs of crystallization solutions
     Compositions of crystallization solutions
Cryo treatments
     Final cryoprotection solution
     Soaking
     Cooling
     Annealing

1.4. Crystal data

     Space group, crystal system
     Unit-cell parameters (Å, °) (s.u. optional)
     Crystal dimensions or radius (mm)
     Colour of crystal
     Crystal habit or shape
     No. of molecules in unit cell (Z)
     Matthews coefficient VM (Å³ Da⁻¹)
     Solvent content (%)
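
The last two items in this list follow from the unit-cell parameters, Z and the molecular mass. The sketch below shows one common way to compute them, under stated assumptions: Z is the number of molecules in the unit cell, the mass is that of one molecule in Da, and the solvent content uses Matthews' approximation 1 - 1.23/VM, which assumes a typical protein partial specific volume of about 0.74 cm³ g⁻¹. The function names are illustrative only.

```python
import math


def unit_cell_volume(a, b, c, alpha, beta, gamma):
    """Unit-cell volume in cubic angstroms; lengths in angstroms, angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


def matthews_coefficient(volume, z, mass_da):
    """VM = V / (Z * M), in cubic angstroms per dalton."""
    return volume / (z * mass_da)


def solvent_content_percent(vm):
    """Approximate solvent content (%), assuming a typical protein density."""
    return 100.0 * (1.0 - 1.23 / vm)
```

For example, an orthorhombic cell of 50 x 60 x 70 Å with Z = 4 and a 25 kDa molecule gives VM = 2.1 Å³ Da⁻¹ and a solvent content of roughly 41%.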

2. Data collection and structure solution statistics

2.1. Data collection, refinement data set

     Data set identifier
     Crystal sample conditions
     Diffraction protocol
     Sampling protocol
     Source of diffracting beam
     Focusing and collimation
     Monochromator
     X-ray beam size
     Wavelength (Å)
     Detector type
     Temperature (K)
     Total measuring time (s)
     No. of images
     Data-processing software
     Resolution range (Å) (overall and outer shell)
     No. of unique reflections (overall and outer shell)
     No. of observed reflections (only for unrefined structures, e.g. crystallization experiments)
     Criterion for observed reflections (only for unrefined structures, e.g. crystallization experiments)
     Completeness (%) (overall and outer shell)
     Redundancy (overall and outer shell)
     < I/σ(I) > (overall and outer shell)
     Rmerge (overall and outer shell)
     Rr.i.m.
     Rp.i.m.
     d-spacing (Å) at which < I/σ(I) > = 2 (if this does not occur, leave blank)
     dopt [info]
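
For reference, the three merging statistics listed above differ only in a multiplicity-dependent weight. The sketch below uses the commonly quoted definitions: Rmerge sums |I - <I>| over all observations and divides by the sum of I, Rr.i.m. weights each reflection's deviations by sqrt(n/(n-1)), and Rp.i.m. weights them by sqrt(1/(n-1)). The input layout (one list of intensities per unique reflection) and the decision to skip reflections measured only once are assumptions of this example; data-processing programs differ in such details.

```python
import math


def merging_r_factors(groups):
    """Compute Rmerge, Rrim and Rpim.

    groups: iterable of lists, each holding the intensities of the
    symmetry-equivalent observations of one unique reflection.
    Reflections with fewer than two observations are skipped (an assumption).
    """
    num_merge = num_rim = num_pim = denom = 0.0
    for obs in groups:
        n = len(obs)
        if n < 2:
            continue
        mean = sum(obs) / n
        dev = sum(abs(i - mean) for i in obs)
        num_merge += dev
        num_rim += math.sqrt(n / (n - 1)) * dev
        num_pim += math.sqrt(1 / (n - 1)) * dev
        denom += sum(obs)
    if denom == 0:
        raise ValueError("no reflections with multiplicity >= 2")
    return {
        "Rmerge": num_merge / denom,
        "Rrim": num_rim / denom,
        "Rpim": num_pim / denom,
    }
```

Because the weights satisfy sqrt(1/(n-1)) <= 1 <= sqrt(n/(n-1)) for n >= 2, Rp.i.m. never exceeds Rmerge and Rr.i.m. is never below it.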

2.2. Phasing

     Phasing method

2.2.1. MAD/SAD data and structure solution statistics

For the MAD/SAD application as a whole:
     MAD/SAD phasing method used
     Insertion of MAD/SAD scatterers
     Method of locating scatterers
     No. of MAD/SAD sets used in phasing
     Phasing resolution range (Å)
     Phasing power all data; centric, acentric
     Figure of merit overall
     MAD/SAD solution software
For each phasing set
     Radiation source
     Radiation wavelength
     Temperature (K)
     Resolution range in the phasing data set (Å)
     f' used in phasing
     f'' used in phasing
     Phasing power by set; centric, acentric
And again for the MAD/SAD application as a whole:
     No. of sites
     For each of the sites, the following: site no., atom symbol, occupancy, x, y, z and Biso

2.2.2. MIR/MIRAS/SIR/SIRAS data and structure solution statistics

For the MIR application as a whole:
     No. of derivatives
     Description of the phasing strategy
     Resolution range of phasing (Å)
     Phasing power all data; acentric, centric
     Figure of merit all data
     MIR solution software
For each phasing data set (if the native data set used for phasing is not the set used for refinement, it should be described as the first phasing set; additional data sets will correspond to each of the derivatives):
     Radiation source
     Radiation wavelength
     Temperature (K)
     Resolution range of phasing data set (Å)
Then, for each derivative:
     Derivative
     Derivative preparation
     Heavy-atom location method
     Number of sites
     Figure of merit
     For each of the sites, the following: site no., atom symbol, occupancy, x, y, z and Biso

2.2.3. Molecular replacement data and structure solution statistics

Search model
     PDB code for search model
     or Identification of search model
If phasing data set is not the data set used for refinement:
     Radiation source
     Radiation wavelength (Å)
     Temperature (K)
     Resolution range (Å)
Molecular replacement phasing details
     Alterations to the search model
     MR solution software

3. Model generation and refinement

    
     Structure refinement software
     Refinement on |F|, I, or F²
     σ cutoff in data
     Resolution range (Å) (overall and outer shell)
     No. of reflections used in refinement (overall and outer shell)
     No. of reflections above σ cutoff in final cycle (overall and outer shell)
     Final overall R factor (overall and outer shell)
     Atomic displacement model (iso, aniso, mixed)
     Overall average B factor for the macromolecule(s) excluding solvent (Å²)
     No. of macromolecule atoms refined
     No. of ligand atoms
     No. of solvent atoms
     Total No. of atoms
     No. of refined parameters
     Non-crystallographic symmetry restraints
     Bulk solvent model
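
The average B factors requested here (and the main-chain and side-chain averages mentioned in the introduction) can be obtained from the atom list in a few lines. The following is a rough sketch only. It assumes each atom is supplied as a tuple taken from the mmCIF items `_atom_site.group_PDB`, `_atom_site.label_atom_id` and `_atom_site.B_iso_or_equiv`, treats only `ATOM` records as macromolecule (so solvent and ligands in `HETATM` records are excluded, as are any modified residues stored that way), and uses the protein main-chain atom names N, CA, C and O. Nucleic acids would need a different definition.

```python
# Protein main-chain atom names; an assumption that does not cover nucleic acids.
MAIN_CHAIN_ATOMS = {"N", "CA", "C", "O"}


def average_b_factors(atoms):
    """Average B factors (overall, main chain, side chain) over ATOM records.

    atoms: iterable of (group_PDB, atom_name, b_iso) tuples.
    Returns None for a class that contains no atoms.
    """
    totals = {"overall": [0.0, 0], "main chain": [0.0, 0], "side chain": [0.0, 0]}
    for group, atom_name, b_iso in atoms:
        if group != "ATOM":  # skip HETATM records (solvent, ligands, ions)
            continue
        part = "main chain" if atom_name in MAIN_CHAIN_ATOMS else "side chain"
        for key in ("overall", part):
            totals[key][0] += b_iso
            totals[key][1] += 1
    return {key: (s / n if n else None) for key, (s, n) in totals.items()}
```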

4. Model validation

     Final Rwork (overall and outer shell)
     No. of reflections in test set for Rfree (overall and outer shell)
     Final Rfree (overall and outer shell)
     Cruickshank DPI
     R.m.s. deviations from target values for bond distances, bond angles and
      isotropic B factors (overall, main chain and side chain)
     Ramachandran plot analysis
         most favoured regions (%)
         additionally allowed regions (%)
         generously allowed regions (%)
         disallowed regions (%)
     Omitted residues

