Conventions and workflows for using Situs

Recent developments of the Situs software suite for multi-scale modeling are reviewed. Typical workflows and conventions encountered during processing of biophysical data from electron microscopy, tomography or small-angle X-ray scattering are described.


Situs Format (current)
The Situs map format was once necessary to keep track of the 3D coordinate system within the Situs package (see main article), to enforce a cubic lattice, and to be independent of the format variations in the community (6.). The advantages of the format are:
• It is unambiguous and requires only a minimalist header: In the map format, a short header holds the voxel spacing WIDTH, the map origin as defined by the 3D coordinates of the first voxel ORIGX, ORIGY, ORIGZ, and the map dimensions (number of increments) NX, NY, NZ. The header is followed by data fields such that X increments change fastest and Z increments change slowest.
• It is ASCII-based and thus easily readable/editable with a text editor.
• It is supported by the molecular graphics programs VMD, Chimera, and Sculptor, by the EMAN2 reconstruction package, and also by the em2em format conversion tool.
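The header layout and voxel ordering described above can be illustrated with a minimal Python sketch. It is not the Situs reader itself: the function names, the whitespace-separated layout, the six-decimal precision, and the ten-values-per-line wrapping are assumptions made for illustration. Only the field order (WIDTH, ORIGX, ORIGY, ORIGZ, NX, NY, NZ) and the X-fastest, Z-slowest ordering are taken from the text.

```python
import io

def write_situs(stream, width, origin, dims, values):
    """Write a Situs-style ASCII map: one header line, then the densities."""
    nx, ny, nz = dims
    if len(values) != nx * ny * nz:
        raise ValueError("number of values must equal NX*NY*NZ")
    ox, oy, oz = origin
    # Header order as described in the text.
    stream.write(f"{width:.6f} {ox:.6f} {oy:.6f} {oz:.6f} {nx} {ny} {nz}\n\n")
    # Ten values per line is an assumed, purely cosmetic choice.
    for i in range(0, len(values), 10):
        stream.write(" ".join(f"{v:.6f}" for v in values[i:i + 10]) + "\n")

def read_situs(stream):
    """Read the map back; tolerant of arbitrary whitespace and line breaks."""
    tokens = stream.read().split()
    width, ox, oy, oz = (float(t) for t in tokens[:4])
    nx, ny, nz = (int(t) for t in tokens[4:7])
    values = [float(t) for t in tokens[7:]]
    if len(values) != nx * ny * nz:
        raise ValueError("data block does not match NX*NY*NZ")
    return width, (ox, oy, oz), (nx, ny, nz), values

def voxel_index(ix, iy, iz, dims):
    """Position of voxel (ix, iy, iz) in the data block: X fastest, Z slowest."""
    nx, ny, _ = dims
    return ix + nx * (iy + ny * iz)

# Round trip through an in-memory text buffer.
buffer = io.StringIO()
write_situs(buffer, 2.0, (0.0, 0.0, 0.0), (2, 2, 1), [0.1, 0.2, 0.3, 0.4])
buffer.seek(0)
print(read_situs(buffer))
```

Because the reader splits on any whitespace, a map that was edited by hand in a text editor and re-wrapped would still parse under these assumptions.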
Disadvantages of the Situs format (compared to typical binary formats) are:
• The file size is about 3 times larger than that of comparable binary formats, requiring compression of archived maps.
• Very large maps take noticeable time to read from (or write to) hard disk.
• A separate format conversion step was needed in the Situs workflow (map2map tool).
To take advantage of binary data storage, CCP4 and MRC file formats were recently adopted for direct use within Situs programs. In recent years, CCP4-derived formats (such as MRC) added noncrystallographic capabilities to keep track of the 3D coordinate system, enabling their use within Situs. The original CCP4 format was also adopted as the official format of the Electron Microscopy Data Bank (EMDB

Other MRC Variants
A. Minor variations of the obsolete, pre-2000 MRC format are known:
• the start values of the map were not set correctly to properly reflect the origin
• the cell axes were given in voxel units, not in Angstrom
• confusingly, some pre-2000 MRC maps also had the CCP4 style 'MAP' string set
Such maps are rarely encountered today.
B. Some software packages use their own variation of the above formats. For example, the IMOD package used in tomography supports various non-standard data modes and additional header fields relevant to tomography. The cell axes may be given in nm units, not in the traditional Angstrom units. Also, IMOD files currently follow the "unsigned" 8-bit byte convention (see MODE 0 in 3.), but due to compatibility problems with the EMDB (which uses CCP4), the format may soon adopt signed 8-bit bytes. For more details on the MRC format variants used by IMOD see http://bio3d.colorado.edu/imod/doc/mrc_format.txt.
Although the conventions described in the following section were designed specifically for MRC 2000 and CCP4 formats, they are tolerant of variations such as IMOD and pre-2000 MRC. It should be noted that non-standard header values (e.g. due to different unit conventions) can be edited manually with the map2map program, if necessary.

Format Conventions
The developers of em2em (Michael Schatz), Chimera (Tom Goddard), VMD (John Stone), Sculptor (Stefan Birmanns), and Situs (Willy Wriggers) discussed various implementation details of the CCP4 and MRC 2000 formats in the spring of 2009 to coordinate a consistent map representation across different software. Incremental changes and the lack of utility of some CCP4 features to the EM community have caused the pre-2000 MRC format to become incompatible with CCP4 (see 3.), but the more recent MRC 2000 format (4.) reintroduced some CCP4 header fields and conventions (in addition to new ones) so that it might be possible to merge the two formats in practical applications. There is good reason to do this: Whether a file is CCP4 or MRC 2000 may get lost in vague file suffixes (.dat, .map, etc.). Also, sometimes X-ray maps are viewed or written by EM tools, and vice-versa.
The developers discussed four remaining differences between the formats and thought of ways to handle them gracefully: the origin conventions, the axis permutation capability, the support for non-cubic lattices, and the support of signed or unsigned 8-bit bytes. The following conventions were adopted by Situs on automatic read and write of CCP4 and MRC 2000 maps.

A. Origin Conventions
CCP4 maps are in register with the coordinate system origin, and only N*START indices are set, whereas the MRC 2000 *ORIGIN (fields 50-52) are typically set to zero. In contrast, MRC 2000 uses *ORIGIN for identifying the position of the first voxel, and the map is not necessarily in register with the origin of the coordinate system.

B. Axis Permutation
It depends mainly on the user community whether axis permutations are supported via the MAP[C,R,S] fields or not. The crystallographic community (CCP4) definitely supports it, but the EM community (MRC) tends to ignore these fields and assumes MAP[C,R,S]=1,2,3. Situs generally supports axis permutations. Due to the possible mix of software there is good reason to support CCP4 style axis mapping even for files believed to be in MRC format.
Note that when permuting X,Y,Z axes (order of data records) via MAP[C,R,S], one must also permute the fields N[C,R,S] and N[C,R,S]START in the header. The CCP4 header clearly separates the permutable C,R,S fields from the fixed X,Y,Z fields. In MRC 2000 one must then permute the equivalent N[X,Y,Z] and N[X,Y,Z]START fields, even though these use X,Y,Z designation.
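The bookkeeping involved in an axis permutation can be sketched as follows. This is an illustrative Python example, not the Situs implementation. The function names are hypothetical, and the sketch assumes the usual meaning of MAPC, MAPR, MAPS, where the values 1, 2, 3 stand for X, Y, Z and state which spatial axis the columns, rows, and sections run along.

```python
def crs_to_xyz(mapc, mapr, maps, n_crs, nstart_crs):
    """Reorder per-axis header values from column/row/section to X, Y, Z order."""
    axis_of = (mapc, mapr, maps)
    if sorted(axis_of) != [1, 2, 3]:
        raise ValueError("MAPC, MAPR, MAPS must be a permutation of 1, 2, 3")
    n_xyz = [0, 0, 0]
    nstart_xyz = [0, 0, 0]
    # Both the extents and the start indices must follow the same permutation.
    for crs_pos, axis in enumerate(axis_of):
        n_xyz[axis - 1] = n_crs[crs_pos]
        nstart_xyz[axis - 1] = nstart_crs[crs_pos]
    return tuple(n_xyz), tuple(nstart_xyz)

def reorder_data_to_xyz(data, mapc, mapr, maps, n_crs):
    """Return the densities reordered so that X changes fastest and Z slowest."""
    (nx, ny, nz), _ = crs_to_xyz(mapc, mapr, maps, n_crs, (0, 0, 0))
    nc, nr, ns = n_crs
    out = [0.0] * (nx * ny * nz)
    for s in range(ns):
        for r in range(nr):
            for c in range(nc):
                idx = [0, 0, 0]
                idx[mapc - 1] = c
                idx[mapr - 1] = r
                idx[maps - 1] = s
                x, y, z = idx
                # Input is stored column-fastest; output is stored X-fastest.
                out[x + nx * (y + ny * z)] = data[c + nc * (r + nr * s)]
    return out
```

With MAP[C,R,S]=1,2,3 both functions leave their input unchanged, which corresponds to the default assumed by most EM software.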

C. Non-Cubic Lattices
There are additional complexities in the case of skewed unit cells and non-cubic lattices. As with axis permutation, skewed unit cells are generally supported, even in MRC 2000 maps. Skew parameters and voxel spacing parameters are computed from the cell angles and dimensions (fields 8-16), ignoring the CCP4 fields 25-37.
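As a rough illustration of how spacing and skew can be derived from fields 8-16, the following Python sketch uses the standard crystallographic orthogonalization matrix. It assumes that fields 8-10 hold the number of sampling intervals along each cell axis, fields 11-13 the cell lengths in Angstrom, and fields 14-16 the cell angles in degrees. The function names are hypothetical, and the code does not claim to reproduce the exact computation used in Situs.

```python
import math

def voxel_spacing(cell_lengths, intervals):
    """Spacing along each cell axis: cell length divided by number of intervals."""
    return tuple(length / m for length, m in zip(cell_lengths, intervals))

def cell_to_cartesian(a, b, c, alpha, beta, gamma):
    """3x3 matrix (list of rows) mapping fractional cell coordinates to Cartesian.

    Convention assumed here: the a axis lies along X and the b axis lies in
    the XY plane. Angles are given in degrees.
    """
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Unit-cell volume divided by a*b*c.
    v = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    return [
        [a, b * cg, c * cb],
        [0.0, b * sg, c * (ca - cb * cg) / sg],
        [0.0, 0.0, c * v / sg],
    ]
```

For an orthogonal cell (all angles 90 degrees) the matrix reduces to a diagonal of the cell lengths up to rounding error, and the map is cubic only if the three voxel spacings are also equal.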

D. Signed or Unsigned 8-Bit Bytes
Input maps in MODE 0 use 8-bit bytes for space-saving data storage. The CCP4 standard supports signed values in the range of -128 to 127. However, the MRC programs internally convert negative values (-128 to -1) to positive values (128 to 255) by adding 256. This "unsigned" convention was once universally adopted for MRC maps prior to 2000 (see 3.) but many programs in the EM community now interpret signed 8-bit bytes at face value (see 4.). Therefore, Situs determines the convention on automatic read of MODE 0 maps, based on a simple discriminant analysis of the density values. Output maps are automatically written in MODE 2, except if MODE 0 is explicitly requested during manual use of the map2map program.
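The two byte conventions differ only by the offset of 256 applied to negative values, as the following Python sketch shows. The discriminant used by Situs is not specified here, so the `guess_convention` function below is a hypothetical stand-in: it simply prefers the interpretation under which neighboring voxels differ less, on the assumption that density maps are locally smooth.

```python
def signed_to_unsigned(value):
    """Map a signed byte (-128..127) to the "unsigned" convention (0..255)."""
    return value + 256 if value < 0 else value

def unsigned_to_signed(value):
    """Inverse mapping: 128..255 become -128..-1, 0..127 are unchanged."""
    return value - 256 if value > 127 else value

def total_variation(values):
    """Sum of absolute differences between consecutive values."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))

def guess_convention(raw_signed):
    """Hypothetical heuristic, NOT the Situs discriminant.

    raw_signed holds the bytes read at face value as signed integers.
    A wrong interpretation introduces jumps of about 255 wherever the
    density crosses the wrap-around point, which inflates the variation.
    Ties are resolved in favor of the signed (CCP4) convention.
    """
    as_signed = list(raw_signed)
    as_unsigned = [signed_to_unsigned(v) for v in as_signed]
    if total_variation(as_unsigned) < total_variation(as_signed):
        return "unsigned"
    return "signed"
```

For example, a smooth unsigned ramp 120, 124, 128, 132, 136 read at face value as signed bytes appears as 120, 124, -128, -124, -120, and the large jump in the middle is what this heuristic picks up.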
Note that Situs automatically recognizes the endianness of MODE 1 or MODE 2 maps using the numeric values of the header fields. If a MACHST machine stamp (field 54) is present in imported files, it is ignored. For compatibility reasons, the correct CCP4 / MRC 2000 machine stamp is set in exported maps.
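One way to recognize the byte order from the numeric header values alone, without consulting the machine stamp, is sketched below in Python. It assumes that the first four 32-bit header words are the three grid extents followed by MODE. The plausibility limits (`MAX_DIM`, the accepted MODE range) and the function name are assumptions chosen for illustration and are not the criteria used by Situs.

```python
import struct

MAX_DIM = 65535    # assumed upper bound for a plausible grid extent
MAX_MODE = 6       # assumed upper bound for a plausible MODE value

def detect_byte_order(header_bytes):
    """Return '<' (little-endian) or '>' (big-endian) for a map header.

    The first four header words are decoded in both byte orders; the order
    that yields positive, reasonably small extents and a small MODE wins.
    """
    if len(header_bytes) < 16:
        raise ValueError("need at least the first 16 header bytes")
    for order in ("<", ">"):
        nc, nr, ns, mode = struct.unpack(order + "4i", header_bytes[:16])
        extents_ok = all(0 < n <= MAX_DIM for n in (nc, nr, ns))
        if extents_ok and 0 <= mode <= MAX_MODE:
            return order
    raise ValueError("header values are implausible in either byte order")
```

The test works because a small integer such as 64 turns into a very large number (1073741824) when its four bytes are read in the wrong order.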
The above Situs conventions are (mostly) supported by Sculptor, VMD, Chimera, and em2em. Owing to its EM focus, em2em currently deviates by assuming cubic lattices and by not allowing axis permutation. VMD does not presently support MODE formats other than 2. Chimera distinguishes more strictly than Situs between certain CCP4/MRC flavors. Situs is unique in detecting the 8-bit signedness convention. For more information see the respective user guides.

Future Updates
This document summarizes the map format conventions of Situs at the time of this writing (version 2.7). As the MRC-related formats continue to evolve, future versions of the software may require an update of this document, which will be posted online at http://situs.biomachina.org/fmap.pdf.