CIF applications\(\def\hfill{\hskip 5em}\def\hfil{\hskip 3em}\def\eqno#1{\hfil {#1}}\)

Journal logoJOURNAL OF
APPLIED
CRYSTALLOGRAPHY
ISSN: 1600-5767

CIF Applications. X. Automatic Construction of CIF Input Functions: CifSieve

aNational Institute for Inorganic Materials, Namiki 1-1, Tsukuba, Ibaraki 305, Japan
*Correspondence e-mail: jrh@nirim.go.jp

(Received 10 February 1998; accepted 30 June 1998)

A software package for reading a list of CIF data into user-specified variable names in the DDL domain dictionary is described. The customized function generated by this process provides detailed error reporting and may be called from C or Fortran programs. The package is small, simple to install and fast. It runs on variants of the Unix operating system that have the following utilities available: Bison/Yacc, Flex, Perl and C.

1. Introduction

Since the Crystallographic Information File (CIF) was originally proposed (Hall et al., 1991[Hall, S. R., Allen, F. H. & Brown, I. D. (1991). Acta Cryst. A47, 655-685.]) for exchange and storage of crystallographic data, a range of software tools have been developed for manipulating this format (Hall & Bernstein, 1996[Hall, S. R. & Bernstein, H. J. (1996). J. Appl. Cryst. 29, 598-603.]; Westbrook et al., 1997[Westbrook, J. D., Hsieh, S. & Fitzgerald, P. M. D. (1997). J. Appl. Cryst. 30, 79-83.]). These tools provide functions for checking, reading and writing CIFs, and reduce the complexity of handling the flexible CIF and dictionary definition language (DDL) syntax for programmers wanting to create a CIF-literate application.

Even so, when using such tools, a programmer is still required to specify the CIF name and destination data structure for every required item. This operation is both time consuming and prone to transcription errors, particularly when a large amount of data is involved. A simpler access route to CIF data is needed for small in-house applications in order to encourage the wider crystallographic community to use CIF in place of less portable fixed formats.

It is possible to reduce substantially the programming time required to write an input CIF interface by using the DDL domain dictionary as a simple template for the customized software. This leads to the rapid creation of new CIF-conversant software which can be easily added to existing programs.

In the following description, ‚domain dictionary` refers to a CIF dictionary in DDL format.

2. Applying CifSieve

The program CifSieve may be used to generate a customized CIF reader for addition to existing software by the following two basic steps.

(i) Edit an existing domain dictionary file by inserting a new attribute and value _variable_name variable-name in the definition blocks of those items the application program needs to read from the CIF. See Fig. 1[link] for an example of how this is done. The definitions of items that do not need to be read should be left unchanged. Note that the addition of _variable_name does not alter other uses of this dictionary. variable-name is the name of the variable declared in the application program for storing the read data item. If the items to be read are part of an array (i.e. they exist in the CIF as a looped list) then variable-name should be entered as a dimensional variable, e.g. FCAL(2000).

[Figure 1]
Figure 1
A section of an edited DDL file.

(ii) For C applications the program BuildSiv is used to create an object file for the function cifsiv_ and a separate header file cifvars.h. These are invoked in an application program as cifsiv_(CIFfilename, blockname) to read and store items (as tagged in the domain dictionary) from the CIF named CIFfilename and the data block named blockname. The header file cifvars.h is included in subroutines which manipulate the data input from the CIF. Examples of a modified dictionary file are shown in Fig. 1[link]. This caused BuildSiv to generate a cifsiv_ object file which may be invoked as shown in Fig. 2[link], and the header file cifvars.h as shown in Fig. 3[link].

[Figure 2]
Figure 2
An example C application program.
[Figure 3]
Figure 3
The generated C variable declarations file cifvars.h.

For Fortran applications the program BuildSiv is used to create an object file containing the function cifsiv_ and an include file forcif.inc specifying the common block containing the input variable names. cifsiv_ is applied in a Fortran program as call cifsiv(CIFfilename, blockname, blockbeg) with an additional third argument, which is the address of the beginning of this common block (this is described further below).

3. Design

The CifSieve package comes in two parts: the DDL parser program for the domain dictionary and the main BuildSiv program responsible for constructing the cifsiv_ function source from the DDL parser output. CifSieve relies heavily on freely available software, particularly from the Free Software Foundation (Stallman, 1993[Stallman, R. (1993). The GNU Manifesto. Free Software Foundation, 59 Temple Place Suite 330, Boston MA 02111, USA, http://www.fsf.org.]).

3.1. DDL parser

Separate DDL parsers are provided for the original DDL specification (hereafter DDL1) (Hall & Cook, 1995[Hall, S. R. & Cook, A. P. F. (1995). J. Chem. Inform. Comput. Sci. 35, 819-825.]) and for DDL2 (Westbrook & Hall, 1995[Westbrook, J. & Hall, S. R. (1995). A Dictionary Description Language for Macromolecular Structure, Draft DDL V2.1.0. IUCr-COMCIFS, Chester, England.]). The parsers are automatically constructed from a restricted STAR grammar specification in Bison format (Donnelly & Stallman, 1995[Donnelly, C. & Stallman, R. (1995). The Bison Manual. Free Software Foundation, 59 Temple Place Suite 330, Boston MA 02111, USA, http://www.fsf.org.]), using a Flex-generated lexical analyser (Paxson, 1995[Paxson, V. (1995). The Flex Manual. Free Software Foundation, 59 Temple Place Suite 330, Boston MA 02111, USA, http://www.fsf.org.]). The parser scans the dictionary and if the _variable_name attribute occurs within a dictionary definition, a flag is set, and when reading of that definition finishes, variable type (the value of definition attribute _type), item name (attribute _name) and variable name are output in a simple tag-value format and in a standard order. For DDL2 domain dictionaries, values of _item_aliases.alias_name and _item_ linked.parent_name entries, if present, are also output. The DDL parser thus transforms and simplifies the dictionary contents.

If the _name(DDL1)/_item.name(DDL2) attribute occurs inside a loop, that is, a number of names appear in one definition block, the variable name for that particular definition block will be given an extra array dimension by CifSieve, equal to the number of items in the loop. When a CIF name from this loop is found in a CIF file, the value will be read into the respective array location. If an _item_aliases.alias_name attribute is present (DDL2), the alias will also be recognized in CIF input files. If this attribute occurs together with looped item names in the domain dictionary, an attempt is made to determine the parent _item.name in that loop to which this _item_aliases.alias_name refers. This is done within BuildSiv by examining _item_linked.parent_name entries within the same definition block.

3.2. Building cifsiv_: BuildSiv

The BuildSiv program, which is a shell-like script written in Perl (Wall et al., 1996[Wall, L., Christiansen, T. & Schwartz, R. L. (1996). Programming Perl, 2nd ed. California, USA: O`Reilly.]), creates four source files, which together describe the final input function. One file, cifvars.h, is a list of declarations for the variables which will contain the CIF data. The second file contains a short C-language wrapper function which calls the parser generated by Bison and Flex. The next two files, cifsiv.y and cifsiv.lex, are specification files for the parser and its lexical analyser, which are constructed from these files by Bison and Flex, respectively.

BuildSiv first calls the DDL parser described above, which returns a simplified version of the domain dictionary. The contents of this file are read and a Flex specification file constructed for lexical analysis of the CIF file. Variable declarations are output to file cifvars.h as each variable name and type is encountered.

At present, the correspondence between a DDL1 domain dictionary _type attribute and the compiled function variable type is as follows: numb becomes double (C) or REAL*8 (Fortran). char becomes char[84] (C) or CHARACTER*84 (Fortran). Multiple lines of text cannot be retrieved using the present version of CifSieve.

Dictionaries written using DDL2, such as the mmCIF dictionary, allow internal definition of the meaning of values for _item.type tags. These type definitions are not presently parsed by DDL2; instead, the types defined in mmCIF version 1.0.00 are recognized and mapped such that all numbers (float, int) are treated identically to the DDL1 numb type, and all character types become no more than 84 characters long.

The Bison parser specification file, cifsiv.y, is then composed. The grammar is specified such that, if a target item name is encountered outside a loop, the item value is explicitly copied to the variable name. Looped syntax is more complex. A loop is divided into a looptop and loopbottom, where looptop is a list of looped item names and loopbottom is a list of their values. When one of the target CIF item names is found in the looptop section, a pointer is set to the first entry in the user array variable corresponding to that CIF item. Then, when the bottom part of the loop is being input, data values corresponding to that data item are copied into the position specified by this pointer, and the pointer incremented to point to the next location. If the programmer-specified variable name does not contain an array specification, it will not be included in this section of the grammar specification file, and an error message will therefore be generated if it is encountered inside a loop during CIF file input.

BuildSiv then runs Bison and Flex to generate the C source code for the parser. Finally, the C wrapper program is created and compiled together with the parser source code to produce the final object file containing the cifsiv_ function.

If BuildSiv is called with the -e option, variable declarations for e.s.d.s are also output and e.s.d.s for requested items are read in if they appear after numerical values in the CIF file. Variable names for the e.s.d.s are created by appending the letters ‚esd` to the user variable name.

The generated function is robust relative to syntax and type errors within the CIF file. If an error occurs, variable errornum is set to a nonzero value and an error message is inserted into string errormes. These variables are declared together with the user variables. The parser then discards CIF data until it reaches an understandable set of input values. So, for example, if three numbers appear after an item name instead of one, the second two will be ignored, after the error variables have been set and parsing will continue. Similary, if a serious error occurs within a loop, such as the appearance of an item name not matching an array variable, the entire loop is normally ignored. If a new packet of looped data exceeds the specified array limits, further data in that loop are ignored.

3.3. Fortran interface

When the -f option is given to BuildSiv, a Fortran interface is generated. The automatically generated file forcif.inc defines a common block containing the user-specified variable names.

The Fortran interface is implemented by defining both a C structure, for use within cifsiv_, and an identically constructed Fortran common block, for use by application programs. When cifsiv_ is called from Fortran with the first element of this common block, BLOCKBEG, as a third argument, the cifsiv_ function receives a pointer to that argument and consequently to the beginning of the Fortran common block. This pointer value is used as the address of the beginning of the C structure and item values are thus read into the proper positions within this structure by cifsiv_. After cifsiv_ returns, Fortran application programs can read variable values from this common block.

4. Example

The following example shows the various stages of processing of the edited dictionary. Fig. 1[link] shows an edited dictionary with _variable_name attributes. Fig. 2[link] shows a short test program demonstrating the use of the cifsiv_ function. Fig. 3[link] shows the C header file generated, cifvars.h, which should be included in any routines using the variables.

The Fortran include file, together with a simple program, are given in Figs. 4[link] and 5[link]. The cifvars.h file generated in this case (not shown) is different to that in Fig. 2[link], as all variables are now defined as members of a structure.

[Figure 4]
Figure 4
The example Fortran include file forcif.inc.
[Figure 5]
Figure 5
An example Fortran program using cifsiv.

5. Availability

The program, with installation and detailed operating instructions, is freely available by ftp at ftp://anbf2.kek.jp/pub/cif/cifsieve_1.2.tar.gz. GNU software is available from the Free Software Foundation at ftp://prep.ai.mit.edu/pub/gnu.

Footnotes

Current address: ANBF, KEK-PF, Oho 1-1, Tsukuba, Ibaraki 305, Japan. E-mail: jrh@anbf2.kek.jp.

Acknowledgements

The authors are grateful for helpful discussions with Syd Hall and thank Timo Vaalsta for conceiving the idea behind the Fortran interface. We owe a debt of gratitude to the programmers who have contributed to the GNU project.

References

First citationDonnelly, C. & Stallman, R. (1995). The Bison Manual. Free Software Foundation, 59 Temple Place Suite 330, Boston MA 02111, USA, http://www.fsf.org.
First citationHall, S. R., Allen, F. H. & Brown, I. D. (1991). Acta Cryst. A47, 655–685. CSD CrossRef CAS Web of Science IUCr Journals
First citationHall, S. R. & Bernstein, H. J. (1996). J. Appl. Cryst. 29, 598–603. CrossRef CAS Web of Science IUCr Journals
First citationHall, S. R. & Cook, A. P. F. (1995). J. Chem. Inform. Comput. Sci. 35, 819–825.  CrossRef CAS Web of Science
First citationPaxson, V. (1995). The Flex Manual. Free Software Foundation, 59 Temple Place Suite 330, Boston MA 02111, USA, http://www.fsf.org.
First citationStallman, R. (1993). The GNU Manifesto. Free Software Foundation, 59 Temple Place Suite 330, Boston MA 02111, USA, http://www.fsf.org.
First citationWall, L., Christiansen, T. & Schwartz, R. L. (1996). Programming Perl, 2nd ed. California, USA: O`Reilly.
First citationWestbrook, J. & Hall, S. R. (1995). A Dictionary Description Language for Macromolecular Structure, Draft DDL V2.1.0. IUCr-COMCIFS, Chester, England.
First citationWestbrook, J. D., Hsieh, S. & Fitzgerald, P. M. D. (1997). J. Appl. Cryst. 30, 79–83. CrossRef Web of Science IUCr Journals

© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited. For more information, click here.

Journal logoJOURNAL OF
APPLIED
CRYSTALLOGRAPHY
ISSN: 1600-5767
Follow J. Appl. Cryst.
Sign up for e-alerts
Follow J. Appl. Cryst. on Twitter
Follow us on facebook
Sign up for RSS feeds