research papers

FOUNDATIONS
ADVANCES
ISSN: 2053-2733

Report of the Working Group on Crystal Phase Identifiers

(a) Brockhouse Institute for Materials Research, McMaster University, Hamilton, Ontario L8S 4M1, Canada, (b) Physics Department, Southern Oregon University, 1250 Siskiyou Boulevard, Ashland, OR 97520-5074, USA, (c) Crystal Impact, Postfach 1251, D-53002 Bonn, Germany, (d) International Centre for Diffraction Data, 12 Campus Boulevard, Newtown Square, PA 19073-3273, USA, (e) National Institute for Standards and Technology and the Inorganic Crystal Structure Database, 100 Bureau Drive Stop 8520, Gaithersburg, MD 20899, USA, (f) Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, England, (g) Pauling File, Material Phases Data System, 400 Schwanden, CH-6354 Vitznau, Switzerland, (h) Protein Databank, RCSB, Department of Chemistry, Rutgers, The State University of New Jersey, 610 Taylor Road, Piscataway, NJ, USA, and (i) International Union of Crystallography, 5 Abbey Square, Chester CH1 2HU, England
*Correspondence e-mail: idbrown@mcmaster.ca

(Received 23 March 2005; accepted 5 October 2005)

The proposed crystalline phase identifier consists of a number of components (layers) describing enough properties of the phase to allow a unique identification. These layers consist of the chemical formula, a flag indicating the state of matter, the space-group number and the Wyckoff sequence. They are defined in such a way that they can be incorporated into the IUPAC International Chemical Identifier (InChI) proposed by the International Union of Pure and Applied Chemistry (IUPAC).

1. Introduction

The International Union of Pure and Applied Chemistry (IUPAC) has been examining standards for the electronic representation of chemical information, and as part of this effort it has established a Working Group in conjunction with the USA National Institute for Standards and Technology (NIST) to propose an IUPAC International Chemical Identifier (InChI), which would uniquely identify any chemical compound appearing in an electronic database. The InChI Working Group approached the Commission on Crystallographic Nomenclature (CCN) of the International Union of Crystallography (IUCr) to enquire if any conventions existed for a crystalline phase identifier that might be incorporated into InChI. Because the only such convention already approved by the CCN, the phase-transition nomenclature (Tolédano et al., 1998, 2001), is not readily adaptable for electronic use, the CCN established the Working Group on Crystalline Phase Identifiers to make recommendations that could be of use to the InChI Working Group. This document is the report of this Group.

The Working Group carried out all its discussions by e-mail, initially independently of the InChI project, but the two groups arrived at identifier structures so similar that incorporating the crystalline phase information into InChI should be trivial. Our recommendations are therefore cast in the form of additions to InChI, in the knowledge that the resulting identifier can be used as readily by the crystallographic community as by the chemical community.

2. Terms of reference of the Working Group

The Working Group was charged with recommending to the IUCr Commission on Crystallographic Nomenclature:

1. the best method of defining a crystalline phase identifier that uniquely and unambiguously identifies each crystalline phase in a way that would allow it to be used to link the same material appearing in different electronic databases;

2. the best way in which this identifier can be implemented, including its incorporation in the CCN recommended phase-transition nomenclature.

Keeping in mind that the primary purpose of the crystalline phase identifier is to allow the properties of a given material to be located in different databases, the Working Group should consult with appropriate crystallographic databases to ensure that the proposed identifier will be acceptable.

The following were appointed members of the Working Group:

David Brown (Chair)

Sidney Abrahams (Chair of CCN ex officio at the time the Working Group was established)

Michael Berndt (Crystal Impact) [deceased]

John Faber (International Centre for Diffraction Data, ICDD)

Vicky Karen (National Institute for Standards and Technology, NIST, and the Inorganic Crystal Structure Database, ICSD)

Sam Motherwell (Cambridge Crystallographic Data Centre, CCDC)

Jean-Claude Tolédano (Chair CCN Working Group on Phase Transition Nomenclature) [resigned during the working phase of the Group]

Pierre Villars (Pauling File)

John Westbrook (Protein Databank)

Brian McMahon (IUCr, consultant).

3. General considerations

Early discussions revealed that many phases have not been sufficiently well characterized to allow an unambiguous assignment of an identifier, and in some cases phases have been incorrectly characterized. In these situations, no identifier can be expected to meet the requirements of the terms of reference but an identifier may be able to retrieve a number of possible matches from which the user could make a final choice.

For well characterized materials, the Working Group examined two models. In the first, an arbitrary character string is assigned to each crystallographic phase by a competent authority (similar to the Chemical Abstracts Registry Number). In the second, a character string is generated from the known properties of the compound according to a defined set of rules.

The first choice was rejected on the grounds that we would be unlikely to find a competent authority willing to take on the project. Such an authority would require external funding, since it would have to assign identifiers on request in a timely manner and would have to maintain a public list of the identifiers already assigned.

The second choice has the advantage that the identifier can be constructed by anyone with access to the information needed to characterize the material. The identifier can be kept to a manageable size because it only needs to include sufficient information to distinguish between known phases. Even if more information about the phase is available, it is not included in the identifier if it is not needed for characterization. For example, OsI3, which is known in only one crystalline form, is fully characterized by its chemical formula alone and no further information, chemical or crystallographic, need be included.

The first component of any phase identifier must be the composition and, where necessary, the isomer. Only then does it make sense to identify the crystalline form. Since InChI is designed to identify the composition and bond topology of the compound, the Working Group's job was to suggest an identifier that would distinguish between the different crystalline forms of a given chemical compound.

4. The InChI identifier

Before presenting the recommendations of the Working Group, we give a brief description of the proposed InChI, which at the time of writing had not been officially adopted by IUPAC.

The InChI Working Group is recommending an identifier made up of several components or layers.

• The first (top) layer, which is always present, gives the chemical composition. The lower layers, which constitute the identifier proper, are included only if they are necessary to distinguish between two compounds with the same composition.

• The second layer distinguishes between different isomers by describing the bond topology. It contains several sublayers or levels, the first giving the bonding topology ignoring all the bonds to metals, cations and H atoms. The second level adds the bonding to fixed H atoms, the third adds the bonding to variable H atoms (to distinguish between tautomers if this is needed) and the fourth level adds bonds to metal atoms and cations and is used in the rare cases when a compound forms different coordination isomers.

• The third layer contains information on chiral centers and is included only when it is necessary to distinguish between stereoisomers.

• The fourth layer is used to identify isotopically enriched compounds. Further layers can be added as needed. For many compounds, only the first layer is needed and for most of the others it is only necessary to add the top levels of the second layer.

4.1. Construction of the IUPAC chemical identifier

At the time of writing, the final form of InChI had not been fixed, but sufficient details had been developed to allow the definition of a crystalline phase identifier. InChI can be formatted in different ways, in particular as an ASCII string or an XML file. As the ASCII string is more compact and easier to follow, InChI is described below in this format. In the following example a slash, /, is used to separate the layers.

1.00Beta/C6H9N3O3/CT:7-4(10)1-2(5(8)11)3(1)6(9)12/

H:1-3H,(H2,7,10)(H2,8,11)(H2,9,12)/

SC:1-,2-,3-/I:(1D)/SC:m/is:0/ST:abs

The following is an explanation of the above InChI. The important items are the first three or four; the remainder in this example describe the stereochemistry and will not be used frequently.

1.00Beta/ # Version of InChI
C6H9N3O3/ # Sum formula
# The identifier proper begins here
CT:7-4(10)1-2(5(8)11)3(1)6(9)12/ # Basic connectivity
H:1-3H,(H2,7,10)(H2,8,11)(H2,9,12)/ # Hydrogen connectivity
SC:1-,2-,3-/ # Stereocenters, sp3
I:(1D)/ # Isotopes (H1 is deuterium)
SC:m/ # Same as in main
is:0/ # Inverted stereo (absolute stereo only)
ST:abs # Abs (absolute), rel (relative) or rac (racemic)
# End of identifier

The identifier may be followed by auxiliary information such as atomic coordinates. These are not part of the identifier proper and are not shown here. They are not further discussed in this report.

All but the first two items (which are required and are not part of the identifier proper) are introduced by one of the following tags:

"CT:";  /* connectivity */

"H:";  /* H-atoms */

"C:";  /* charge */

"DB:";  /* double-bond stereo */

"SC:";  /* stereo centers sp3 */

"is:";  /* mark sp3 inverted stereo */

"SR:";  /* mark sp3 racemic stereo */

"ST:";  /* abs, rel, rac */

"I:";  /* isotopic atoms */

"fH:";  /* fixed H – first item in non-taut */

"N:";  /* original atom numbers in canonical order */

"NT:";  /* non-tautomeric original atom numbers */

   /* in canonical order – first item */

   /* in non-tautomeric aux info */

"E:";  /* atoms equivalence */

"tE:";  /* tautomeric groups equivalence */

"iC:";  /* inverted (stereo) centers */

"iN:";  /* inverted sp3 stereo original atom */

   /* numbers in canonical order */

"NI:";  /* isotopic original atom numbers in */

   /* canonical order */

   /* first item in isotopic aux info */

"TR:";  /* transposition of components in */

    /* non-tautomeric representation */

"CRV:";  /* charges, radical, valence*/

"XYZ:";  /* xyz coordinates */

The development of InChI has so far focused on organic molecules and resolving isomers, tautomers and enantiomers. Version 1.0 will therefore be restricted to describing the topology of finite molecules and is not being designed to describe the connectivity of infinite structures. This should not present a problem for devising an InChI for crystalline phase identification because, if the composition and space group of an infinitely connected inorganic compound are given (the two essential layers for any phase identification), the connectivity is rarely needed.
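The ASCII form described in this section can be split mechanically into its layers. The following is a minimal sketch only, written against the draft `1.00Beta' example quoted above rather than the finally adopted InChI syntax; the names `ParsedIdentifier` and `parse_identifier` are hypothetical. Because a tag such as SC: can occur more than once, the layers are kept as an ordered list of (tag, value) pairs rather than as a dictionary.

```python
from typing import List, NamedTuple, Tuple


class ParsedIdentifier(NamedTuple):
    version: str
    formula: str
    layers: List[Tuple[str, str]]  # (tag, value) pairs in their original order


def parse_identifier(text: str) -> ParsedIdentifier:
    """Split a slash-separated identifier into version, sum formula and tagged layers."""
    parts = [p for p in text.strip().split("/") if p]
    if len(parts) < 2:
        raise ValueError("expected at least a version and a sum formula")
    version, formula, rest = parts[0], parts[1], parts[2:]
    layers = []
    for part in rest:
        tag, sep, value = part.partition(":")
        if not sep:
            raise ValueError(f"layer without a tag: {part!r}")
        layers.append((tag, value))
    return ParsedIdentifier(version, formula, layers)


# The example string from this section, joined into a single line.
example = (
    "1.00Beta/C6H9N3O3/CT:7-4(10)1-2(5(8)11)3(1)6(9)12/"
    "H:1-3H,(H2,7,10)(H2,8,11)(H2,9,12)/"
    "SC:1-,2-,3-/I:(1D)/SC:m/is:0/ST:abs"
)
parsed = parse_identifier(example)
print(parsed.version, parsed.formula)
print([tag for tag, _ in parsed.layers])
```

The sketch assumes that no layer value itself contains a slash, which holds for the example shown here.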

5. Recommendations

We RECOMMEND that the phase identifier be included as part of the proposed InChI symbol and that the crystallographic characterization appear in InChI in three additional layers, which are numbered 5, 6 and 7 in the recommendations below.

Note that it is only necessary to define the format of the value of each layer. The layers can be assembled in various ways as they are in the InChI standard, e.g. as an ASCII string labeled with tags (used in this document) or as an XML file. There is no canonical form for the whole identifier, only for the individual layers, which may be used with or without their associated tag.

5.1. Layer 5: state of matter

This layer gives the state of matter: gas, liquid, crystal etc. according to the following enumeration list:

gas   gas phase
liq   liquid phase
sol   solid phase of unknown form
xtl   crystalline solid
qxl   quasicrystal
ams   amorphous solid
lxl   liquid crystal or other anomalous quasi-liquid phase

Only if this flag is set to xtl would layers 6 and 7 be needed. A crystal is defined as a phase for which, in principle, it is possible to assign a space group, even if that space group is only related to the full space group (as in the case of aperiodic structures). The presence therefore of a space-group field implies that the state of matter is xtl. In the case where the space group is given, the state-of-matter field is redundant and may be omitted (although it is included in some of the examples below by way of illustration).

5.2. Layer 6: space group

This layer contains the space-group number as given in International Tables for Crystallography (2005), Vol. A. It consists of a number between 1 and 230 that uniquely identifies the space-group type except for aperiodic crystals, cf. Janssen et al. (2002). The only ambiguity occurs for space groups such as P41 (No. 76) and P43 (No. 78) that are identical except for their chirality, which is more appropriately identified in the InChI stereochemistry layer. Chirality is an important molecular property but the chirality of a crystal, which is often not determined, is usually only of interest if the crystal contains a chiral molecule. The two members of each such pair should therefore be treated as equivalent. We recommend that only the lower number of each chiral pair of space groups be given, but search algorithms should treat the two members of a pair as equivalent in case the higher space-group number is inadvertently used. The equivalent space groups are listed in §6.2.3.

Problems in assigning the space group can arise in several situations. Many inorganic compounds have polymorphs with similar structures that crystallize with different, but related, space groups. In these cases, it is easy to assign the wrong space group if only a subcell of the true crystallographic unit cell is reported. Incommensurately modulated structures, which are frequently associated with polymorphism, have additional symmetries that do not appear in the standard table of space groups. Usually an average space group can be assigned, but this is not always unique. Quasicrystals cannot be assigned a conventional space group and are best treated as a different state of matter (see §5.1).

5.3. Layer 7: Wyckoff sequence

In the rare event that two phases of the same compound have the same space group, the Wyckoff sequence can be given. This is a list containing the Wyckoff letters associated with the occupied special positions (sites of high symmetry). Details of the special positions and their Wyckoff letters for all 230 space groups are given in International Tables for Crystallography (2005), Vol. A. Each letter is accompanied by a number indicating the number of symmetry-independent atoms occupying sites of that kind (the default number is 1), e.g. `a1 d1 i6', which is written as `adi6'. The enumeration list contains all the letters of the alphabet plus & for `alpha' and the letters are listed in alphabetic order. Before determining the Wyckoff sequence, it is essential that the structure be standardized using the procedure described in §6.2.4.

5.4. Possible further layers

There are a few cases where the layers proposed above do not fully differentiate between distinct phases of the same compound. For example, metallic iron (see §7.2.2) has two body-centered-cubic phases separated by a face-centered-cubic phase. These two phases have exactly the same identifier using the layers defined above. However, they could be differentiated using their reduced cells. With experience in using the layers defined above, the need for further layers may become apparent. At that time, it would be appropriate to consider what further layers should be added. Possibilities include the reduced cell, an incommensuration flag or an indicator of magnetic or electric properties.

6. Proposed additions to InChI for phase identification

This section provides the text that should be inserted into any InChI definition that incorporates the proposals of this document.

6.1. New tags

The following is a list of additional tags required for phase identification expressed in the form of an InChI. These would be used in conjunction with existing InChI tags:

"PH:" /* phase or state of matter. Allowed values are: */

   /* gas, liq, ams, sol, xtl, lxl, qxl */

"SG:" /* space-group number, integers between 1 and 230 */

"WS:" /* Wyckoff sequence, any lower case letter */

   /* or & (for alpha), possibly separated by numbers */

6.2. Formal definitions

6.2.1. Composition

The composition layer in InChI for a crystalline phase must give the contents of the formula unit of the crystal. This is a unit in general no smaller than the crystallographic asymmetric unit and no larger than the primitive unit cell. It is NOT the same as the formula of the molecule of interest unless the molecule is the only component of the crystal. All components, including solvents of crystallization, MUST BE EXPLICITLY INCLUDED. Wherever possible, the formula unit is chosen so that the multipliers of the elements are integers with no common divisor. In cases where it is not possible to choose a formula unit smaller than the primitive unit cell without using non-integral multipliers, e.g. FeS1.83 = Fe1.09S2, La1.95NiO4.31 and many minerals, the size of the formula unit is indeterminate and only the relative multipliers are meaningful. Testing should be carried out in this case by normalizing the multipliers. Any normalization can be used but an obvious one would be to reduce the largest multiplier to 1.00 and the others in proportion. When non-integral multipliers are encountered, searches should include a tolerance factor to allow for experimental uncertainties or to retrieve related compounds of the same phase having a similar but not identical composition. The tolerance should be large enough to recognize that phase identifiers which include trace elements are equivalent to phase identifiers in which the trace elements have been omitted either because they were not determined or because they were not considered to be important. The size of the tolerance factor is not defined in this standard and its choice will be determined by the nature of the required search. For example, a search on FeS2 might include a tolerance factor of 0.2 to be sure of locating all examples of the phase.
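The normalization and tolerance test described above might be sketched as follows. This is an illustration under stated assumptions, not part of the standard: the function names are hypothetical, the formula parser handles only flat sum formulae without brackets, and the tolerance is applied to the multipliers after the largest has been scaled to 1.00 (the standard deliberately leaves the size and meaning of the tolerance to the application). An element missing from one formula is treated as having multiplier zero, so trace elements within the tolerance do not prevent a match.

```python
import re
from typing import Dict

# An element symbol followed by an optional (possibly non-integral) multiplier.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_formula(formula: str) -> Dict[str, float]:
    """Parse a flat sum formula such as 'Fe1.09S2' into {element: multiplier}."""
    counts: Dict[str, float] = {}
    pos = 0
    for m in _TOKEN.finditer(formula):
        if m.start() != pos:  # characters were skipped, so the input is malformed
            raise ValueError(f"cannot parse formula: {formula!r}")
        pos = m.end()
        counts[m.group(1)] = counts.get(m.group(1), 0.0) + float(m.group(2) or 1)
    if pos != len(formula) or not counts:
        raise ValueError(f"cannot parse formula: {formula!r}")
    return counts


def normalize(counts: Dict[str, float]) -> Dict[str, float]:
    """Scale the multipliers so that the largest one becomes 1.00."""
    largest = max(counts.values())
    return {element: n / largest for element, n in counts.items()}


def compositions_match(a: str, b: str, tolerance: float = 0.0) -> bool:
    """Compare two formulae element by element after normalization."""
    na, nb = normalize(parse_formula(a)), normalize(parse_formula(b))
    return all(
        abs(na.get(element, 0.0) - nb.get(element, 0.0)) <= tolerance
        for element in set(na) | set(nb)
    )


print(compositions_match("FeS1.83", "Fe1.09S2", tolerance=0.01))
print(compositions_match("FeS2", "FeS1.83", tolerance=0.2))
print(compositions_match("FeS2", "FeS1.83"))
```

With these assumptions, FeS1.83 and Fe1.09S2 normalize to nearly the same multipliers, and a search on FeS2 with a tolerance of 0.2 retrieves FeS1.83, whereas an exact comparison does not.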

6.2.2. PH: the state of matter

Seven flags are defined for a number of different states.

gas   gas

liq   liquid

ams  amorphous

sol  solid of unknown form

xtl  crystal (capable of being assigned a space group)

lxl  liquid crystal

qxl  quasicrystal

Only if the value of PH is `xtl' will the following two layers be meaningful. Therefore, if the SG: field is given, the PH: field may be omitted.

6.2.3. SG: space group

This is a number between 1 and 230 inclusive, being the number of the space group of the crystal as given in International Tables for Crystallography (2005), Vol. A. The following space-group pairs are identical except for their chirality: 76 = 78, 91 = 95, 92 = 96, 144 = 145, 151 = 153, 152 = 154, 169 = 170, 171 = 172, 178 = 179, 180 = 181, 212 = 213. Only the lower space-group number of each pair should be used. The chirality is often not determined and is only significant if the crystal contains a molecule whose chirality is described elsewhere in the InChI. However, one of the forbidden space-group numbers may be inadvertently used and software should be prepared to convert it to its legal equivalent. There are many cases where the true space group is not known. Different approximate space groups might be assigned by different workers, in which case a valid match would be missed, but there is little that can be done to overcome this problem. Incommensurate phases should be assigned the space group of their parent structure (the first portion of the incommensurate space-group symbol).

6.2.4. WS: the Wyckoff sequence

This is an alphabetic list of the Wyckoff symbols (letters) of the occupied special positions. International Tables for Crystallography (2005), Vol. A, lists the Wyckoff letters for all special positions, that is, all sites having a crystallographically distinct site symmetry. Each letter is followed by a number indicating the number of symmetry-independent atoms occupying sites of that kind (the number 1 is omitted), e.g. `a1 d1 i6' is written as `adi6'. The enumeration list contains all the lower-case letters of the alphabet plus `&' for `alpha' found in space group No. 47. The letters are listed in alphabetic order but, before determining the Wyckoff sequence, the structure must be normalized according to the algorithm described by Parthé & Gelato (1984, 1985). This algorithm is based on the use of the standard space-group settings given in International Tables for Crystallography (2005), Vol. A, with the lattice parameters that are not determined by symmetry chosen according to a particular set of rules. The atomic coordinates are then transformed to ensure a consistent choice of Wyckoff letter in the case where several Wyckoff letters refer to sites having the same symmetry (see footnote 1).
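Once a structure has been normalized, assembling the WS: value is a matter of counting letters. The following sketch assumes that the input is the list of Wyckoff letters of the symmetry-independent atoms of an already standardized structure; it does not perform the Parthé–Gelato normalization itself. The function names are hypothetical, and placing `&' after `z' in the sort order is an assumption, since the text does not state where it falls.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Assumed sort order: the 26 lower-case letters, then '&' for 'alpha'.
_ORDER = "abcdefghijklmnopqrstuvwxyz&"


def wyckoff_sequence(letters: Iterable[str]) -> str:
    """Build a WS: value from one Wyckoff letter per symmetry-independent atom."""
    counts = Counter("&" if letter == "alpha" else letter for letter in letters)
    unknown = sorted(set(counts) - set(_ORDER))
    if unknown:
        raise ValueError(f"not a Wyckoff letter: {unknown}")
    # A count of 1 is omitted, so 'a1 d1 i6' becomes 'adi6'.
    return "".join(
        letter + (str(counts[letter]) if counts[letter] > 1 else "")
        for letter in _ORDER
        if letter in counts
    )


def parse_wyckoff_sequence(sequence: str) -> Dict[str, int]:
    """Expand a WS: value such as 'adi6' into {letter: number of atoms}."""
    if not re.fullmatch(r"(?:[a-z&]\d*)+", sequence):
        raise ValueError(f"malformed Wyckoff sequence: {sequence!r}")
    return {
        m.group(1): int(m.group(2) or 1)
        for m in re.finditer(r"([a-z&])(\d*)", sequence)
    }


print(wyckoff_sequence(["i", "a", "i", "d", "i", "i", "i", "i"]))
print(parse_wyckoff_sequence("adi6"))
```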

6.3. Examples

In both of the following examples of InChIs that define crystalline phases, the PH: field is not needed and the WS: field is also probably not necessary, but they are included for the purposes of illustration. Further examples are given in §7.2.

Rutile 1.02/TiO2/PH:xtl/SG:136/WS:af2

(CH3)3NCH2COO·CaCl2·2H2O 1.02/C4H15CaCl2NO4/PH:xtl/SG:33/WS:ae28
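The rules of §6.2 for assembling the phase layers can be sketched as a small builder. This is an illustration only: the function name and its parameters are hypothetical, and the version string is simply passed through as given. The sketch encodes two rules from the text, namely that PH: may be omitted when SG: is present and that WS: is meaningful only together with SG:.

```python
from typing import Optional

_STATES = {"gas", "liq", "ams", "sol", "xtl", "lxl", "qxl"}


def build_phase_identifier(
    version: str,
    formula: str,
    state: Optional[str] = None,
    space_group: Optional[int] = None,
    wyckoff: Optional[str] = None,
    explicit_state: bool = False,
) -> str:
    """Assemble version, sum formula and the PH:, SG: and WS: layers."""
    if state is not None and state not in _STATES:
        raise ValueError(f"unknown state of matter: {state!r}")
    layers = [version, formula]
    if space_group is not None:
        if state not in (None, "xtl"):
            raise ValueError("a space group implies PH:xtl")
        if not 1 <= space_group <= 230:
            raise ValueError(f"space-group number out of range: {space_group}")
        if explicit_state:  # redundant, included only when asked for
            layers.append("PH:xtl")
        layers.append(f"SG:{space_group}")
        if wyckoff:
            layers.append(f"WS:{wyckoff}")
    else:
        if wyckoff:
            raise ValueError("WS: is only meaningful together with SG:")
        if state is not None:
            layers.append(f"PH:{state}")
    return "/".join(layers)


# The rutile example of this section, in full and in minimal form.
print(build_phase_identifier("1.02", "TiO2", space_group=136,
                             wyckoff="af2", explicit_state=True))
print(build_phase_identifier("1.02", "TiO2", space_group=136))
```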

7. Incorporation of the phase identifier into the IUCr–CCN phase-transition symbol

7.1. Description of the IUCr–CCN phase-transition symbol

Recently, the Commission on Crystallographic Nomenclature of the International Union of Crystallography adopted a phase-transition nomenclature (Tolédano et al., 1998, 2001; see footnote 2).

The phase transitions are identified by the two phases that bracket the transition, so the nomenclature is more properly a nomenclature of the phases themselves. The IUCr CCN phase-transition symbol is composed of six fields defined as follows.

1. The common symbol used to identify the phase (e.g. alpha, II etc.).

2. The temperature range (pressure range or other external condition) in which the phase is stable.

3. The Hermann–Mauguin symbol and number of the space group. More than one space group may be given or the Bravais symbol may be given if the space group is not known.

4. Z, the number of formula units in the conventional unit cell (although the formula unit is not defined within the symbol).

5. The ferroic properties.

6. The structure type.

Any field may be omitted if inapplicable or if the value is not known. As the symbol was not designed for computer use, the formats are not tightly structured and may contain non-ASCII characters. The intent of this symbol is to include the maximum identification information for the phase, whereas the philosophy of InChI is to include only the minimum information needed for phase identification. Both include the space group but otherwise there is little overlap between them. The two symbols are complementary and serve different purposes. The InChI could be added to the CCN phase-transition nomenclature but it is not clear whether this would offer any advantage. A method for combining the two, should this be desired, is described in §7.3.

7.2. Examples of the CCN nomenclature and the InChI crystalline phase identifier

In each of the following examples, the phase-transition nomenclature is given first followed on the next line by the proposed InChI symbol.

7.2.1. Three phases of potassium tellurium bromide (K2TeBr6) (Abrahams et al., 1984; Ihringer & Abrahams, 1984)

I|>434K|Fm(-3)m (225)|Z = 4|non-ferroic|Type = K2PtCl6.

1.02/Br6K2Te/SG:225

II|434-400K|P4/mnc (128)|Z = 2|ferroelastic|3 variants.

1.02/Br6K2Te/SG:128

III|<400K|P21/n (14)|Z = 2|ferroelastic|12 variants.

1.02/Br6K2Te/SG:14

7.2.2. Iron (Fe) (Donohue, 1974)

delta|1663K|Im(-3)m (229)|Z = 2|non-ferroic|Type = W. Melting at 1808 K.

1.02/Fe/SG:229

gamma|1663-1183K|Fm(-3)m (225)|Z = 4|non-ferroic|Type = Cu.

1.02/Fe/SG:225

beta|1183-1043K|Im(-3)m (229)|Z = 2|paramagnetic|Type = W.

1.02/Fe/SG:229

alpha|<1043K|Im(-3)m (229)*|Z = 2|ferromagnetic|Type = W.

*Magnetic structure is pseudocubic.

1.02/Fe/SG:229

epsilon|13GPa|P63/mmc (194)|Z = 2|-|Type = Mg.

1.02/Fe/SG:194

Note that the delta and beta phases both have the body-centered-cubic structure and therefore have identical InChIs, but they are distinct phases with different cell dimensions. The Wyckoff sequence /WS:a/ is omitted because it is the same for the two phases and does not further distinguish between them. The correct space group of untwinned alpha (ferromagnetic) iron cannot be isotropic Im(-3)m (No. 229) but the deviation from higher symmetry is very small and the correct space group is not yet determined. This illustrates one of the problems that will be encountered in the use of this identifier. At worst, searching on Fe/SG:229 will bring up a small number of false matches.

7.3. Examples of how the InChI might be incorporated into the CCN phase nomenclature

The InChI can be included in the CCN phase nomenclature in several different ways. We recommend that the InChI be placed in front of the phase nomenclature using vertical line separators rather than slashes. A double vertical line separates the two identifiers. If desired, a program could easily split off the InChI for further analysis. The first example is the same compound as shown in §7.2.1; the second is the same compound as shown in §6.3.

1.02|Br6K2Te|SG:14||III|<400K|P21/n(14)|Z = 2|Ferroelastic|12 variants

1.02|C4H15CaCl2NO4|SG:33|WS:ae28||XVI|<50K, 4GPa, <180K|Pn21a(33)|Z = 4|Ferroelectric|Nonmodulated ferroelectric polarization along b
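Joining the two symbols and splitting them again might look as follows. This is a sketch with hypothetical function names. It relies on the InChI part containing no empty layers, so that the first double vertical line in the combined string is always the separator; slashes inside the CCN part, such as the one in P21/n, are left untouched.

```python
from typing import Tuple


def combine(inchi: str, ccn_symbol: str) -> str:
    """Prefix a CCN phase symbol with an InChI, using '|' in place of '/'."""
    return inchi.strip("/").replace("/", "|") + "||" + ccn_symbol


def split_combined(combined: str) -> Tuple[str, str]:
    """Recover (InChI, CCN symbol) from a combined symbol."""
    inchi_part, separator, ccn_symbol = combined.partition("||")
    if not separator:
        raise ValueError("no '||' separator found")
    return inchi_part.replace("|", "/"), ccn_symbol


ccn = "III|<400K|P21/n(14)|Z = 2|Ferroelastic|12 variants"
combined = combine("1.02/Br6K2Te/SG:14", ccn)
print(combined)
print(split_combined(combined))
```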

8. The use of the phase identifier in databases

Since the InChI phase identifier is parsable, each of the layers can be formatted and stored in any way that suits the needs of a particular database. Most crystallographic databases will already have fields containing the sum formula and the space-group number, and can readily add a field for the Wyckoff sequence if this is needed. The `state of matter' field, PH:, and the fields describing molecular properties would require further fields in the database if they were present. The canonical form of the InChI could be recreated at any time if required or it could be parsed into its respective layers. It is not appropriate to carry out a search using the canonical form of InChI. Searches must be carried out layer by layer since two different identifiers may not contain the same number of layers or the search may not be carried out to its full depth if, for example, chirality or isotopic content were not important.

All the proposed layers can be searched by looking for identical bit sequences, although the SG: field should be initially screened for illegal numbers and the composition field should be normalized if non-integral multipliers are present.
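A layer-by-layer comparison of the crystallographic layers might be sketched as follows. The function names are hypothetical and several simplifications are assumed: the version field is ignored, the sum formulae are compared as plain strings (so non-integral compositions would first need the normalization of §6.2.1), and a layer is compared only when it is present in both identifiers. The SG: value is screened and mapped to the lower member of a chiral pair before comparison, and PH:xtl is implied whenever SG: is present. The last example uses space groups 152 and 154 with the formula O2Si purely for illustration; it is not taken from this report.

```python
from typing import Dict, Tuple

_CHIRAL_PAIRS = {
    78: 76, 95: 91, 96: 92, 145: 144, 153: 151, 154: 152,
    170: 169, 172: 171, 179: 178, 181: 180, 213: 212,
}
_PHASE_TAGS = ("PH", "SG", "WS")


def split_layers(identifier: str) -> Tuple[str, Dict[str, str]]:
    """Return the sum formula and the PH/SG/WS layers of an identifier."""
    parts = identifier.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError("expected at least a version and a sum formula")
    layers: Dict[str, str] = {}
    for part in parts[2:]:
        tag, _, value = part.partition(":")
        if tag in _PHASE_TAGS:
            layers[tag] = value
    if "SG" in layers:
        number = int(layers["SG"])
        if not 1 <= number <= 230:  # screen for illegal numbers
            raise ValueError(f"illegal space-group number: {number}")
        layers["SG"] = str(_CHIRAL_PAIRS.get(number, number))
        layers.setdefault("PH", "xtl")  # a space group implies a crystal
    return parts[1], layers


def phases_match(query: str, candidate: str) -> bool:
    """Compare two identifiers layer by layer, skipping layers absent from either."""
    q_formula, q_layers = split_layers(query)
    c_formula, c_layers = split_layers(candidate)
    if q_formula != c_formula:
        return False
    return all(
        q_layers[tag] == c_layers[tag]
        for tag in _PHASE_TAGS
        if tag in q_layers and tag in c_layers
    )


print(phases_match("1.02/Fe/SG:229", "1.02/Fe/SG:229/WS:a"))
print(phases_match("1.02/Fe/SG:229", "1.02/Fe/SG:225"))
print(phases_match("1.02/O2Si/SG:154", "1.02/O2Si/SG:152"))
```

As noted in §7.2.2, a search of this kind on Fe/SG:229 cannot distinguish the delta, beta and alpha phases of iron, so the user must make the final choice among the matches returned.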

Note added in proof: InChI has now been adopted by IUPAC and full details can be found at http://www.iupac.org/inchi.  Answers to frequently asked questions can be found at http://wwmm.ch.cam.ac.uk/inchifaq/.

Footnotes

Michael Berndt: deceased 30 June 2003.

1The Parthé–Gelato (1984, 1985) algorithm was selected because it is, to our knowledge, the only one that normalizes the whole structure including the atomic coordinates. Other algorithms normalize only the space-group setting and the unit-cell basis, which by themselves are not sufficient to ensure a unique Wyckoff sequence. This algorithm has been tested in several applications and has been shown to be robust, and furthermore it has been implemented in the program STRUCTURE TIDY, which includes the Wyckoff sequence as part of its output (Gelato & Parthé, 1987). The conventions used in the Parthé–Gelato algorithm differ from those adopted elsewhere in crystallography (see the discussion in Parthé & Gelato, 1984) since they are optimized to produce a robust structure normalization, while other conventions are designed for different purposes, e.g. to emphasize the physical or chemical properties of the crystal or to provide a cell descriptor that is easy to search. The space-group setting and unit-cell basis used in the Parthé–Gelato algorithm, while essential for generating a unique Wyckoff sequence, are discarded after use and do not appear explicitly in the crystalline phase identifier. They therefore do not conflict with the different space-group and unit-cell conventions adopted by the databases using the identifier.

2Available on the IUCr web page //journals.iucr.org/a/issues/2001/05/00/es0301/index.html.

References

Abrahams, S. C., Ihringer, J., Marsh, P. & Nassau, K. (1984). J. Chem. Phys. 81, 2082–2087.
Donohue, J. (1974). The Structure of the Elements. New York: John Wiley.
Gelato, L. M. & Parthé, E. (1987). J. Appl. Cryst. 20, 139–143.
Ihringer, J. & Abrahams, S. C. (1984). Phys. Rev. B, 30, 6540–6548.
International Tables for Crystallography (2005). Vol. A. Dordrecht: Kluwer Academic Publishers.
Janssen, T., Birman, J. L., Dénoyer, F., Koptsik, V. A., Verger-Gaugry, J. L., Weigel, D., Yamamoto, A., Abrahams, S. C. & Kopsky, V. (2002). Acta Cryst. A58, 605–621.
Parthé, E. & Gelato, L. M. (1984). Acta Cryst. A40, 169–183.
Parthé, E. & Gelato, L. M. (1985). Acta Cryst. A41, 142–151.
Tolédano, J.-C., Berry, R. S., Brown, P. J., Glazer, A. M., Metselaar, R., Pandey, D., Perez-Mato, J. M., Roth, R. S. & Abrahams, S. C. (2001). Acta Cryst. A57, 614–626.
Tolédano, J.-C., Glazer, A. M., Hahn, Th., Parthé, E., Roth, R. S., Berry, R. S., Metselaar, R. & Abrahams, S. C. (1998). Acta Cryst. A54, 1028–1033.

© International Union of Crystallography. Prior permission is not required to reproduce short quotations, tables and figures from this article, provided the original authors and source are cited.
