We (Simon ("Billy") Tyrrell and I) have been working on interpreting
crystallography in terms of chemistry. With the help and encouragement of
the IUCr Chester office (esp. Peter and Brian) we have taken the CIFs from
an issue of ActaE and converted them to CML. These are early findings but
they look good. The following comments are provisional and please forgive
us if they are standard knowledge and sound naive.
Almost all (> 98%) of the ca 200 CIFs convert well into chemically
meaningful structures. We have computed the formula_sum from the
_atom_site_occupancy corrected for multiplicity (from the symmetry
operators) and find this agrees almost universally with the
_chemical_formula_sum. We have computed the molecular mass and this also
agrees, apart from a few large compounds that differ very slightly; this
suggests that the atomic masses used may differ between our implementation
and the programs used to compute the CIFs. In particular the tools suggest that
all authors account explicitly for all hydrogen atoms in the structures.
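
As an illustration of the formula check described above, here is a minimal Python sketch. It is not our actual code: the function names, the small mass table and the input layout are all made up for the example, and it assumes that each site contributes occupancy x site multiplicity / Z atoms to the formula unit (Z as in _cell_formula_units_Z).

```python
from collections import defaultdict

# Approximate standard atomic masses (illustrative subset only). A program
# using a slightly different table would give slightly different totals.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "Cl": 35.45}


def formula_sum(sites, z):
    """Sum element counts per formula unit.

    sites: iterable of (element, occupancy, site_multiplicity) tuples
    z:     number of formula units per cell
    Assumption: each site contributes occupancy * multiplicity / z.
    """
    counts = defaultdict(float)
    for element, occupancy, multiplicity in sites:
        counts[element] += occupancy * multiplicity / z
    return dict(counts)


def molecular_mass(counts):
    """Formula mass from element counts, using the table above."""
    return sum(ATOMIC_MASS[element] * n for element, n in counts.items())


def format_formula(counts, ndigits=3):
    """Write counts in the usual formula_sum style: C then H when carbon
    is present, remaining elements alphabetical, a count of 1 omitted."""
    elements = sorted(counts)
    if "C" in counts:
        lead = [el for el in ("C", "H") if el in counts]
        elements = lead + [el for el in elements if el not in lead]
    parts = []
    for element in elements:
        n = round(counts[element], ndigits)
        parts.append(element if n == 1 else f"{element}{n:g}")
    return " ".join(parts)


# Hypothetical example: water on general positions of multiplicity 4, Z = 4.
water_sites = [("O", 1.0, 4), ("H", 1.0, 4), ("H", 1.0, 4)]
water_counts = formula_sum(water_sites, z=4)
water_formula = format_formula(water_counts)
water_mass = molecular_mass(water_counts)
```

The formatted string can then be compared with the reported _chemical_formula_sum, and the mass with the reported formula weight, allowing a small tolerance for differences in the atomic mass table.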
We have then used CML-based software to display the connection tables for
the entries. Where a structure is reported as disordered we have
arbitrarily taken the first (usually the largest occupied) disorder group.
At this stage the connection tables are only meaningful for cases where the
full "molecule" has no crystallographic symmetry elements. For ionic
structures they are also not yet decorated with charges. The connection
tables agree well with the reported structure diagram and/or chemical name.
It is now possible to check the formulae against the
_chemical_formula_moiety - which is the only explicit way of assigning
charges. Again the agreement seems to be good.
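
The disorder handling mentioned above (take the first disorder group) could look roughly like the following Python sketch. The dictionary keys are hypothetical stand-ins for _atom_site_label, _atom_site_disorder_assembly and _atom_site_disorder_group, and "first" is taken to mean the first group met in file order within each assembly, which is an assumption of the sketch.

```python
def select_first_disorder_group(sites):
    """Keep ordered atoms plus the first disorder group of each assembly.

    sites: list of dicts with the (hypothetical) keys 'label',
           'disorder_assembly' and 'disorder_group', holding raw CIF strings.
    A group value of '.' or '?' is treated as "not disordered".
    """
    first_group = {}  # assembly -> first group seen for that assembly
    kept = []
    for site in sites:
        group = site.get("disorder_group", ".")
        if group in (".", "?"):
            kept.append(site)  # ordered atom, always kept
            continue
        assembly = site.get("disorder_assembly", ".")
        # Remember the first group encountered for this assembly.
        chosen = first_group.setdefault(assembly, group)
        if group == chosen:
            kept.append(site)
    return kept


# Hypothetical fragment with one disordered oxygen split over two groups.
example_sites = [
    {"label": "C1", "disorder_group": "."},
    {"label": "O1A", "disorder_assembly": "A", "disorder_group": "1"},
    {"label": "O1B", "disorder_assembly": "A", "disorder_group": "2"},
    {"label": "N1", "disorder_group": "."},
]
kept_labels = [s["label"] for s in select_first_disorder_group(example_sites)]
```

Choosing by highest occupancy instead of file order would be an equally arbitrary alternative; in most entries we have looked at the two choices coincide.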
For the molecules which lie on symmetry elements there may be a degree of
subjectivity as to how the complete molecule is constructed, especially if
the molecule is polymeric in 1, 2 or 3 directions. One approach is to
generate symmetry related atoms and see if they join onto the growing
fragment. We have also used a heuristic approach which is to use the
author-provided lengths, angles and torsions as defining the chemical
connectivity. Where the molecule has symmetry a number of bonds and angles
are repeated together with their symmetry operator. This allows immediate
identification of the symmetry operations required to generate a larger
molecule. By adding on the symmetry generated replicas and removing common
atoms it is normally possible to generate a complete connection table
compatible with the formula sum or formula moiety and with the name or
structural diagram. We have not yet automatically checked the results of
this exercise against the formula_moiety.
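
The heuristic above can be sketched in Python as follows. This is only an outline under stated assumptions, not our implementation: the function names are invented, bonds are given as (label_1, label_2, site_symmetry_2) triples, the symmetry codes follow the usual "n_klm" form (operator number n, translations k-5, l-5, m-5), and operators are in "x,y,z" notation. A single pass is made, so polymeric structures, which need repeated application and a stopping rule, are not handled.

```python
import re
from fractions import Fraction


def parse_symop(op):
    """Parse e.g. 'x, -y+1/2, z' into three (coefficients, shift) rows."""
    rows = []
    for part in op.replace(" ", "").lower().split(","):
        coeffs = [0, 0, 0]
        shift = Fraction(0)
        for term in re.findall(r"[+-]?[^+-]+", part):
            sign = -1 if term.startswith("-") else 1
            body = term.lstrip("+-")
            if body in ("x", "y", "z"):
                coeffs["xyz".index(body)] += sign
            else:
                shift += sign * Fraction(body)  # e.g. '1/2'
        rows.append((coeffs, shift))
    return rows


def apply_symop(rows, xyz):
    """Apply a parsed operator to fractional coordinates."""
    return tuple(
        sum(c * v for c, v in zip(coeffs, xyz)) + float(shift)
        for coeffs, shift in rows
    )


def decode_site_symmetry(code):
    """'2_655' -> (2, (1, 0, 0)); a bare '3' -> (3, (0, 0, 0))."""
    number, _, klm = code.partition("_")
    klm = klm or "555"
    return int(number), tuple(int(digit) - 5 for digit in klm)


def expand_fragment(atoms, bonds, symops, tol=1e-4):
    """Add one symmetry replica per operator named in the bond list,
    dropping atoms that coincide with ones already present."""
    ops = [parse_symop(op) for op in symops]
    result = {(label, "."): tuple(xyz) for label, xyz in atoms.items()}
    codes = sorted({code for _, _, code in bonds if code not in (".", "?")})
    for code in codes:
        number, shift = decode_site_symmetry(code)
        for label, xyz in atoms.items():
            image = apply_symop(ops[number - 1], xyz)
            image = tuple(v + t for v, t in zip(image, shift))
            common = any(
                all(abs(a - b) < tol for a, b in zip(image, kept))
                for kept in result.values()
            )
            if not common:
                result[(label, code)] = image
    return result


# Hypothetical molecule with O1 on an inversion centre at the origin.
symops = ["x,y,z", "-x,-y,-z"]
atoms = {"O1": (0.0, 0.0, 0.0), "C1": (0.1, 0.2, 0.3)}
bonds = [("O1", "C1", "."), ("O1", "C1", "2_555")]
expanded = expand_fragment(atoms, bonds, symops)
```

In this example the replica of C1 is added while the replica of O1 coincides with the original atom and is removed, giving the three-atom molecule whose composition can then be checked against the formula sum.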
This implies that the crystallographers (and/or the programs they use
and/or the Acta office) have good discipline in reporting the chemistry of
the compounds and that it is reasonable to request enhanced chemical
information in standard CIFs. It also suggests that the major programs used
already implicitly store chemical connection tables and should be able to
emit them.
In recent discussions with crystallographers it seems that the preparation
of the chemistry for publication is a significant fraction of the time
taken to "do a crystal structure". Labelling atoms and describing symmetry
is a particular concern.
HTH
P.
>_______________________________________________
>coreCIFchem mailing list
>coreCIFchem@iucr.org
>http://scripts.iucr.org/mailman/listinfo/corecifchem
Peter Murray-Rust
Unilever Centre for Molecular Informatics
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069
Copyright © International Union of Crystallography