Checking for duplicate publications

For each submitted structure, a check is made to determine whether it has already been published. The procedure adopted for organic and metal-organic structures uses software developed by the Cambridge Crystallographic Data Centre (CCDC). The automated process has three stages: (i) a file-conversion stage; (ii) a prescan stage; (iii) a scan stage.

(i) File conversion

Relevant parts of the CIF are converted to a CCDC-specific format. During the conversion, the CIF is also checked for syntax and other errors. Any problems detected are displayed at the start of the output, as in these CIF conversion messages:

#INFO     CIFER *WARNING* Melting point possibly wrong - 68-69
          CIFER *WARNING* Cell temp 203, Main diffrn temp 293
          CIFER *WARNING* Common name used in #COMPND
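The temperature warning above can be illustrated with a minimal sketch. It is not CIFER: it assumes a CIF containing only simple one-line `_tag value` items (no loops or multi-line text fields), and the helper names `cif_items`, `numeric` and `temperature_warnings` are hypothetical. The sketch compares the core CIF data names `_cell_measurement_temperature` and `_diffrn_ambient_temperature`; which items CIFER actually reads is not stated here.

```python
import re


def cif_items(cif_text: str) -> dict:
    """Collect simple one-line '_tag value' items; loops and text fields are skipped."""
    items = {}
    for line in cif_text.splitlines():
        line = line.strip()
        if line.startswith("_"):
            parts = line.split(None, 1)
            if len(parts) == 2:
                items[parts[0].lower()] = parts[1].strip().strip("'\"")
    return items


def numeric(value: str):
    """Return the leading number of a CIF value such as '293(2)', or None for '?' or '.'."""
    match = re.match(r"\s*([-+]?\d+(?:\.\d*)?)", value)
    return float(match.group(1)) if match else None


def temperature_warnings(cif_text: str) -> list:
    """Hypothetical check: warn when cell-measurement and diffraction temperatures differ."""
    items = cif_items(cif_text)
    cell = numeric(items.get("_cell_measurement_temperature", "?"))
    diffrn = numeric(items.get("_diffrn_ambient_temperature", "?"))
    if cell is not None and diffrn is not None and cell != diffrn:
        return [f"*WARNING* Cell temp {cell:g}, Main diffrn temp {diffrn:g}"]
    return []


sample = """
_cell_measurement_temperature   203(2)
_diffrn_ambient_temperature     293(2)
"""
print(temperature_warnings(sample))  # ['*WARNING* Cell temp 203, Main diffrn temp 293']
```

A real CIF parser would also have to handle loops, quoted strings spanning spaces and semicolon-delimited text fields, which this sketch deliberately skips.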

A very important part of stage (i) is the so-called 'Make2D' process. The crystal atomic coordinates are used to generate the three-dimensional (3D) crystal connectivity. Analysis of the geometry of the 3D structure enables the software to assign bond orders. Thus, from the 3D representation a two-dimensional (2D) chemical diagram is generated.
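The first step of Make2D, deriving 3D connectivity from atomic coordinates, is commonly done by comparing each interatomic distance with the sum of the two atoms' covalent radii. The sketch below uses that common approach; it is not the CCDC's algorithm. It assumes Cartesian coordinates in Å (crystal coordinates are fractional and would first need converting with the cell parameters), ignores symmetry-generated neighbours, and uses a hypothetical 0.4 Å tolerance. Bond-order assignment is not attempted.

```python
import math
from itertools import combinations

# Approximate single-bond covalent radii in Å (Cordero et al. values); illustrative subset.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}
TOLERANCE = 0.4  # hypothetical slack added to each radius sum, in Å


def connectivity(atoms):
    """Return bonded index pairs for a list of (element, (x, y, z)) tuples.

    Coordinates are assumed Cartesian in Å; symmetry and periodicity are ignored.
    """
    bonds = []
    for (i, (el_i, p_i)), (j, (el_j, p_j)) in combinations(enumerate(atoms), 2):
        limit = COVALENT_RADII[el_i] + COVALENT_RADII[el_j] + TOLERANCE
        if math.dist(p_i, p_j) <= limit:
            bonds.append((i, j))
    return bonds


# A C-O fragment with one hydrogen on carbon (made-up coordinates).
fragment = [("C", (0.0, 0.0, 0.0)), ("O", (1.43, 0.0, 0.0)), ("H", (-0.36, 1.03, 0.0))]
print(connectivity(fragment))  # [(0, 1), (0, 2)]
```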

(ii) Prescan

This stage and stage (iii) are carried out against two databases maintained at the CCDC: the master database, whose entries are identified by a six-letter refcode, and the current database, whose entries carry a logging code, e.g. H7050501. (The current database holds entries for structures that have been published and are being prepared for archiving in the master database.)

Each submitted structure is compared against every entry in both the master database and the current database. A hit is registered in either of the following situations:

(a)

Note that each molecular formula is divided by the highest common factor of its constituent element counts; this is intended to cope with polymer or oligomer versus monomer representations of the same compound.

(b) The submitted entry and database entry have the same constituent elements (H and D are ignored) and their reduced cell lengths agree to within a mean difference of 0.04 Å.
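The formula reduction described in the note under (a) amounts to dividing every element count by their greatest common divisor. A minimal sketch, with a hypothetical `reduce_formula` helper that takes a dict of element counts:

```python
from functools import reduce
from math import gcd


def reduce_formula(counts: dict) -> dict:
    """Divide every element count by the highest common factor of all the counts."""
    if not counts:
        return {}
    hcf = reduce(gcd, counts.values())
    return {element: n // hcf for element, n in counts.items()}


# A monomer written as C2H4 and an oligomer written as C4H8 reduce to the same formula.
print(reduce_formula({"C": 2, "H": 4}))  # {'C': 1, 'H': 2}
print(reduce_formula({"C": 4, "H": 8}))  # {'C': 1, 'H': 2}
```

Distinct compounds can also reduce to the same formula (C2H2 and C6H6 both give CH), so agreement at this stage only flags a candidate for the detailed scan.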

If no hit is registered, the following message is output:

       No matches found
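Criterion (b) and the no-hit message can be sketched as follows. This is a hypothetical illustration, not the CCDC implementation: `Entry`, `prescan` and the example codes and cell lengths are invented, the cell lengths are assumed to be already reduced, "mean difference" is taken to mean the mean absolute difference of a, b and c, and criterion (a) is left out because its full definition is not reproduced here.

```python
from dataclasses import dataclass

CELL_TOLERANCE = 0.04  # Å, mean difference quoted in criterion (b)
IGNORED = frozenset({"H", "D"})  # hydrogen and deuterium are ignored


@dataclass(frozen=True)
class Entry:
    code: str            # refcode or logging code (hypothetical values below)
    elements: frozenset  # constituent element symbols
    cell: tuple          # reduced cell lengths (a, b, c) in Å, assumed precomputed


def same_elements(x: Entry, y: Entry) -> bool:
    return x.elements - IGNORED == y.elements - IGNORED


def cells_agree(x: Entry, y: Entry, tol: float = CELL_TOLERANCE) -> bool:
    # "Mean difference" assumed to be the mean absolute difference of a, b and c.
    mean_diff = sum(abs(p - q) for p, q in zip(x.cell, y.cell)) / len(x.cell)
    return mean_diff <= tol


def prescan(submitted: Entry, master: list, current: list) -> list:
    """Return entries from both databases that satisfy criterion (b) only."""
    hits = [e for e in master + current
            if same_elements(submitted, e) and cells_agree(submitted, e)]
    if not hits:
        print("No matches found")
    return hits


# Made-up entries for illustration.
new = Entry("submitted", frozenset({"C", "H", "N", "O"}), (10.12, 11.40, 7.85))
master = [Entry("ABCDEF", frozenset({"C", "H", "N", "O"}), (10.10, 11.43, 7.86))]
current = [Entry("H0000001", frozenset({"C", "D", "N", "O"}), (12.00, 5.00, 9.00))]
print([e.code for e in prescan(new, master, current)])  # ['ABCDEF']
```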

(iii) Scan

For each hit detected by the prescan, a detailed comparison is made between various fields of the submitted entry and the hit database entry. Each comparison carries a score, and the score accumulates as matching fields are found. The output lists the essentials of the submitted entry, followed by those of the database entry, with the cumulative score and the alert level.
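The incremental scoring in the scan stage can be pictured with the sketch below. Everything in it is hypothetical: the compared fields, their weights, and the thresholds that map a cumulative score to an alert level (including the existence of a level B) are invented for illustration, since the fields and scores actually used by the CCDC are not listed here.

```python
# Hypothetical fields and weights; not the CCDC's actual scoring scheme.
FIELD_WEIGHTS = {
    "formula": 3,
    "space_group": 2,
    "authors": 2,
    "z": 1,
    "temperature": 1,
}

# Hypothetical (minimum cumulative score, alert level) pairs, highest first.
ALERT_THRESHOLDS = [(7, "A"), (4, "B"), (0, "C")]


def alert_level(score: int) -> str:
    for minimum, level in ALERT_THRESHOLDS:
        if score >= minimum:
            return level
    return "C"


def scan(submitted: dict, hit: dict) -> tuple:
    """Compare fields one at a time, adding each matching field's weight to the score."""
    score = 0
    for field, weight in FIELD_WEIGHTS.items():
        if field in submitted and submitted[field] == hit.get(field):
            score += weight
    return score, alert_level(score)


submitted = {"formula": "C10H12N2O", "space_group": "P2_1/c",
             "authors": "Smith, J.", "z": 4, "temperature": 293}
print(scan(submitted, dict(submitted, temperature=150)))  # (8, 'A')
```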

Example 1

Database duplication summary

Possible matches:

Alert Level A (red alert):
Alert Level A (red alert):
Alert Level A (red alert):

Example 2

Database duplication summary

Possible matches:

Alert Level C (yellow alert):

Example 3

Database duplication summary

Possible match:

Alert Level A (red alert):

In example 3 there is a possible hit with an entry in the current database.

The authors' CIF for a possible duplicate (or 'hit') may be requested from the CCDC supplementary data archive at http://www.ccdc.cam.ac.uk/data_request/cif by completing the request form with the journal citation and the CCDC number (e.g. Remarks = CCDC 204442).

At any time you can seek clarification from the CCDC editorial team by contacting them at deposit@ccdc.cam.ac.uk.