checking for duplicate publications

For each submitted structure, a check will be made to determine whether it has already been published. The procedure adopted for organic and metal-organic structures uses software that has been developed by the CCDC. Three stages are involved in the automated process: (i) a file-conversion stage; (ii) a prescan stage; (iii) a scan stage.

(i) File conversion

Relevant parts of the CIF are converted to a CCDC-specific format. As part of the conversion, some syntax and other checks are carried out on the CIF. Any errors detected are displayed at the start of the output as CIF conversion error messages, for example:

#INFO     CIFER *WARNING* Melting point possibly wrong - 68-69
          CIFER *WARNING* Cell temp 203, Main diffrn temp 293
          CIFER *WARNING* Common name used in #COMPND
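
Where the output is processed by a script, messages of this kind can be picked out with a simple pattern match. The following is a minimal sketch only: the function name `extract_cifer_messages` is hypothetical, and the pattern assumes that every message follows the `CIFER *LEVEL* text` layout of the example above. Only `*WARNING*` lines are shown in the example, so other levels are an assumption.

```python
import re

# Assumed layout, taken from the example output: CIFER *LEVEL* message text
CIFER_LINE = re.compile(r"CIFER\s+\*(?P<level>[A-Z]+)\*\s+(?P<message>.+)")


def extract_cifer_messages(output):
    """Return (level, message) pairs for each CIFER line found in the output."""
    messages = []
    for line in output.splitlines():
        match = CIFER_LINE.search(line)
        if match:
            messages.append((match.group("level"), match.group("message").strip()))
    return messages
```

Lines that do not contain a CIFER message are ignored, so the whole of the output can be passed in unchanged.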

A very important part of stage (i) is the so-called 'Make2D' process. The crystal atomic coordinates are used to generate the three-dimensional (3D) crystal connectivity. Analysis of the geometry of the 3D structure enables the software to assign bond orders. Thus, from the 3D representation a two-dimensional (2D) chemical diagram is generated.
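
The general idea of deriving connectivity from coordinates can be illustrated with a distance test. The sketch below is not the Make2D algorithm, whose details are not given here. It assumes Cartesian coordinates in Å for a single molecule, ignores crystal symmetry and does not assign bond orders. The function name `find_bonds`, the radii table and the tolerance are all assumptions made for the illustration.

```python
from itertools import combinations
from math import dist

# Covalent radii in Å for a few elements; illustrative values for this sketch.
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}
BOND_TOLERANCE = 0.3  # Å; hypothetical tolerance, not a CCDC value


def find_bonds(atoms, tolerance=BOND_TOLERANCE):
    """Return index pairs (i, j) of atoms judged to be bonded.

    `atoms` is a list of (element, (x, y, z)) tuples with Cartesian
    coordinates in Å. Two atoms are treated as bonded when their separation
    does not exceed the sum of their covalent radii plus the tolerance.
    """
    bonds = []
    for (i, (el_i, xyz_i)), (j, (el_j, xyz_j)) in combinations(enumerate(atoms), 2):
        limit = COVALENT_RADIUS[el_i] + COVALENT_RADIUS[el_j] + tolerance
        if dist(xyz_i, xyz_j) <= limit:
            bonds.append((i, j))
    return bonds
```

A full treatment would also need to convert fractional coordinates using the unit cell, apply the space-group symmetry, and analyse the resulting geometry to assign bond orders.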

(ii) Prescan

This stage and stage (iii) are carried out against two databases: the master database maintained at the CCDC, in which entries are identified by a six-letter refcode, and the current database maintained at the CCDC, in which entries carry a logging code, e.g. H7050501. (The current database holds entries for structures that have been published and are being prepared for archiving to the master database.)

Each submitted structure is compared against every entry in both the master database and the current database. A hit is registered for the following situations:

(a)

  • For organic structures, when the submitted entry and database entry have the same molecular formula (H and D counts may differ by up to 2) and each of the reduced unit-cell lengths agrees within 0.4 Å.
  • For metal-organic structures, when the submitted entry and database entry have the same empirical formula (H and D counts may differ by up to 2) and each of the reduced unit-cell lengths agrees within 0.4 Å.

Note that each molecular formula is divided by the highest common factor of the constituent element counts; this is intended to cope with polymer or oligomer versus monomer representations.

(b) The submitted entry and database entry have the same constituent elements (H and D are ignored) and reduced cell lengths agree within a mean difference of 0.04 Å.

If no hit is registered then the following message is output:

       No matches found
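
The hit criteria (a) and (b) can be sketched in code as follows. This is an illustrative reading of the rules above, not the CCDC software. It assumes that the cell lengths supplied are already those of the reduced cell, that the caller passes the molecular or empirical formula as appropriate, and that the H, D tolerance applies to the combined H + D count after division by the highest common factor. All function and constant names are hypothetical.

```python
import re
from functools import reduce
from math import gcd

CELL_TOLERANCE = 0.4        # Å, per reduced cell length, criterion (a)
MEAN_CELL_TOLERANCE = 0.04  # Å, mean difference, criterion (b)
HD_TOLERANCE = 2            # allowed difference in H, D counts, criterion (a)


def parse_formula(formula):
    """Turn 'C10 H18 O4' into {'C': 10, 'H': 18, 'O': 4}; a bare symbol counts as 1."""
    counts = {}
    for token in formula.split():
        match = re.fullmatch(r"([A-Z][a-z]?)(\d*)", token)
        if match is None:
            raise ValueError(f"unrecognised formula token: {token!r}")
        element, number = match.groups()
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def reduce_formula(counts):
    """Divide every element count by the highest common factor of the counts."""
    if not counts:
        return {}
    hcf = reduce(gcd, counts.values())
    return {element: n // hcf for element, n in counts.items()}


def split_hydrogen(counts):
    """Return (counts without H and D, combined H + D count)."""
    heavy = {el: n for el, n in counts.items() if el not in ("H", "D")}
    return heavy, counts.get("H", 0) + counts.get("D", 0)


def criterion_a(formula_1, cell_1, formula_2, cell_2):
    """Same reduced formula (H, D within tolerance); each length within 0.4 Å."""
    heavy_1, h_1 = split_hydrogen(reduce_formula(parse_formula(formula_1)))
    heavy_2, h_2 = split_hydrogen(reduce_formula(parse_formula(formula_2)))
    same_formula = heavy_1 == heavy_2 and abs(h_1 - h_2) <= HD_TOLERANCE
    cells_agree = all(abs(x - y) <= CELL_TOLERANCE for x, y in zip(cell_1, cell_2))
    return same_formula and cells_agree


def criterion_b(formula_1, cell_1, formula_2, cell_2):
    """Same elements (H, D ignored); mean cell-length difference within 0.04 Å."""
    elements_1 = set(parse_formula(formula_1)) - {"H", "D"}
    elements_2 = set(parse_formula(formula_2)) - {"H", "D"}
    mean_diff = sum(abs(x - y) for x, y in zip(cell_1, cell_2)) / len(cell_1)
    return elements_1 == elements_2 and mean_diff <= MEAN_CELL_TOLERANCE


def is_prescan_hit(formula_1, cell_1, formula_2, cell_2):
    """A hit is registered when either criterion is satisfied."""
    return (criterion_a(formula_1, cell_1, formula_2, cell_2)
            or criterion_b(formula_1, cell_1, formula_2, cell_2))
```

Because of the division by the highest common factor, a dimer written as C20 H36 O8 and a monomer written as C10 H18 O4 both reduce to the same element counts and are therefore compared as equal formulae.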

(iii) Scan

For each hit detected by the prescan process, a detailed comparison is made between various fields of the submitted entry and the database entry. Each comparison carries a score, and the cumulative score grows as matching fields are found. In the output, the essentials of the submitted entry are listed first, followed by those of each database entry together with its cumulative score and alert level.
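
The incremental scoring can be pictured with a short sketch. The fields compared, the weights and the comparison rule used below are all hypothetical, since the actual scores and alert-level thresholds are not described here. Only the principle is shown: each field that matches adds its score to a running total.

```python
# Hypothetical weights; the real per-field scores are not given in these notes.
HYPOTHETICAL_WEIGHTS = {"formula": 3, "space_group": 2, "chemical_name": 1}


def normalise(value):
    """Remove all whitespace and ignore case, so 'P 21/n' equals 'P21/n'."""
    return "".join(value.split()).lower()


def cumulative_score(submitted, candidate, weights=HYPOTHETICAL_WEIGHTS):
    """Add the weight of every field that is present in both entries and equal."""
    score = 0
    for field, weight in weights.items():
        a, b = submitted.get(field), candidate.get(field)
        if a is not None and b is not None and normalise(a) == normalise(b):
            score += weight
    return score
```

Text fields are normalised before comparison because the same value may be written differently in the two entries, as with the space groups `P 21/n` and `P21/n` in example 3 below. A simple text comparison of this kind would not equate `C7 F5 N` with `C7 F5 N1`; formulae would need to be parsed into element counts first.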


example 1


database duplication summary

  • Chemical name = octane-1,8-dicarboxylic acid
  • R factor = 0.051
  • Space group = P 21/c
  • Formula = C10 H18 O4
  • a = 10.9197 b = 8.361 c = 8.885
  • alpha = 90 beta = 92.273 gamma = 90

possible matches:

Alert Level A (red alert):
  • Chemical name = Sebacic acid
  • CCDC refcode = SEBAAC01
  • R factor = .01570
  • Space group = P21/a
  • Formula = C10 H18 O4
  • a = 10.10 b = 5.00 c = 15.10
  • alpha = 90 beta = 133.8 gamma = 90
  • Authors = J.Housty, M.Hospital
  • Reference = ActaCrystallogr.,20,325,1966
  • Remarks =
  • CCDC score = 2867

Alert Level A (red alert):
  • Chemical name = Sebacic acid
  • CCDC refcode = SEBAAC
  • R factor = .085
  • Space group = P21/c
  • Formula = C10 H18 O4
  • a = 15.040 b = 5.000 c = 10.070
  • alpha = 90 beta = 133.13 gamma = 90
  • Authors = J.D.Morrison, J.M.Robertson
  • Reference = J.Chem.Soc.,,993,1949
  • Remarks =
  • CCDC score = 2783

Alert Level A (red alert):
  • Chemical name = Sebacic acid
  • CCDC refcode = SEBAAC02
  • R factor not given
  • Space group = P21/c
  • Formula = C10 H18 O4
  • a = 15.064 b = 4.987 c = 10.142
  • alpha = 90 beta = 133.14 gamma = 90
  • Authors = Y.Haget,M.A.Cuevas,N.B.Chanh,L.Bonpunt,M.Font-Altaba
  • Reference = J.Appl.Crystallogr.,13,93,1980
  • Remarks =
  • CCDC score = 2471

example 2


database duplication summary

  • Chemical name = Pentafluorobenzonitrile
  • R factor = 0.036
  • Space group = Cmca
  • Formula = C7 F5 N
  • a = 7.6864 b = 9.5175 c = 18.348
  • alpha = 90 beta = 90 gamma = 90

possible matches:

Alert Level C (yellow alert):
  • Chemical name = Pentafluorophenyl isocyanide
  • CCDC refcode = KUMXOF
  • R factor = 0.050
  • Space group = Cmca
  • Formula = C7 F5 N1
  • a = 7.611 b = 9.427 c = 18.603
  • alpha = 90 beta = 90 gamma = 90
  • Authors = D.Lentz, D.Preugschat
  • Reference = ActaCrystallogr.Sect.C(Cr.Str.Comm.),49,52,1993
  • Remarks =
  • CCDC score = 844

example 3


database duplication summary

  • Chemical name = 5-Methyl-2-phenyl-4-((Z)-(1-naphthylamino)phenylmethylene)-3H-pyrazol-3-one
  • R factor = 0.044
  • Space group = P 21/n
  • Formula = C27 H21 N3 O1
  • a = 10.036 b = 18.235 c = 12.034
  • alpha = 90 beta = 106.711 gamma = 90

possible match:

Alert Level A (red alert):
  • Chemical name = 4-((1-naphthylamino)phenylmethylene)-5-methyl-2-phenyl-2H-pyrazol-3(4H)one
  • CCDC refcode = H7050501
  • R factor = 0.053
  • Space group = P21/n
  • Formula = C27 H21 N3 O1
  • a = 10.024 b = 18.171 c = 11.982
  • alpha = 90 beta = 106.682 gamma = 90
  • Authors = J.-L. Wang, Sh.-M. Zhang, Ai.-X. Li
  • Reference = Pol.J.Chem.,77,1053,2003
  • Remarks = CCDC 204442
  • CCDC score = 3754

In example 3 the possible match is with an entry in the current database, identified by its logging code (H7050501) rather than by a refcode.

The authors' CIF for a possible duplicate (or 'hit') may be requested from the CCDC supplementary data archive (http://www.ccdc.cam.ac.uk/data_request/cif) by completing the request form with the journal citation and the CCDC number (given in the Remarks field, e.g. CCDC 204442).

At any time you can seek clarification from the CCDC editorial team by contacting them at deposit@ccdc.cam.ac.uk.



