|
|
|
Figure 3
Effect of duplicate identification on the effective search space for 190 770 entries in PDB snapshot 20260101. The main panel shows that the number of clusters decreases approximately linearly with increasing cluster size on log–log scaling consistent with a power-law-like decay. A long tail corresponds to a small number of highly over-represented entries. The largest ten clusters are labelled. Inset: proportion of entries retained after collapsing each duplicated cluster to a single representative. (a) Duplicates defined as the same unit cell, same space group, same sequence and low r.m.s.d. without fitting, as described in the text. (b) Duplicates defined by lattice clustering, as described in the text. |

journal menu![[Figure 3]](nz5022fig3.jpg)
access


