
Figure 3
Effect of duplicate identification on the effective search space for the 190 770 entries in PDB snapshot 20260101. Main panel: the number of clusters decreases approximately linearly with increasing cluster size on log–log axes, consistent with power-law-like decay. The long tail corresponds to a small number of highly over-represented entries. The ten largest clusters are labelled. Inset: proportion of entries retained after each duplicate cluster is collapsed to a single representative. (a) Duplicates defined as entries that share the same unit cell, space group and sequence and have a low r.m.s.d. computed without fitting, as described in the text. (b) Duplicates defined by lattice clustering, as described in the text.
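The three quantities in this figure (clusters per cluster size, the fraction of entries retained after collapsing duplicates, and the straight-line trend on log–log axes) can be computed from any duplicate assignment. The Python sketch below is illustrative only and is not the authors' code. It assumes a hypothetical `labels` list that holds one cluster label per PDB entry. It does not reproduce the duplicate definitions in (a) or (b), which are given in the article text. The slope is a plain least-squares fit in log space, a crude estimate rather than a formal power-law fit.

```python
import math
from collections import Counter


def cluster_size_distribution(labels):
    """Return a mapping {cluster size: number of clusters of that size}.

    `labels` is a hypothetical sequence with one cluster label per entry.
    """
    sizes = Counter(labels).values()  # number of entries in each cluster
    return Counter(sizes)             # number of clusters with each size


def retained_fraction(labels):
    """Fraction of entries kept when every cluster collapses to one representative."""
    return len(set(labels)) / len(labels)


def loglog_slope(size_counts):
    """Least-squares slope of log(number of clusters) against log(cluster size).

    A roughly straight line with negative slope on log-log axes is what
    'power-law-like decay' describes; this is only a rough estimate.
    """
    points = [(math.log(size), math.log(count)) for size, count in size_counts.items()]
    if len({x for x, _ in points}) < 2:
        raise ValueError("need at least two distinct cluster sizes")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


if __name__ == "__main__":
    # Toy assignment: one cluster of 3, one of 2, two singletons.
    labels = ["a", "a", "a", "b", "b", "c", "d"]
    print(dict(cluster_size_distribution(labels)))
    print(retained_fraction(labels))
```

In the toy example, 4 of the 7 entries survive collapsing, so about 57% are retained. The inset of the figure reports the same kind of proportion for the full snapshot under each duplicate definition.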

Acta Cryst. D: Structural Biology
ISSN: 2059-7983