
## Are the St John's wort Hyp-1 superstructures different?

^{a}Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, 986805 Nebraska Medical Center, Omaha, NE 68198-6805, USA

^{*}Correspondence e-mail: gborgstahl@unmc.edu

Two commensurately modulated structures (PDB entries 4n3e and 6sjj) were solved using translational noncrystallographic symmetry (tNCS). The data required the use of large supercells, sevenfold and ninefold, respectively, to properly index the reflections. Commensurately modulated structures can be challenging to solve. Molecular-replacement software such as *Phaser* can detect tNCS and either handle it automatically or, for more challenging situations, allow the user to enter a tNCS vector, which the software then uses to place the components. Although this approach has been successful in solving these types of challenging structures, it does not make it easy to understand the underlying modulation in the structure or how these two structures are related. An alternate view of this problem is that the atoms and associated parameters follow periodic atomic modulation functions (AMFs) in higher dimensional space, and what is observed in these supercells are the points where these higher dimensional AMFs intersect physical 3D space. In this case, the two 3D structures, with a sevenfold and a ninefold supercell, seem to be quite different. However, describing those structures within the higher dimensional approach makes a strong case that they are closely related, as they show very similar AMFs and can be described with one unique (3+1)D structure, *i.e.* they are two different 3D intersections of the same (3+1)D structure.

Keywords: modulation; tNCS; supercell; superspace.

### 1. Introduction

Commensurately modulated structures are a unique subset of entries in the Protein Data Bank (PDB). The diffraction patterns from these types of structures are distinct in that they consist of strong main reflections with, on average, less intense satellite reflections around the main reflection. Indexing software can usually index the main reflections, but in some cases can have difficulty indexing all of the reflections. For commensurate cases, the indexing software will lock into a solution that is some multiple of the basic *et al.*, 2007; Read & McCoy, 2016) to arrive at solutions where they may have been unable to do so in the past.

Our laboratory has been working on the incommensurately modulated structure of the profilin–actin complex (Lovelace *et al.*, 2008; Porta *et al.*, 2011, 2017) and has developed a set of superspace-analysis tools to aid us in the analysis of our refinements. Our incommensurate case has proven to be more challenging because there is not a supercell that will predict the satellite reflections, so we have had to use an approximation. Additionally, we have a systematic absence in our reflections when the data are represented as a supercell approximation, which is a characteristic of the underlying modulation that has led to some interesting issues that have required more innovative solutions during refinement (Lovelace *et al.*, 2018). The modulated Hyp-1–ANS complex (PDB entry 4n3e, with a sevenfold supercell) and its cousin (PDB entry 6sjj, with a ninefold supercell) provided us with two distinct large commensurately modulated structures (Sliwiak *et al.*, 2015; Smietanska *et al.*, 2020) with which to evaluate our superspace-analysis tools. Dusek *et al.* (2003) demonstrated for a small molecule that the correctly chosen superspace group can describe the array of crystalline phases (including commensurately and incommensurately modulated phases) observed for sodium carbonate. In superspace they were able to observe how the atomic modulation functions (AMFs) were similar and determine which atom was responsible for the transition to the incommensurate phase. Using this paper as a template, we suspected that by observing the commensurately modulated superstructures in superspace it should be possible to visualize how these modulated structures are related. The authors of the Hyp-1 structures speculated that there must be some kind of deeper connection between the structures, but pointed out that it is difficult to detect one by comparing the 3D structures. The primary difference in the crystallization conditions was the addition of melatonin for the PDB entry 6sjj structure, which leads to a supercell requiring two more basic cells.

#### 1.1. tNCS

tNCS occurs when the asymmetric unit contains two or more copies of a component in similar orientations that are related by a translation; it shows up as strong non-origin peaks in the Patterson map (Patterson, 1935; Fig. 1). For PDB entry 4n3e the Patterson map has strong non-origin peaks about every 1/7 of the unit cell (Fig. 1*a*), and the Patterson map for PDB entry 6sjj has strong non-origin peaks about every 1/9 of the unit cell (Fig. 1*b*); these spacings are consistent with an offset of a whole protein molecule when compared with the unit-cell length along *c*. This type of protein-sized spacing in the Patterson maps is a strong indication of tNCS. This challenging molecular-replacement problem was solved using tNCS in *Phaser* (McCoy *et al.*, 2007). Details of the tNCS implementation in *Phaser* and of how it was adapted for PDB entry 4n3e can be found in Sliwiak *et al.* (2014). Both structures required an innovative multistep approach to arrive at the final solution. Although the structures were solved, it was difficult to use the tNCS data to arrive at an understanding of the underlying modulation. This analysis was complicated by the PDB entry 6sjj structure, which should be related to the PDB entry 4n3e structure but appeared to be unique, requiring an additional two basic cells in the supercell. We decided to investigate whether things may become more evident by looking at the results of molecular replacement from the superspace perspective. Visualization of supercells in superspace is not a new idea and has been successfully used to help to solve incommensurately modulated small-molecule crystals (Schönleber & Chapuis, 2004). These approaches are no longer seen as often in this field because *Superflip* (Palatinus & Chapuis, 2007) can now solve these types of structures directly.

#### 1.2. Superspace notation

When the underlying periodic structure of a crystal is broken by a periodic distortion of the structure, the resulting superposition of periodic structure and periodic distortion will produce a diffraction pattern that may not be periodic but will still have long-range order and produce diffraction spots. This case is known as a modulated structure. These structures can be described in superspace (van Smaalen, 2007). Modulations are identified by the number of extra dimensions needed to describe the diffraction pattern: (3+*d*)D, where *d* is the number of extra dimensions. The simplest of these would be (3+1)D. The unit cell is defined by **a**_{1}, **a**_{2}, …, **a**_{n} instead of **a**, **b** and **c**. The reciprocal-space vectors are **a**_{1}^{*}, **a**_{2}^{*}, …, **a**_{n}^{*} instead of **a**^{*}, **b**^{*} and **c**^{*}. Distances are measured as the values **x**_{1}, **x**_{2}, …, **x**_{n} as opposed to **x**, **y** and **z**. Reflections are labelled *h*_{1}, *h*_{2}, …, *h*_{n} instead of *h*, *k*, *l*, *m*.

#### 1.3. Superspace

Superspace theory (Janner & Janssen, 1980*a*,*b*) describes how periodic atomic functions in higher dimensional space can be used to explain the diffraction patterns of modulated structures observed in physical space. This theory provides a very powerful approach that even allows the indexing of diffraction patterns of incommensurately modulated structures and interpretation of the resulting structures. These types of structures have a very distinct diffraction pattern of main reflections with associated satellites. For diffraction, the approach uses the concept of **q** vectors to describe the location of satellite reflections relative to their main reflection. The number of **q** vectors also indicates the number of extra dimensions needed to describe the modulation. The easiest modulation case is described using (3+1)D, where only one **q** vector is needed to describe the satellite reflections. The **q** vector (or the spacing between the main reflection and the first-order satellite) that describes the diffraction pattern of the commensurately modulated structure observed for the PDB entry 4n3e data can be determined from an analysis of the average reflection intensities arranged as a function of *l* (Sliwiak *et al.*, 2015; Fig. 2*a*). The **q** vector for PDB entry 6sjj (Smietanska *et al.*, 2020) can be determined by a similar analysis (Fig. 2*b*). The intensity relationship for main reflections and satellites is that on average first-order satellites will be less intense than the main reflection, second-order satellites will be less intense than first-order satellites, and so on. Locally, this trend may not hold because of the actual shape of the AMFs, but averaged over many reflections the trend in reflection intensities will hold. Fig. 2(*a*) shows that the first-order satellites are spaced three reflections away from each main reflection, and other related satellites and the main reflections are at positions divisible by seven as a function of *l*.
Taken together, main reflections every seven reflections and first-order satellites three reflections away from the main reflections lead to a **q**-vector definition of **q** = (3/7)**c**^{*}. Using the **q** vector, which can be interpreted as having three periods of the periodic modulation function every seven basic cells, allows the construction of a diagram (Fig. 3) to demonstrate how a periodic AMF in this higher dimensional space (grey wavy periodic lines in Fig. 3) can account for the observed atomic positions (black circles in Fig. 3) in the supercell where the AMFs intersect with physical space (**R** in Fig. 3). Due to the periodicity of the AMF, all of the observed atomic positions in the supercell can be translated to a single period of the AMF in *t* (Fig. 3, enlarged area), where *t* has a value from 0 to 1 over a single period. In superspace, data are usually shown as a function of *t* (lines in superspace but parallel to **R**) or **x**_{4} (lines parallel to **a**_{sn}). The choice depends on the information to be conveyed. Additionally, it is possible to see that the states encountered in the supercell by traversing the basic cells in physical space (A–G) are not in the same order as encountered in superspace (1–7) (enlarged area in Fig. 3). For (3+1)D the AMFs are line functions, where the **x**_{4} value of the position is determined by the dot product of the average position of the atom in the basic cell in fractional coordinates with the **q** vector plus a phase offset *t*,

$$x_4 = t + \mathbf{q}\cdot\bar{\mathbf{r}} = t + \sum_{n=1}^{3} \sigma_n \bar{x}_n,$$

where $\bar{\mathbf{r}} = (\bar{x}_1, \bar{x}_2, \bar{x}_3)$ is the average position of an atom in a basic cell in the crystal and σ_{n} is the **q**-vector coefficient. All atomic parameters in superspace are governed by AMFs.

In the case of the reflection data for PDB entry 6sjj, the main reflections (highest intensity) appear for values of *l* that are divisible by 9. The next highest (first-order satellites) appear four reflections away. This leads to a **q** vector of **q** = (4/9)**c**^{*}. In this case, there are four periods of the modulation wave for every nine basic cells. The reordering of the basic cells in superspace for this 4/9 case is shown in the dashed box in Fig. 3.
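As a worked illustration of this reordering, the fractional part of **x**_{4} can be written out for each basic cell. This is only a sketch under stated assumptions: the atom's average position is taken at the origin of each basic cell, the phase offset is zero, and the basic cells are counted as n = 0, 1, … along **x**_{3}, so the labels need not match those used in Fig. 3.

$$
\begin{aligned}
\{n\sigma_3\} &= n\sigma_3 - \lfloor n\sigma_3 \rfloor,\\
\sigma_3 = 3/7:\quad & 0,\ \tfrac{3}{7},\ \tfrac{6}{7},\ \tfrac{2}{7},\ \tfrac{5}{7},\ \tfrac{1}{7},\ \tfrac{4}{7} \quad (n = 0,\dots,6),\\
\sigma_3 = 4/9:\quad & 0,\ \tfrac{4}{9},\ \tfrac{8}{9},\ \tfrac{3}{9},\ \tfrac{7}{9},\ \tfrac{2}{9},\ \tfrac{6}{9},\ \tfrac{1}{9},\ \tfrac{5}{9} \quad (n = 0,\dots,8).
\end{aligned}
$$

Sorting by these values visits the basic cells in the order n = 0, 5, 3, 1, 6, 4, 2 for the sevenfold case and n = 0, 7, 5, 3, 1, 8, 6, 4, 2 for the ninefold case, which is why neighbouring basic cells in physical space are not neighbours along the modulation period.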

### 2. Methods

Details of the processing will focus on the PDB entry 4n3e structure as an example. The processing is the same for the PDB entry 6sjj structure, with the primary difference being that the supercell is a ninefold expansion and the **q** vector is (4/9)**c**^{*} as opposed to (3/7)**c**^{*}. The PDB entry 4n3e structure consists of a sevenfold basic cell expansion into a supercell with four protein chains and some small molecules in the asymmetric unit (ASU) of the basic cell, where the modulation is along the **x**_{3} dimension (Fig. 4). The resulting supercell has 28 chains in the ASU. The chains composing the first basic cell are A, B, a and b. All of the chains related by the modulation in other individual basic cells will be annotated with a prime. For example, B′ represents the chains in basic cells 1–7 (B, D, F, H, J, L and N), which are all related by tNCS to the B chain in the first basic cell (Fig. 4). The PDB entry 6sjj structure consists of a ninefold basic cell expansion into a supercell (not shown), and like the PDB entry 4n3e structure has four protein chains in the ASU of the basic cell. In this case, chains j′, B′, i′ and A′ of PDB entry 6sjj correspond to chains b′, B′, a′ and A′ of PDB entry 4n3e. Superspace figures for chain B′ of PDB entry 4n3e have previously been published (Lovelace & Borgstahl, 2020), but the processing details of how they were generated were not discussed.

*Matlab* (MathWorks) scripts were used to manipulate the PDB files to display the data in superspace. These scripts are available in the supporting information and will generate many of the figures shown in this paper. The process of breaking down a supercell (Fig. 5*a*) and transforming it into superspace starts with dividing it into basic cells (Fig. 5*b*). The number of basic cells is the product of the denominators of the rationalized **q** vector. The number of basic cells along any dimension will be equal to the denominator of the **q** vector in that dimension. For PDB entry 4n3e the total number of basic cells is seven because 1 × 1 × 7 = 7, and they are along **x**_{3} because that is the only dimension with a denominator (σ_{3} = 3/7) greater than 1. The function *superorder.m* (provided in the supporting information) will return a rationalized **q** vector to within a tolerance given the **q** vector in decimal form. The basic cells are then all translated to the coordinates of a single basic cell (Fig. 5*c*). An average atomic position (grey dot in Fig. 5*d*) is calculated by averaging the related basic cell atoms (grey circles in Fig. 5*d*). The average position is used to calculate an **x**_{4} position for that atom in all of the basic cells (Fig. 5*e*). Sorting can then be performed on the fractional component of **x**_{4} to determine the frame order to animate the basic cell (Fig. 5*f*). Additionally, plots of the approximated AMFs can be created by measuring the displacement from the average position of an atom (black circle) for a single dimension (dashed line) (**x**_{1} in Fig. 5*g* or **x**_{2} in Fig. 5*h*) and then plotting this information as a function of **x**_{4}.
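The rationalization step and the basic cell count described above can be sketched in a few lines of Python. This is not the authors' *superorder.m*; it is a minimal illustration that assumes a simple search for the smallest denominator within a tolerance, and the function names, the `max_denominator` and `tol` parameters and the rounded decimal value 0.4286 are all hypothetical choices made for the example.

```python
from fractions import Fraction
from math import prod


def rationalize(sigma, max_denominator=20, tol=1e-3):
    """Return the fraction with the smallest denominator within tol of sigma."""
    for denominator in range(1, max_denominator + 1):
        numerator = round(sigma * denominator)
        if abs(numerator / denominator - sigma) <= tol:
            return Fraction(numerator, denominator)
    raise ValueError(f"no fraction within {tol} of {sigma}")


def basic_cell_counts(q_decimal, **kwargs):
    """Rationalize each q component; the denominators give the cells per axis."""
    q_rational = tuple(rationalize(sigma, **kwargs) for sigma in q_decimal)
    counts = tuple(sigma.denominator for sigma in q_rational)
    # Total number of basic cells is the product of the denominators.
    return q_rational, counts, prod(counts)


# Illustrative decimal q vector (assumed rounding of 3/7 along x3).
q_rational, counts, total = basic_cell_counts((0.0, 0.0, 0.4286))
print(q_rational, counts, total)
```

A zero component rationalizes to 0/1, so it contributes a single basic cell along that dimension, which reproduces the 1 × 1 × 7 = 7 count quoted for PDB entry 4n3e.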

The values in **x**_{4} were calculated (Fig. 6) using the *Matlab* function *superorder.m* (available in the supporting information). Firstly, *superorder.m* assigns a numerical index to the basic cells and then uses the most straightforward starting position (0, 0, 0) and determines the equivalent positions in all of the basic cells. Next, it calculates **x**_{4} and then extracts the fractional part by subtracting the floor of the value (the largest integer that does not exceed it). The floor function is used so that negative values are handled correctly. The basic cells are sorted by the fractional part of **x**_{4}, and the basic cell sort order is returned as a list of the numerical indices together with the associated fractional values of **x**_{4}.
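The ordering procedure described above can be sketched as follows. This is a hedged Python illustration and not the *superorder.m* script itself: the function name `superorder_sketch`, the zero-based cell indexing and the use of exact fractions are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product
from math import floor


def superorder_sketch(q):
    """Order basic cells by the fractional part of x4 at the cell origin.

    q is a tuple of three Fractions (sigma_1, sigma_2, sigma_3).
    Returns (order, phases, cells): zero-based cell indices sorted by phase,
    the sorted fractional x4 values, and the cell offsets in index order.
    """
    counts = [sigma.denominator for sigma in q]
    # Enumerate the integer offsets of every basic cell in the supercell.
    cells = list(product(*(range(n) for n in counts)))
    phases = []
    for cell in cells:
        # x4 for the position (0, 0, 0) translated into this basic cell.
        x4 = sum(sigma * n for sigma, n in zip(q, cell))
        # Subtracting the floor keeps the result in [0, 1), also for negative x4.
        phases.append(x4 - floor(x4))
    order = sorted(range(len(cells)), key=lambda i: phases[i])
    return order, [phases[i] for i in order], cells


order7, phases7, _ = superorder_sketch((Fraction(0), Fraction(0), Fraction(3, 7)))
order9, phases9, _ = superorder_sketch((Fraction(0), Fraction(0), Fraction(4, 9)))
print(order7)
print(order9)
```

Exact fractions are used here so that the sort is not affected by floating-point rounding. If the denominators along different dimensions share a common factor, several basic cells can have the same phase, and this sketch then keeps them in index order.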

The *superorder.m* script will work with full 3D supercells for (3+1)D (a single **q** vector with all nonzero components). The supporting information provides a 2D (along **x**_{1} and **x**_{3}) example found in Wagner & Schönleber (2009). In that paper there are two plots, one of the supercell (Fig. 15) and one showing the basic cells of the supercell rearranged in order (Fig. 23). Although they do not discuss how they performed this rearrangement, the *superorder.m* function can determine the necessary reordering of the 35 basic cells and reproduce the published results.

### 3. Results

The results of transforming the PDB entry 4n3e supercell into superspace (Fig. 7) demonstrate the power of this method to make the underlying periodic modulation accessible for analysis. Fig. 7 shows the basic cell states in a circle arranged by the value of *t*. The value of *t* was calculated using the corner of the basic cell as the origin. It is shown as a circle to reinforce the idea that AMFs are periodic along the **a**_{s4} direction. The rearrangement of the basic cells transforms what previously appeared to be random motions between neighbouring basic cells in the supercell into a cleaner sequence of consecutive transitions. This is easier to see by watching the movie (supporting information file `4n3e_ans-3-7-mas.gif`), which shows the basic cells as frames animated in order. These seven snapshots of the continuous AMFs in (3+1)D show that most of the modulation (as viewed down **x**_{3}) is related to an opening and closing of the space between chains b′ and B′ (Fig. 7). The modulation is not smooth, in the sense that the displacement between consecutive frames is not uniform: some consecutive frames show a large displacement relative to the previous frame, whereas others show very little. AMFs that are not smooth were expected because of the observed higher order satellites, which are indicative of sharper displacements. Additionally, all of the chains modulate relative to the average structure. Black arrows show the displacement of the four chains in each basic cell relative to the previous position. The black dashed line is the previous state and the grey line is the average position. When the same analysis is applied to PDB entry 6sjj it shows a very similar modulation (supporting information file `6sjj-ans-4-9-mas.gif`). With an understanding of how to transform the data into superspace, an interesting question is: is there a relationship between the two 3D superstructures in (3+1)D superspace?

PDB entries 4n3e and 6sjj were compared by generating AMF approximations for the centre-of-mass displacements of the four chains in the basic cell ASU with respect to the three crystal directions **x**_{1}, **x**_{2} and **x**_{3}. Fig. 8 shows plots of these displacements as a function of **x**_{4} for both PDB entry 4n3e (black circles and dashed black lines) and PDB entry 6sjj (grey dots and grey lines). Data from both superstructures overlay with each other in (3+1)D superspace. These superspace-plot data support the idea that both observed superstructures are 3D intersections in physical space from the same (3+1)D structure. In their paper, Sliwiak and coworkers observed that the change between the two structures was quite large given that the only difference in the crystallization conditions was the addition of melatonin (Smietanska *et al.*, 2020). They believed that this addition would result in at most a small change to the structure. They had also hoped to see melatonin binding to the complex. Unfortunately, they did not observe melatonin binding in the electron-density maps. In (3+1)D superspace, melatonin does appear to make a small change to the **q** vector (less than 1° rotation), and this small change alters the intersection of the higher dimensional structure with physical space, resulting in the observed increase in supercell dimensions as well as the changes to the arrangement of the chains within the supercell compared with the PDB entry 4n3e structure.
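The AMF approximation used for such plots can be sketched as below. This is a hedged Python illustration, not the authors' *Matlab* code: it assumes that the centre-of-mass positions of one chain have already been translated into a single basic cell in fractional coordinates, that they are listed in basic cell order along **x**_{3}, and that none of them wraps across a cell boundary. The function name and the toy coordinates are hypothetical and are not taken from either PDB entry.

```python
from fractions import Fraction
from math import floor


def amf_approximation(cell_positions, sigma3):
    """Return the average position and (x4, displacement) points sorted by x4.

    cell_positions: one (x1, x2, x3) fractional position per basic cell,
    already translated into a single basic cell, in order along x3.
    sigma3: the q-vector coefficient along x3 (float or Fraction).
    """
    n_cells = len(cell_positions)
    average = tuple(sum(p[i] for p in cell_positions) / n_cells for i in range(3))
    points = []
    for n, pos in enumerate(cell_positions):
        # x4 from the average position translated into basic cell n.
        x4 = float(sigma3) * (average[2] + n)
        x4_frac = x4 - floor(x4)
        displacement = tuple(pos[i] - average[i] for i in range(3))
        points.append((x4_frac, displacement))
    points.sort(key=lambda item: item[0])
    return average, points


# Toy input: seven made-up positions drifting along x1 only.
toy = [(0.10 + 0.01 * n, 0.20, 0.30) for n in range(7)]
average, points = amf_approximation(toy, Fraction(3, 7))
for x4_frac, (dx1, dx2, dx3) in points:
    print(f"{x4_frac:.3f} {dx1:+.3f} {dx2:+.3f} {dx3:+.3f}")
```

Plotting one displacement component against the sorted fractional **x**_{4} values gives one panel of a superspace plot; repeating this for each chain and each structure allows the curves to be overlaid.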

Investigation of the AMFs reveals some interesting properties that apply to both the PDB entry 4n3e and 6sjj structures. The AMFs for **x**_{1} and **x**_{2} seem to be mirrored representations, relative to **x**_{4}, between b′/j′ and A′ as well as between B′ and a′/i′. In **x**_{3}, all four chains follow a sawtooth function. The sawtooth function increases linearly up to some maximum followed by a discontinuous or nearly discontinuous reset back to a minimum, where it then begins the linear increase again. The major difference between the four chains is where the reset of the sawtooth occurs in **x**_{4}. Chains a′/i′ and A′ experience the reset simultaneously in the same basic cell (vertical black lines in Fig. 8), whereas chain b′/j′ is lagging and has not reached the peak of the sawtooth and chain B′ has already experienced the sawtooth reset.
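The sawtooth behaviour described above can be illustrated with an idealized model. This is only a sketch: the unit amplitude, the centring about zero and the reset positions used below are hypothetical and are not fitted to the data in Fig. 8.

```python
from math import floor


def sawtooth(x4, reset, amplitude=1.0):
    """Idealized sawtooth AMF with period 1 in x4.

    The value rises linearly from -amplitude/2 to +amplitude/2 and jumps
    back to the minimum where x4 equals reset (modulo 1).
    """
    phase = (x4 - reset) - floor(x4 - reset)  # position within the period, in [0, 1)
    return amplitude * (phase - 0.5)


# Two chains with the same shape but different (made-up) reset positions.
for x4 in (0.0, 0.25, 0.5, 0.75):
    print(x4, sawtooth(x4, reset=0.25), sawtooth(x4, reset=0.5))
```

In this picture, chains that reset in the same basic cell share the same `reset` value, whereas a lagging or leading chain has the same ramp shifted along **x**_{4}.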

The PDB entry 4n3e structure has another interesting feature: the occupancies of some of the bound small molecules are modulated. An example of an AMF for a small molecule near **a**′ is shown in the supporting information. Superspace represents occupancy modulations with a discontinuous function, in which the AMF has no value at the unoccupied positions.

Another aspect that can be explored is the (3+1)D superspace group for *C*2. Using *Superspace Group Finder* (https://it.iucr.org/resources/finder/; Orlov *et al.*, 2008) and inputting *C*2 in the 3D group set results in 21 potential parent (3+1)D space groups. These space groups can be further reduced using the directions provided in Porta *et al.* (2017) by first limiting the list to those that are possible for protein crystals and then reducing the remaining space groups to those which match the observed parameters of the **q** vector. In this case, there is only one remaining (3+1)D superspace group, *C*222_{1}(00γ). Table 1 summarizes the space-group data. The main difference between the two (3+1)D unit cells is that the **q** vectors are slightly different (0.015). The superspace group indicates that there should be a screw axis down **a**_{3}. The previously observed relationships in the AMFs between b′/j′ and A′ as well as between B′ and a′/i′ seem to support a screw axis, as expected for the (3+1)D space group.

### 4. Conclusions

Looking at modulated atomic structures in **q** vector. Additionally, these structures provided validation of the approaches that we are using to evaluate our PA refinements. They also provide independent validation of the tNCS approach that was originally used to solve these structures. By providing the associated scripts as supporting information, we hope that other research groups will be able to more easily perform this analysis in the future on other modulated structures.

### Supporting information

ZIP file containing all supporting information. DOI: https://doi.org/10.1107/S2059798321003740/rr5202sup1.zip

### Acknowledgements

We would like to thank the aperiodic crystallography community for their continued support over the years, but most importantly Václav Petříček and Sander van Smaalen for useful discussion, as well as the Eppley Structural Biology Facility for providing the computational resources to perform this analysis. We would also like to thank Blaine Mooers for his solution to drawing publication-quality Patterson maps in response to a question posed on ccp4bb.

### Funding information

This research was supported by NSF grant MCB-1518145 and the Fred and Pamela Buffett NCI Cancer Center Support Grant P30CA036727.

### References

Dusek, M., Chapuis, G., Meyer, M. & Petricek, V. (2003). *Acta Cryst.* B**59**, 337–352.

Janner, A. & Janssen, T. (1977). *Phys. Rev. B*, **15**, 643–658.

Janner, A. & Janssen, T. (1980*a*). *Acta Cryst.* A**36**, 399–408.

Janner, A. & Janssen, T. (1980*b*). *Acta Cryst.* A**36**, 408–415.

Lovelace, J. J. & Borgstahl, G. E. O. (2020). *Crystallogr. Rev.* **26**, 3–50.

Lovelace, J. J., Murphy, C. R., Daniels, L., Narayan, K., Schutt, C. E., Lindberg, U., Svensson, C. & Borgstahl, G. E. O. (2008). *J. Appl. Cryst.* **41**, 600–605.

Lovelace, J. J., Murshudov, G., Petříček, V. & Borgstahl, G. E. O. (2018). *Acta Cryst.* A**74**, a229.

McCoy, A. J., Grosse-Kunstleve, R. W., Adams, P. D., Winn, M. D., Storoni, L. C. & Read, R. J. (2007). *J. Appl. Cryst.* **40**, 658–674.

Orlov, I., Palatinus, L. & Chapuis, G. (2008). *J. Appl. Cryst.* **41**, 1182–1186.

Palatinus, L. & Chapuis, G. (2007). *J. Appl. Cryst.* **40**, 786–790.

Patterson, A. L. (1935). *Z. Kristallogr.* **90**, 517–542.

Porta, J., Lovelace, J. & Borgstahl, G. E. O. (2017). *J. Appl. Cryst.* **50**, 1200–1207.

Porta, J., Lovelace, J. J., Schreurs, A. M. M., Kroon-Batenburg, L. M. J. & Borgstahl, G. E. O. (2011). *Acta Cryst.* D**67**, 628–638.

Read, R. J. & McCoy, A. J. (2016). *Acta Cryst.* D**72**, 375–387.

Schönleber, A. & Chapuis, G. (2004). *Acta Cryst.* B**60**, 108–120.

Sliwiak, J., Dauter, Z., Kowiel, M., McCoy, A. J., Read, R. J. & Jaskolski, M. (2015). *Acta Cryst.* D**71**, 829–843.

Sliwiak, J., Jaskolski, M., Dauter, Z., McCoy, A. J. & Read, R. J. (2014). *Acta Cryst.* D**70**, 471–480.

Smaalen, S. van (2005). *Z. Kristallogr.* **219**, 681–691.

Smaalen, S. van (2007). *Incommensurate Crystallography.* Oxford University Press.

Smietanska, J., Sliwiak, J., Gilski, M., Dauter, Z., Strzalka, R., Wolny, J. & Jaskolski, M. (2020). *Acta Cryst.* D**76**, 653–667.

Wagner, T. & Schönleber, A. (2009). *Acta Cryst.* B**65**, 249–268.

This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.