From structure of the complex to understanding of the biology


The most extensive structural information on viruses relates to apparently icosahedral virions and is based on X-ray crystallography and on cryo-electron microscopy (cryo-EM) single-particle reconstructions. Both techniques lean heavily on imposing icosahedral symmetry, thereby obscuring any deviation from the assumed symmetry. However, tailed bacteriophages have icosahedral or prolate icosahedral heads with one obvious unique vertex, through which the genome enters during DNA packaging and exits when infecting a host cell. The presence of the tail allows cryo-EM reconstructions in which the special vertex is used to orient the head uniquely. Some very large dsDNA icosahedral viruses also develop special vertices thought to be required for infecting host cells. Similarly, preliminary cryo-EM data for the small ssDNA canine parvovirus complexed with its receptor suggest that these viruses, previously considered to be accurately icosahedral, might have asymmetric properties that generate one preferred receptor-binding site on the viral surface. Comparisons are made between rhinoviruses, which bind receptor molecules uniformly to all 60 equivalent binding sites; canine parvovirus, which appears to have a preferred receptor-binding site; and bacteriophage T4, which gains major biological advantages from its unique vertex and tail organelle.

Introduction
Two themes underlie this paper: the power of combining X-ray crystallography with cryo-electron microscopy (cryo-EM), and the role played by specialized vertices in otherwise highly symmetrical icosahedral viruses. Combining different structural techniques is no longer new, as demonstrated by numerous papers and meetings highlighting 'hybrid' methods, such as the meetings held every other year at Lake Tahoe (http://www.burnham.org/hybridmethods2006). Hence, little will be said here about the technology. Rather, this paper is concerned with the macromolecular complexes that constitute viruses, studied using hybrid structural techniques.
Both cryo-EM and crystallography have depended heavily on the power of averaging when studying icosahedral viruses. Thus, any local deviation from icosahedral symmetry will be completely obscured. Indeed, the classical concept of Crick & Watson (1956), which has guided and defined structural virology, states that each identical subunit forming the viral capsid must have the same environment. Hence, in general, a virus capsid must have the symmetry of a regular polyhedron, and the icosahedron is the polyhedron with the largest number of identical asymmetric subunits. These remarkably correct predictions, however, neglect the asymmetry of the encapsidated genome and of the environment with which the virus must interact when infecting a host.
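The 60 equivalent positions implied by icosahedral symmetry can be recovered by counting rotation axes. The short Python sketch below is an illustration added for this discussion, not part of the original analysis; it assumes only the standard geometry of the icosahedron (12 vertices, 20 faces, 30 edges), with each n-fold axis contributing n - 1 non-identity rotations, plus the identity:

```python
# Rotational symmetry operations of an icosahedron, counted axis class by axis class.
# Each n-fold axis contributes n - 1 non-identity rotations.
AXES = {
    5: 6,   # fivefold axes through opposite pairs of the 12 vertices -> 6 axes
    3: 10,  # threefold axes through opposite pairs of the 20 faces -> 10 axes
    2: 15,  # twofold axes through opposite pairs of the 30 edges -> 15 axes
}


def icosahedral_group_order(axes: dict[int, int] = AXES) -> int:
    """Return the number of proper rotations (identity included) for the given axes."""
    return 1 + sum(count * (fold - 1) for fold, count in axes.items())


order = icosahedral_group_order()
print(order)  # 60
```

Each rotation carries one asymmetric unit onto another, which is why a capsid built from identical subunits in identical environments has 60 of them.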
Among the virions that lack exact icosahedral symmetry are the tailed bacteriophages. The head capsid of these viruses has a specialized fivefold vertex that accommodates a machine that packages the genome into the pre-formed empty prohead. The same vertex is also used to attach the tail tube through which the viral genome passes into a host. The tail proteins usually have sixfold symmetry, which creates a symmetry mismatch with the fivefold symmetry of the head, inconsistent with the expectation of Crick and Watson that each subunit should have the same environment. Notwithstanding the symmetry deviations that occur at the special 'portal' of tailed phages, the rest of the capsid appears to have essentially perfect symmetry, as witnessed by 'high'-resolution cryo-EM phage reconstructions (Fokine et al., 2004; Jiang et al., 2003, 2006; Morais et al., 2005).
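The fivefold/sixfold mismatch at the portal can be made concrete with a little arithmetic. If a fivefold ring and a sixfold ring share an axis, their relative orientation is unchanged by any multiple of 360°/5 applied to one ring or 360°/6 applied to the other, so equivalent registers recur every 360°/lcm(5, 6) = 12°. The helper below is hypothetical and only illustrates this calculation:

```python
from fractions import Fraction
from math import gcd


def mismatch_period_degrees(n_fold: int, m_fold: int) -> Fraction:
    """Smallest relative rotation (in degrees) that returns two coaxial rings,
    one n-fold and one m-fold symmetric, to an equivalent register.

    Rotations by 360/n and 360/m both leave the pair unchanged, and their
    integer combinations are exactly the multiples of 360/lcm(n, m).
    """
    lcm = n_fold * m_fold // gcd(n_fold, m_fold)
    return Fraction(360, lcm)


print(mismatch_period_degrees(5, 6))  # 12  (fivefold head vertex vs sixfold tail)
print(mismatch_period_degrees(6, 6))  # 60  (matched rings: no mismatch)
```

Exact fractions are used so that non-integer periods (for other ring pairs) are reported without rounding.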
In the absence of a tail, a specialized vertex in an icosahedral virus would have been missed in both crystallographic and EM studies as a result of imposing icosahedral symmetry. The orientation of individual virions in a crystal would probably not be influenced by small asymmetric features at one vertex, so the special vertex of any one virus might be in any of the 12 possible orientations. Hence, X-ray data from such crystals would be unable to establish the presence of a unique vertex. However, cryo-EM requires the orientation of every individual particle to be determined before it is incorporated into the reconstruction. Thus, if the special vertex could be identified by a suitably sized molecule that recognizes the unique feature, an asymmetric cryo-EM reconstruction would be able to visualize the breakdown of symmetry.
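A toy model illustrates why averaging hides a single special vertex. In the sketch below (Python, standard library; the function name and parameters are hypothetical), each particle carries a feature of unit strength at one of its 12 vertices. If the position of that vertex is unknown, the averaged density is spread evenly at about 1/12 per vertex, the same outcome expected from imposing icosahedral symmetry. If each particle is first oriented on the unique feature, the feature survives intact:

```python
import random

N_VERTICES = 12  # an icosahedron has 12 vertices


def average_vertex_density(n_particles: int, aligned: bool,
                           feature: float = 1.0, seed: int = 0) -> list[float]:
    """Average a unit feature that sits at ONE vertex of each particle.

    aligned=False: the special vertex of each particle lies at a random one of
    the 12 vertex positions (orientation unknown), so averaging spreads it out.
    aligned=True: every particle is oriented so that its special vertex is
    vertex 0 (orientation fixed by a recognizable marker), so it is preserved.
    """
    rng = random.Random(seed)
    totals = [0.0] * N_VERTICES
    for _ in range(n_particles):
        special = 0 if aligned else rng.randrange(N_VERTICES)
        totals[special] += feature
    return [t / n_particles for t in totals]


blurred = average_vertex_density(60_000, aligned=False)
sharp = average_vertex_density(60_000, aligned=True)
print([round(v, 3) for v in blurred])  # every entry close to 1/12 (about 0.083)
print(sharp[0], sum(sharp[1:]))        # 1.0 0.0
```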
Although structural virologists have been accustomed to thinking of many viruses as having perfect icosahedral symmetry, the breakdown of icosahedral symmetry has been recorded for Mimivirus (Fig. 1a; Xiao et al., 2005) and for Paramecium bursaria chlorella virus type 1 (PBCV-1; Fig. 1b; Nandhagopal et al., 2002; Yan et al., 2000). In the former, some suitably oriented particles can be seen to have a tail prior to infecting a host (Fig. 1a; Xiao et al., 2005). In the latter, it has been shown that PBCV-1 (Fig. 1b) forms a transient tail prior to host infection (Van Etten et al., 1991), as is also the case for the lipid-containing bacteriophage PRD1 (Grahn et al., 2005).

Picornaviruses
The picornavirus family is probably the most structurally studied group of viruses. This is both because these viruses were among the first to be characterized and because their small but highly symmetric shape makes them good candidates for crystallographic investigation (Acharya et al., 1989; Hogle et al., 1985; Luo et al., 1987; Rossmann et al., 1985). Their structure gave rise to the 'canyon hypothesis', which proposed that the cellular receptor would bind in a surface depression (the canyon), a site inaccessible to the larger antibodies. Thus, the faster-mutating surface amino acids would allow the virus to escape the host's neutralizing antibodies while the receptor-binding site in the canyon is conserved. This prediction was subsequently verified by a series of cryo-EM studies of various picornaviruses complexed with soluble fragments of their cellular receptor molecules (Fig. 2; Olson et al., 1993; Rossmann et al., 2002). Nevertheless, the rationale behind the correct prediction has been questioned (Smith et al., 1996).
The picornavirus-receptor complexes were shown to retain the icosahedral symmetry of the virus, although the interaction of the virus with receptor molecules on cell surfaces is bound to be asymmetric. Indeed, the subsequent process of endocytosis, during which the RNA molecule with its covalently bound genomic protein (VPg) is released, possibly through a pore in the viral capsid, is also likely to have asymmetric properties (Bubeck et al., 2005).

Parvoviruses
Parvoviruses are small, nonenveloped, icosahedral single-stranded DNA viruses, 260 Å in diameter (Fig. 3), with a genome of about 5 kb. Each of the 60 structurally identical 64 kDa subunits consists of an eight-stranded antiparallel β-barrel motif (Tsao et al., 1991) found in numerous viral capsid structures (Fig. 4; Benson et al., 2004; Nandhagopal et al., 2002; Rossmann & Johnson, 1989). The β-barrel has large insertions between the β-strands. These insertions form most of the capsid surface and create small protruding 'spikes' around the icosahedral threefold axes in canine parvovirus (CPV; Tsao et al., 1991) and feline panleukopenia virus (FPV; Agbandje et al., 1993). CPV, which emerged as a natural variant of FPV in 1978 (Parrish & Kawaoka, 2005), has 99% sequence identity with FPV. Both viruses bind to the transferrin receptor (TfR; Hueffer, Parker et al., 2003) as a prelude to infecting their host cells. The viral surface in the vicinity of the spikes contains residues 93, 299 and 323, which are involved in host-range control, specific recognition of TfR and antibody binding (Chang et al., 1992; Hueffer, Govindasamy et al., 2003; Lawrence et al., 1999; Palermo et al., 2003).

Figure 3
Cryo-EM reconstruction of CPV at 21 Å resolution showing (a) a surface-shaded representation (adapted from Chipman et al., 1996) and (b) a central section (adapted from Tsao et al., 1991). The virus is about 280 Å in diameter. These reconstructions were based on icosahedral symmetry averaging, as was the determination of the CPV crystal structure.

Figure 4
Ribbon diagram showing the fold of one VP2 monomer of CPV. The central jelly roll is shown in red, with the β-strands BIDG and CHEF forming the opposite sides of the β-barrel. Residues 93, 299 and 323 have been identified as being involved in receptor binding (Hueffer, Govindasamy et al., 2003) and are marked with red circles. (Adapted from Tsao et al., 1991.)
Human TfR (Bennett et al., 2000; Lawrence et al., 1999), which has 79% amino-acid identity to feline TfR, is a dimer with a butterfly-like shape, a span of about 100 Å and a molecular weight of 140 kDa. Each monomer contains a carboxypeptidase-like domain, an apical domain and a helical domain. Mutagenesis of feline TfR and analysis of chimeras between feline and canine TfR indicated that CPV and FPV bind to the apical domain, distal from the membrane-binding region (Palermo et al., 2003).
A cryo-EM reconstruction, computed without imposing any symmetry, of CPV bound to the ectodomain of TfR showed that only a few, and probably only one, of the 60 icosahedrally equivalent TfR-binding sites were occupied by a receptor molecule, consistent with various biochemical assays (Hafenstein et al., 2006). The location of the unique TfR-binding site on the virus surface was in good agreement with mutational data. The ability of various neutralizing antibodies to bind to essentially the same site (Hafenstein et al., 2006) without inhibiting binding to other symmetry-equivalent sites suggests that the asymmetry might already have been present in the virus before it bound TfR or Fab. TfR apparently binds to a unique vertex that is not suitable for Fab binding, whereas Fab molecules can bind to the other 'symmetry-equivalent' sites, where TfR cannot bind. The alternative explanation of the cryo-EM reconstruction would be that the initial binding of TfR induces a conformational change that inhibits further binding of TfR to 'symmetry-equivalent' sites. However, the symmetric binding of Fab fragments to sites that can also be occupied by TfR is more difficult to reconcile with a hypothesis that assumes the virus is initially perfectly symmetric.
The concept of a unique vertex in viruses previously thought to be accurately icosahedral is at first difficult to accept. Nevertheless, as mentioned above, many otherwise icosahedral viruses do have special vertices. This might result from the absence of symmetry in the genome, or perhaps from difficulty in adding the final one or two subunits to an otherwise perfectly symmetric virion. The asymmetry might provide an advantage by identifying the position of one end of the genome, marked in picornaviruses by the genomic viral protein (VPg), for orderly exit into the cytoplasm from the disintegrating capsid in an endosome.

Bacteriophage T4: the head capsid
If some apparently icosahedral viruses do indeed have special vertices, then it is relevant to examine the impact of special vertices on the structure and function of the well-studied tailed phages. Bacteriophage T4 has a total length of about 2200 Å, with a prolate head about 1200 Å long and 850 Å wide that contains a 172 kbp genome encoding about 300 genes. There are about 40 separate proteins in the assembled virion, many of them in multiple copies. The tail terminates in a 425 Å wide hexagonal baseplate to which six long and six short tail fibers are attached. The long tail fibers recognize Escherichia coli and thereby initiate contraction of the sheath around the tail tube, forcing the tail tube through the center of the baseplate into the E. coli periplasmic space. The tail tube terminates in a pin-like structure (a trimer of gene product 5, gp5) surrounded by three lysozyme domains (one from each of the three gp5 polypeptides) that digest the peptidoglycan cell wall, allowing the T4 genome to be injected into the host bacterium. This machine is highly efficient, producing an infection almost every time a phage particle encounters a recognizable E. coli cell. The success of T4 and other tailed phages depends on having a specialized vertex to which the complex tail organelle is attached.

Figure 5
Cryo-EM reconstruction of the head capsid of bacteriophage T4, based on fivefold symmetry averaging. The major capsid protein (gp23, blue) forms hexamers. The small outer capsid protein (soc, white) binds between gp23 hexamers. The highly antigenic outer capsid protein (hoc, yellow) binds at the center of gp23 hexamers. Pentamers of the special vertex protein gp24 (purple) are at the icosahedral vertices. The tail (green) is smeared because it has sixfold symmetry, not the fivefold symmetry used for averaging.

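The scale of the packaging problem solved at the portal vertex is easy to estimate. Assuming the standard B-DNA rise of about 3.4 Å per base pair (a value not given in the text), the 172 kbp genome is roughly 58 μm long, several hundred times the 1200 Å length of the head that contains it. The names below are hypothetical:

```python
B_DNA_RISE_ANGSTROM = 3.4       # assumption: typical B-DNA rise per base pair
T4_GENOME_BP = 172_000          # 172 kbp, from the text
T4_HEAD_LENGTH_ANGSTROM = 1200  # prolate head length, from the text


def contour_length_angstrom(n_bp: int, rise: float = B_DNA_RISE_ANGSTROM) -> float:
    """Length of a DNA molecule of n_bp base pairs if fully extended as B-DNA."""
    return n_bp * rise


length = contour_length_angstrom(T4_GENOME_BP)
print(f"{length / 1e4:.1f} um")                                # 58.5 um (1 um = 10,000 Angstrom)
print(f"{length / T4_HEAD_LENGTH_ANGSTROM:.0f} head lengths")  # 487 head lengths
```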
In contrast, viruses that infect eukaryotic host cells often require 50 or even 100 virions per cell for successful infection. It is therefore possible that some icosahedral viruses that infect eukaryotic cells increase their infection efficiency, even in the absence of a tail, by having a special vertex that aids genome delivery into the host.
The icosahedral ends of bacteriophage T4 heads have T = 13 quasi-symmetry, and the cylindrical mid-section has Q = 20 quasi-symmetry (Fokine et al., 2004). The protein shell of the mature T4 capsid is formed by the major capsid protein gp23, the special vertex protein gp24, the highly antigenic outer capsid protein (hoc), the small outer capsid protein (soc) and the head-tail connector gp20. Cryo-EM reconstructions have identified the locations of these proteins in the head (Fig. 5; Fokine et al., 2004). The major capsid protein forms a hexagonal array of hexamers. The soc proteins bind at the interfaces between gp23 hexamers. Of special interest is the vertex protein gp24, which forms pentamers that occupy 11 of the 12 capsid vertices. The 12th vertex is the unique portal vertex occupied by the gp20 dodecamer. The portal vertex also provides the binding site for the DNA-packaging machine, consisting of gp17, gp16 and gp20, that packages the genomic DNA into the preassembled prohead. Once the DNA is packaged, the tail is attached to form a fully assembled infectious virus particle.
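Some of the counts in this paragraph can be checked with the Caspar-Klug rule T = h² + hk + k², which is standard background rather than something stated in the text. The sketch below finds the lattice steps (h, k) that give T = 13 and counts the gp24 subunits implied by 11 pentameric vertices. Its last lines describe a hypothetical closed T = 13 shell for comparison only; the prolate T4 head is not such a shell:

```python
def t_number(h: int, k: int) -> int:
    """Caspar-Klug triangulation number T = h^2 + hk + k^2."""
    return h * h + h * k + k * k


# Lattice steps (h, k) giving T = 13: two mirror-related (laevo/dextro) choices.
pairs = [(h, k) for h in range(5) for k in range(5) if t_number(h, k) == 13]
print(pairs)  # [(1, 3), (3, 1)]

# gp24 pentamers occupy 11 of the 12 vertices; the 12th is the gp20 portal.
gp24_copies = 11 * 5
print(gp24_copies)  # 55

# Comparison only: a hypothetical CLOSED T = 13 icosahedral shell would contain
# 60T subunits, arranged as 12 pentamers plus 10(T - 1) hexamers.
# The prolate T4 head, with its Q = 20 mid-section, is not such a shell.
T = 13
n_hexamers = 10 * (T - 1)
print(60 * T, n_hexamers)  # 780 120
```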
The polypeptide fold of gp24 (Fokine et al., 2004) is similar to that of the major capsid protein of the tailed phage HK97 (Wikoff et al., 2000). The sequence of gp24 is homologous to that of gp23 (about 21% identical amino acids), making it possible to build a homology model of gp23. Furthermore, the comparison with HK97 provided the basis for building gp24 pentamers and gp23 hexamers. These could in turn be fitted into the cryo-EM density to make a pseudo-atomic model of a substantial part of the head capsid. The similarity between the T4 and HK97 capsid architectures makes a strong case that both capsids evolved from a common primordial phage head, as is also the case for the tailed phages P22 (Jiang et al., 2003), φ29 (Morais et al., 2005) and epsilon 15 (Jiang et al., 2006).
The T4 phage head consists of hexagonally packed planes of gp23 hexamers. The fivefold icosahedral vertices occur at the intersections of these planes and require a special pentagonal vertex protein. Presumably, the primordial phage had only one type of capsid protein, and after gene duplication the specialized vertex protein evolved independently of the major capsid protein gp23. A reversion to the primordial phage is produced by 'bypass' mutations, in which the absence of the gp24 gene is compensated for by mutations in gp23 (Fig. 6; Fokine et al., 2006).

Bacteriophage T4: the tail
The T4 head is connected to the baseplate by the tail tube and the surrounding sheath. The baseplate has a hexagonal shape in the mature virus but changes to a star shape after adsorption to a host cell (Crowther et al., 1977; Simon & Anderson, 1967). This very large conformational change of the baseplate is concomitant with sheath contraction, extension of the short tail fibers from underneath the baseplate, bending of the long tail fibers and, eventually, ejection of the genomic DNA through the center of the tail tube (Fig. 7; Leiman et al., 2004). The baseplate is assembled from the trimeric 22S hub, consisting of gp5, gp27 and gp29, and six 15S wedges, each consisting of multiple copies of gp6, gp7, gp8, gp10, gp11, gp25 and gp53. The cryo-EM structures of the hexagonal baseplate, the extended sheath (Kostyuchenko et al., 2005) and the star-shaped baseplate with the contracted sheath have been determined, as have the crystal structures of many of the baseplate proteins (Fig. 8). These crystal structures were fitted into the cryo-EM reconstructions of the hexagonal and star-shaped baseplates using quantitative computer-aided procedures (Fig. 9; Chacón & Wriggers, 2002; Rossmann et al., 2001). The position and orientation of each known protein structure was uniquely established. Based on cross-linking, molecular-weight and other data, some of the uninterpreted density could be assigned to specific proteins whose structures have not yet been determined. As the absolute hand of each crystal structure is known, each could be used to determine the hand of the cryo-EM maps independently. Fortunately, every protein gave the same hand for the cryo-EM reconstructions (Table 1). Comparison of the hexagonal and star-shaped baseplate structures showed that the proteins move as rigid bodies over each other as the baseplate alters its conformation.
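The hand check works because a fitting procedure only rotates and translates a model; it cannot reflect it. A quantity that is invariant under rotation but changes sign under reflection, such as the signed volume (scalar triple product) of four atoms, therefore distinguishes a map of the correct hand from its mirror image. The sketch below uses four hypothetical atom positions to show this principle; it is not the fitting procedure used in the cited work:

```python
import math

Point = tuple[float, float, float]


def signed_volume(a: Point, b: Point, c: Point, d: Point) -> float:
    """Scalar triple product (b - a) . ((c - a) x (d - a)).

    Equals six times the signed volume of tetrahedron abcd; its sign encodes handedness.
    """
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    cross = (v[1] * w[2] - v[2] * w[1],
             v[2] * w[0] - v[0] * w[2],
             v[0] * w[1] - v[1] * w[0])
    return sum(u[i] * cross[i] for i in range(3))


def rotate_z(p: Point, angle: float) -> Point:
    """Proper rotation about the z axis (what a rigid-body fit may apply)."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1], p[2])


def mirror_z(p: Point) -> Point:
    """Reflection through the xy plane (what a wrong-handed map amounts to)."""
    return (p[0], p[1], -p[2])


# Four hypothetical atom positions forming a chiral tetrahedron.
motif = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
print(signed_volume(*motif))                                    # 1.0
print(signed_volume(*[rotate_z(p, 0.7) for p in motif]) > 0)    # True: rotation keeps the hand
print(signed_volume(*[mirror_z(p) for p in motif]))             # -1.0: reflection flips it
```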
Although the structure of the sheath protein, gp18, is still unknown, it was possible to determine the shape of the protein and to observe that it can be segmented into three domains whose relative positions change when the sheath contracts. The gp18 subunits form a six-start right-handed helix around the tail tube, generated by 23 hexameric rings. In the extended tail, each ring is rotated by 17.2° relative to its neighbor, forming a sheath around the tail tube that is 925 Å long and 240 Å in diameter. The contracted sheath, however, is only 420 Å long but 330 Å in diameter, with a rotation of 32.9° between successive rings. Thus, each of the six helices that form the sheath makes a 378.4° rotation around the tube when extended and a 723.8° rotation when contracted (Fig. 10). Consequently, while the baseplate is firmly attached to the E. coli surface, the tail tube, terminating in the gp5 β-helical 'pin', rotates by 345.4° (roughly one revolution) like a drill while being pushed through the baseplate and across the E. coli periplasmic space by about 505 Å. This process also pushes the three lysozyme domains onto the peptidoglycan cell wall for digestion.
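The sheath rotations and the displacement of the tube follow from a few lines of arithmetic. With 23 rings there are 22 ring-to-ring steps along each helical strand (the interpretation that reproduces the 378.4° quoted for the extended sheath), so the contracted strand turns through 22 × 32.9° = 723.8°, the tube turns through the 345.4° difference, and the sheath shortens by 925 - 420 = 505 Å. The constant and function names in this sketch are hypothetical:

```python
N_RINGS = 23
TWIST_EXTENDED = 17.2    # degrees between neighbouring rings, extended sheath
TWIST_CONTRACTED = 32.9  # degrees between neighbouring rings, contracted sheath
LENGTH_EXTENDED = 925    # sheath length in Angstrom, extended
LENGTH_CONTRACTED = 420  # sheath length in Angstrom, contracted


def total_twist(twist_per_ring: float, n_rings: int = N_RINGS) -> float:
    """Rotation accumulated along one helical strand: (n_rings - 1) ring-to-ring steps."""
    return (n_rings - 1) * twist_per_ring


extended = total_twist(TWIST_EXTENDED)      # 22 x 17.2
contracted = total_twist(TWIST_CONTRACTED)  # 22 x 32.9
print(round(extended, 1), round(contracted, 1))  # 378.4 723.8
print(round(contracted - extended, 1))           # 345.4 (rotation of the tube)
print(LENGTH_EXTENDED - LENGTH_CONTRACTED)       # 505 (shortening of the sheath)
```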

Conclusion
A combination of X-ray crystallography and cryo-EM single-particle reconstructions has been used to investigate how different kinds of viruses infect their hosts. The presence of a unique vertex appears to be critically important for efficient infection, both by tailed phages and by some icosahedral viruses.

Figure 9
Fit of the known crystal structures into the cryo-EM reconstructions of the hexagonally shaped (top left) and star-shaped (bottom right) baseplates. Gp9 is shown in green, gp11 in blue, gp12 in purple and gp10 in yellow. The limits of gp7 (red) were determined from additional biochemical data.

Figure 10
One of the six helical strands of gp18 that form the T4 tail sheath in the extended (green) and contracted (brown) states. The hexagonally shaped baseplate, tail tube and collar of the extended tail are also shown (blue). The extended sheath makes about one turn around the tail tube, whereas the contracted sheath makes about two turns, thus causing the tail tube and head to rotate while entering the periplasmic space of the E. coli host. (Adapted from Kostyuchenko et al., 2005.)

We wish to thank all those whose work is represented in the text and in the figures. The previously unpublished diagrams shown in Fig. 7 were drawn by Petr G. Leiman. We thank Cheryl A. Towell for help in the preparation of the manuscript. The work was supported by an NSF grant (MCB 9986266) to MGR, an HFSP grant (RGP 28/2002) to MGR, FA, and VVM and Wellcome Trust and HHMI grants to VVM.