Satellite tobacco mosaic virus (STMV) is an icosahedral T=1 single-stranded RNA virus, with a genome of 1058 nucleotides.

The crystal structure of STMV was determined in the laboratory of Alex McPherson, revealing the structure of over 50% of the nucleotides. Because of the icosahedral averaging that is used in viral crystallography, the reported RNA structure has icosahedral symmetry, even though the sequence shows no such symmetry. It is therefore not possible to determine exactly which nucleotides are at each position seen in the structure. The crystal structure revealed 30 RNA double helices, each with nine base pairs and a dangling single-stranded nucleotide on each strand. Larson and McPherson proposed that the RNA is folded into a series of local stem-loops, with little or no long-range base pairing. They argued that this arrangement is consistent with the crystal structure, and that it would facilitate viral assembly.
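The "over 50%" figure can be checked with simple arithmetic from the numbers given above. The following is a minimal sketch, assuming that the 30 helices account for all of the ordered RNA and that each helix contributes two strands of nine paired nucleotides plus one dangling nucleotide. The function and constant names are illustrative only.

```python
# Rough count of the nucleotides visible in the STMV crystal structure,
# using only the figures quoted in the text.
GENOME_LENGTH = 1058        # nucleotides in the STMV genome
HELICES = 30                # double helices seen in the crystal structure
BASE_PAIRS_PER_HELIX = 9    # base pairs in each helix
DANGLING_PER_STRAND = 1     # single-stranded nucleotide on each strand


def visible_nucleotides(helices, base_pairs, dangling_per_strand):
    """Nucleotides accounted for, assuming two strands per helix."""
    per_strand = base_pairs + dangling_per_strand
    return helices * 2 * per_strand


visible = visible_nucleotides(HELICES, BASE_PAIRS_PER_HELIX, DANGLING_PER_STRAND)
fraction = visible / GENOME_LENGTH

print(f"{visible} of {GENOME_LENGTH} nucleotides ({fraction:.1%})")
```

Under these assumptions the helices account for 600 of the 1058 nucleotides, or roughly 57% of the genome.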

In a collaboration between McPherson's group and the group of Klaus Schulten, Freddolino et al. developed an all-atom model of the RNA genome and ran a series of molecular dynamics simulations on it. This model used an artificial sequence, because the icosahedrally averaged crystal structure does not show which nucleotide occupies which position.

In the summer of 2011, Susan Schroeder published a paper proposing a secondary structure for the STMV genomic RNA, based on the local stem-loop hypothesis described above, and on the results of chemical probing experiments that she carried out on the intact virus.

We have recently completed a second-generation all-atom model of the STMV genome, based on the Schroeder secondary structure described above. Steve Larson and Alex McPherson compared the electron density map derived from the model with the original crystallographic density model and found a very high correlation. (They are co-authors on the manuscript describing the model, which is under review at the Journal of Structural Biology as of June 2012.) To our knowledge, this is the first all-atom model of any virus that includes every nucleotide and every amino acid. The model is shown below, with the protein capsid (grey) cut in half to reveal the RNA genome (magenta). The PDB file containing the coordinates of the model is available here.