

Notes on notes 3: it's a process

Harry Caufield

  • August 11, 2013

It's time for a science story, presented in convenient bullet-point format for easy digestion. Most of this story was first told during the course of my Master's research in Gail Christie's lab at Virginia Commonwealth U., so if you're hungry to know more, please read my Master's thesis (a phrase I don't get to use often, if ever), the corresponding paper by Wall et al., or a more recent paper on the subject by Wall et al. that extends well beyond the material I was concerned with. To be honest, most of my work at the time was a combination of fighting with protein expression conditions and trying to figure out why my bacterial cultures were spontaneously dying, so I'm just glad the story of L27 in the Firmicutes yielded some interesting findings in the long run. Anyway, here are those bullet points I promised:

  • Staphylococcus aureus is a bacterial species best known for causing nasty, antibiotic-resistant infections (e.g., those involving MRSA or VRSA strains). It's frequently a benign resident of human skin and nasal passages, though depending on geography and other factors, you may or may not be colonized with it. Here, I'll just call it S. aureus, and I'm not referring to any specific pathogenic strain.
  • Bacteriophage 80alpha is a virus capable of infecting S. aureus. Here, I'll call it 80alpha.
  • 80alpha structural proteins get processed during viral assembly. Or, if we think of these viruses as Ikea furniture, this is one of those designs in which some of the parts have to be trimmed a bit to fit together properly. 

A bunch of 80alpha virions, complete with capsid (or head) structures, tails, and baseplates (those are harder to see; the arrow indicates one seen head-on). Image from Spilman et al. (2011) Journal of Molecular Biology.


  • The 80alpha genome doesn't appear to code for anything allowing the virus to make those essential cuts. That suggests that its host S. aureus cell provides the necessary enzyme (or, to continue the Ikea metaphor, you're assembling furniture in a friend's house and have to borrow their table saw. Or, to better represent the bacteriophage vs. bacterial host relationship, you steal it out of their garage).
  • As another clue, the sequence of the 80alpha proteins being cut is a close match to a sequence in S. aureus ribosomal protein L27. So if there's an enzyme that 80alpha sneakily borrows for its own purposes, does S. aureus normally use that enzyme to process its own ribosomal protein L27? Spoiler: yes. This was not previously known to be a common phenomenon among bacterial ribosomal proteins.
  • L27 occupies a rather crucial location in the middle of the ribosome. Recall that ribosomes are the structures in cells responsible for assembling proteins out of amino acids and you'll realize how important it is to have functional ribosomes; ribosomes without L27 don't work terribly well, especially if that extra little bit on the end (the one with the similar sequence to the 80alpha protein, as noted above) doesn't get removed.
  • To make matters even stranger, the extra sequence on the end of L27 is present in S. aureus and many related bacterial species, but not in E. coli. This suggests there's a solid evolutionary reason why the extended protein has been kept around in some bacteria but not others. The "why" part remains unclear, but seeing as this L27 processing is essential to the survival of S. aureus and other potential bacterial pathogens, it may be a good basis for developing new antibiotic strategies.

The "eggnog" part here refers to eggNOG, my favorite database of orthologous groups, or clusters of genes with similar sequences. Similar sequence implies evolutionary relatedness, so resources like eggNOG provide a way to see how many branches of the tree of life contain sequences like the gene coding for ribosomal protein L27 or the enzyme responsible for processing it.

According to eggNOG, L27 is in the bacteria-specific orthologous group ENOG4105K46. The taxonomic profile of that group looks like this:


This doesn't tell us much other than that L27 is broadly conserved - that is, it's seen in a very wide variety of bacterial species. Cyanobacteria, actinobacteria, firmicutes, and proteobacteria are all distantly related but certainly share features like the general structure of their ribosomes. What this doesn't show us is which species have that extended form of L27, like in S. aureus. That's more of a job for a sequence alignment, and we can align the sequences of all 1,660 predicted gene products in this orthologous group. Clustal Omega will do most of the heavy lifting for this sizable alignment but won't distinguish sequences with that extension from others. Luckily we can just guess based on the consensus of the alignment, so after taking a look using Jalview, we see there does appear to be a contingent of sequences (somewhere around 430 out of the 1,660, in fact) with an extra bit on their N-terminal ends. That's the left side here.

L27 consensus.png
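
Eyeballing the alignment in Jalview works, but the same guess can be roughed out in code. Here's a minimal sketch, not the procedure used for the numbers above: it assumes the alignment was exported as gapped FASTA with "-" as the gap character, treats the first column occupied by at least half the sequences as the start of the shared core, and flags any sequence with a few residues before that column. The function names, the thresholds, and the toy sequences are all made up for illustration.

```python
def parse_fasta(text):
    """Return a list of (header, aligned_sequence) tuples from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def find_core_start(records, min_occupancy=0.5):
    """Index of the first alignment column filled in at least min_occupancy of sequences."""
    width = len(records[0][1])
    for col in range(width):
        filled = sum(1 for _, seq in records if seq[col] != "-")
        if filled / len(records) >= min_occupancy:
            return col
    return width


def with_extension(records, min_residues=3):
    """Headers of sequences with at least min_residues residues before the shared core."""
    start = find_core_start(records)
    return [
        header
        for header, seq in records
        if sum(1 for c in seq[:start] if c != "-") >= min_residues
    ]


# Toy alignment (invented sequences, NOT real L27 data): two "long" forms, three "short".
TOY_ALIGNMENT = """
>long_a
MLKLNFAHKKGG
>long_b
MLRMNFAHKKGG
>short_a
-----MAHKKGG
>short_b
-----MAHKKGG
>short_c
-----MAHKAGG
"""

toy_records = parse_fasta(TOY_ALIGNMENT)
print(with_extension(toy_records))
```

The half-occupancy cutoff only works because the extended sequences are a minority (roughly 430 of 1,660 here); if they were the majority, the "core" would have to be defined some other way.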


This doesn't answer the question of which species have the longer L27. We know from previous comparisons that species in the Firmicutes, like Staphylococcus species, generally seem to have the long form while E. coli and its relatives do not. Can we get more specific? Yes - if we trim the alignment down to just the first few amino acids of the N-terminus and remove gap-only sequences, we can then use a taxonomy-based phylogenetic tree builder to see how diverse this set of species is. I've previously used phyloT for this purpose but they recently switched to an unfortunate subscription-based funding model (though that's preferable to losing the site entirely, I suppose). Luckily the ETE Toolkit treeviewer can handle the same function, as long as we provide it with a newly-constructed Newick format tree. This is such a short sequence that it doesn't really provide the information content necessary for a fine-grained look at possible evolutionary relationships, but we can still compare it to the existing taxonomic groups.
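
The trimming step is simple enough to sketch. This is an illustration under assumptions rather than the exact procedure used here: the alignment is assumed to be already loaded as (header, aligned sequence) pairs, "-" and "." are assumed to be the only gap characters, and the function name and toy records are hypothetical.

```python
GAP_CHARS = {"-", "."}


def trim_n_terminus(records, n_columns):
    """Keep the first n_columns of each aligned sequence, dropping gap-only results."""
    trimmed = []
    for header, seq in records:
        prefix = seq[:n_columns]
        # A sequence with no residues in this window has no N-terminal extension.
        if any(c not in GAP_CHARS for c in prefix):
            trimmed.append((header, prefix))
    return trimmed


# Toy records (invented, not real L27 sequences).
toy = [("a", "MLK--AH"), ("b", "-----MA"), ("c", "--K--MA")]
print(trim_n_terminus(toy, 5))
```

Whatever survives the trim is the set of sequences (and, through their headers, the set of species) worth passing along to the tree builder.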

The tree is large and doesn't entirely match up with bacterial taxonomy, as expected, but we do see that Streptococcus and members of the Veillonellaceae family, among others, seem to form some solid clades. Again, this sequence is so short that even small differences can really throw off how related we estimate them to be. The full tree is here, though I'm not certain how stable that link is in the long term, and it will take a while to load.


Not terribly readable, I know, but that second green column to the right indicates species in Firmicutes.


We can also just cut everything down to the taxonomy IDs and use the NCBI conversion tool to get their human-readable names. The resulting list isn't much to look at, but it tells us that, in addition to S. aureus, 12 other species of Staphylococcus, 60 species/strains of Streptococcus, 33 species/strains of Lactobacillus, and even some more exotic Fusobacteria and Acetomicrobium species have genomes appearing to code for the extended L27. Many of these species are at home in extreme conditions (temperatures or pH, in particular), so we could guess that the extension, absent in some species less tolerant of extremes, confers a benefit in regulating protein assembly under those conditions.
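
As a rough sketch of that last step: the code below assumes the sequence headers follow a "<taxonomy ID>.<locus tag>" pattern, so the part before the first period is the NCBI taxonomy ID. That header format is an assumption about the export, the locus tags are invented, and the small name table stands in for the output of the NCBI conversion tool rather than calling it.

```python
from collections import Counter


def taxonomy_ids(headers):
    """Unique taxonomy IDs, in first-seen order, from '<taxid>.<locus>' headers."""
    ids = (header.split(".", 1)[0] for header in headers)
    return list(dict.fromkeys(i for i in ids if i.isdigit()))


def genus_counts(names):
    """Count entries per genus, taking the first word of each name as the genus."""
    return Counter(name.split()[0] for name in names)


# Hypothetical headers; the locus tags are made up for illustration.
toy_headers = ["1280.SA_0001", "1280.SA_0002", "1282.SE_0100", "1314.SP_0042"]

# Hand-written stand-in for the ID-to-name conversion step.
toy_names = {
    "1280": "Staphylococcus aureus",
    "1282": "Staphylococcus epidermidis",
    "1314": "Streptococcus pyogenes",
}

ids = taxonomy_ids(toy_headers)
counts = genus_counts(toy_names[i] for i in ids)
print(ids, dict(counts))
```

Splitting on the first word is crude (it will mislabel names that start with "Candidatus", for instance), but it's enough for a quick per-genus tally.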

Conclusion: bacterial ribosomal proteins may be more diverse than we'd otherwise have thought and may be involved in some neat regulatory functions.

Notes on notes 2

Harry Caufield

  • August 18, 2013

I suppose this was a product idea involving some visual interpretation of what would happen to rear windshield sticker families should they be impacted by physics. Enough time has passed to allow this category of car appliqué to evolve in patterns similar to those of Calvin urinating on car logos and of the awareness ribbon stickers, so the idea hasn't aged well. There are likely more fertile grounds for comedic innovation. 

Bumper stickers may be one of the lowest forms of humor anyway, with the exception of my eternal favorite, "IF YOU CAN READ THIS YOU'RE LITERATE".

Notes on notes

Harry Caufield

I like taking notes. Perhaps it's due to an unwillingness to forget anything potentially of value, perhaps it's for posterity (a personal archive, however informal, can make even small items and observations seem relevant and useful), or perhaps it's legitimately a good way to remember events as they happen.

I'm not here to justify why notes should be taken. I'm here to reflect on those I've already created. This will be a series of loosely-composed posts and will continue until I stop taking notes (or Google shuts Keep down, in a painful echo of good ol' Reader). I'm going to try to be as candid as possible. So let's get started:

  • August 1, 2013

OK, I can't promise any or all of these will make much sense. Here, imagine the usual Apple press conference: large screens, new features, gleaming products. The host of the event climbs on stage. He instructs the audience to consume the newly-announced Apple technology. The new phones have been carefully designed to fit neatly into the average American mouth.

Is this the nucleus of a ham-handed satire or just something I found unnaturally amusing? It's safer to assume the latter.



So much to say: novel generation with birchcanoe

Harry Caufield

I'm trying something this month, partially for NaNoGenMo (essentially a computational, generative, and generally absurd version of National Novel Writing Month), and partially to address an idea that's been bouncing around in my head for a while now. I'm writing a program to assemble stories (perhaps more like loosely-connected prose, and slightly like novels) from existing collections of short, somehow homogeneous sentences. This will start with the Harvard Sentences - a set of carefully-engineered sentences originally intended for testing speech quality in telephone systems. They all read like this:

 1. Slide the box into that empty space.
 2. The plant grew large and green in the window.
 3. The beam dropped down on the workman's head.
 4. Pink clouds floated with the breeze.
 5. She danced like a swan, tall and graceful.
 6. The tube was blown and the tire flat and useless.
 7. It is late morning on the old wall clock.
 8. Let's all join as we sing the last chorus.
 9. The last switch cannot be turned off.
10. The fight will end in just six minutes.

There's a flat but poetic quality to all these sentences, like their edges have been sanded down, and most of that quality - unsurprisingly - only arises when they're spoken aloud. So, my code will need to generate stories with a similar quality.

The project is called birchcanoe, after the first of the Harvard Sentences in the full list ("The birch canoe slid on the smooth planks."). There was a related magnetic poetry-style software toy a few years back, but I'm going for a larger scope here, though the exact extent remains to be seen. Updates below and on GitHub.

Nov 1 - Started the project. Not much code yet.

Nov 2 - Code is now at the point where it produces randomly-ordered sentences, so it's comparable to very early efforts at book generation. Still too basic to consider innovative.

Nov 8 - Got sick for a few days and that tends to fog the coding process. Still, I've elected to use seed sentences and retrieve new, similarly-structured text to populate each chapter rather than composing each entirely from the initial seeds. I've updated accordingly. Output is filler for now, like this:

She saw a cat in the neighbor's house. Saw neighbor's she neighbor's the saw saw she house. In house. Cat house. A in in cat the neighbor's the a cat the saw house. Cat the in cat a house. She a saw house. A cat house. The in cat cat the house. In the saw saw she saw in house. Cat neighbor's saw in cat the she a a neighbor's in a house. The in in in house. In house. The neighbor's saw house. House. The she house. Saw house. Cat house. She a house.
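
Filler like this is easy to imitate. The sketch below is a guess reconstructed from the sample output, not the actual birchcanoe code: it assumes each filler sentence is a random number of words drawn (with repeats) from the seed sentence, always ending on the seed's final word. The function names and the max_words limit are invented for illustration.

```python
import random


def filler_sentence(seed_sentence, rng, max_words=12):
    """Build one filler sentence from the seed's words, ending on its last word."""
    words = seed_sentence.rstrip(".").lower().split()
    body, last = words[:-1], words[-1]
    n_picked = rng.randint(0, max_words) if body else 0
    picked = [rng.choice(body) for _ in range(n_picked)]
    sentence = " ".join(picked + [last])
    return sentence[0].upper() + sentence[1:] + "."


def filler_paragraph(seed_sentence, n_sentences, seed=0):
    """Seed sentence followed by n_sentences of filler; `seed` fixes the randomness."""
    rng = random.Random(seed)
    filler = [filler_sentence(seed_sentence, rng) for _ in range(n_sentences)]
    return " ".join([seed_sentence] + filler)


paragraph = filler_paragraph("She saw a cat in the neighbor's house.", 5)
print(paragraph)
```

Passing a different `seed` value gives a different, but repeatable, paragraph.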

Dec 5 - Obviously I've passed the deadline on this one. That's how life goes sometimes. The project is not over, however: the plan is to record audio of my readings of each of the input sentences, use the DeepSpeech engine to build a text-to-speech model with the audio as training data, then use that model to produce audio books of the generated text. It will sound strange, and that's kind of the point (otherwise, I could use one of the many extant text-to-speech systems or their APIs).