Data in plates and innovations in breakfast

Here's a useful tool I found on reddit this morning - an R package called phenoScreen.

It's designed to make lab data from 96- or 384-well plates easier to work with and visualize. I'm not sure how most people produced plate maps before this without spending hours reinventing the wheel.
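I haven't checked phenoScreen's own functions here, so this is not its API. It's a base R sketch of what a basic plate map involves, using made-up readings and an assumed well-naming scheme ("A01" through "H12"). The helper well_to_rc() is hypothetical.

```r
# Hypothetical helper: convert well IDs like "B03" into (row, column) positions
well_to_rc <- function(well) {
  row <- match(substr(well, 1, 1), LETTERS)  # "B" -> 2
  col <- as.integer(substring(well, 2))      # "03" -> 3
  cbind(row = row, col = col)
}

# Made-up readings for a 96-well plate (8 rows A-H, 12 columns)
set.seed(1)
wells  <- sprintf("%s%02d", rep(LETTERS[1:8], each = 12), rep(1:12, times = 8))
values <- rnorm(length(wells))

# Fill an 8 x 12 matrix using (row, column) matrix indexing
plate <- matrix(NA_real_, nrow = 8, ncol = 12,
                dimnames = list(LETTERS[1:8], 1:12))
plate[well_to_rc(wells)] <- values

# image() puts the first column of z at the bottom, so flip the rows and
# transpose to put well A01 in the top-left corner, as on a physical plate
image(1:12, 1:8, t(plate[8:1, ]), axes = FALSE, xlab = "", ylab = "")
axis(3, at = 1:12)
axis(2, at = 1:8, labels = rev(LETTERS[1:8]), las = 1)
```

For a 384-well plate, the same approach should work with 16 rows (A to P) and 24 columns.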

On that note, isn't there anything else we could start reinventing? The wheel has been around for at least six millennia. Perhaps we could start reinventing toasters.

Venn diagrams in R, or how to go around in circles

Did you know you can produce Venn diagrams in R?

Yeah, I wasn't surprised either. It's easy enough to assemble many other kinds of graphs and data visualizations in R, so why not Venn diagrams? 

I tend to feel ambivalent about Venns (see also: Euler diagrams). They have many of the same problems as pie charts: they abstract data to the point of near-meaninglessness, they're completely inappropriate for situations when some values are much smaller or larger than the rest, and they can magnify the importance of otherwise trivial details. That being said, they're still popular and can express simple conclusions easily. If all I want to say is "these groups share n components" then it's hard to do better than a Venn diagram without going into more detail.
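The numbers behind a simple two-group diagram are easy to get without drawing anything, using base R's intersect() and setdiff(). The two integer groups here are made up for illustration.

```r
set_a <- 1:10
set_b <- 0:5

shared <- intersect(set_a, set_b)  # members of both groups
only_a <- setdiff(set_a, set_b)    # members of Set A alone
only_b <- setdiff(set_b, set_a)    # members of Set B alone

# These three counts are everything a two-set Venn diagram conveys
cat("shared:", length(shared), "A only:", length(only_a), "B only:", length(only_b), "\n")
```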

So how can we assemble these crazy things? One option is venn() in gplots - see this vignette for some examples. It's described here.

We can use two groups:

library(gplots)
venn(list("Set A"=1:10, "Set B"=0:5))

That's not terribly interesting, so here are four sets:

venn(list("Set A"=0:10,"Set B"=0:5,"Set C"=5:39,"Set D"=7:80))
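As a sanity check on what ends up in each region, here's a base R sketch that counts the items belonging to exactly one combination of sets, for the four sets above. The helper venn_counts() is my own, not part of gplots, and I'm assuming gplots labels its regions with these same exclusive counts.

```r
# Hypothetical helper (not part of gplots): count the items that fall in each
# exclusive region, i.e. in exactly this combination of sets and no others
venn_counts <- function(sets) {
  universe <- unique(unlist(sets, use.names = FALSE))
  # membership matrix: one row per item, one logical column per set
  membership <- sapply(sets, function(s) universe %in% s)
  # label each item with the sets it belongs to, e.g. "Set A&Set B"
  labels <- apply(membership, 1,
                  function(in_set) paste(names(sets)[in_set], collapse = "&"))
  table(labels)
}

sets <- list("Set A" = 0:10, "Set B" = 0:5, "Set C" = 5:39, "Set D" = 7:80)
counts <- venn_counts(sets)
print(counts)
```

Regions that don't appear in the table are empty; with four sets, most of the 15 possible combinations are.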

venn() will also take a data frame as long as the values are booleans, so a data frame with one logical column per group (say, Ab, Cd, and Ef) can be turned straight into a three-set diagram.
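A minimal sketch of that input, with a made-up membership table: each row is one item, and each logical column says whether the item belongs to that group. The gplots call is skipped if the package isn't installed.

```r
# Hypothetical membership table: one row per item, one logical column per group
membership <- data.frame(
  Ab = c(TRUE,  TRUE,  FALSE, TRUE,  FALSE),
  Cd = c(TRUE,  FALSE, TRUE,  TRUE,  FALSE),
  Ef = c(FALSE, FALSE, TRUE,  TRUE,  TRUE)
)

# venn() expects booleans, so check every column is logical first
stopifnot(all(vapply(membership, is.logical, logical(1))))

if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::venn(membership)  # each column becomes one circle
}
```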

gplots provides just one of the available options. There's also the VennDiagram package and the venneuler package. VennDiagram offers extensive customization, but its draw.*.venn() functions (such as draw.triple.venn()) don't compute the intersection counts for you; you supply them. That works well if the counts are already available. Here's an example:

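library(VennDiagram)

# areaN are the total sizes of each set; nXY are the pairwise overlaps
# (each of which includes the triple overlap n123)
venn.plot <- draw.triple.venn(
  area1 = 40,
  area2 = 33,
  area3 = 70,
  n12 = 10,
  n23 = 10,
  n13 = 7,
  n123 = 3,
  category = c("First", "Second", "Third"),
  fill = c("blue", "red", "green"),
  lty = "blank",
  cex = 2,
  cat.cex = 2
)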
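The example above hard-codes its counts. If you start from raw member lists instead, the same arguments can be computed with intersect() and passed along with do.call(). The three sets here are made up, and the plotting step is skipped if VennDiagram isn't installed.

```r
# Hypothetical raw member lists for three groups
first  <- 1:30
second <- 21:70
third  <- c(1:5, 26:60, 80:89)

# Total set sizes plus *total* pairwise overlaps (each includes the triple overlap)
counts <- list(
  area1 = length(first),
  area2 = length(second),
  area3 = length(third),
  n12   = length(intersect(first, second)),
  n23   = length(intersect(second, third)),
  n13   = length(intersect(first, third)),
  n123  = length(Reduce(intersect, list(first, second, third)))
)

if (requireNamespace("VennDiagram", quietly = TRUE)) {
  grid::grid.newpage()
  venn.plot <- do.call(VennDiagram::draw.triple.venn,
                       c(counts, list(category = c("First", "Second", "Third"),
                                      fill = c("blue", "red", "green"),
                                      lty = "blank")))
}
```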

The latter of those two packages, venneuler, makes the whole process so criminally easy that you could safely ignore gplots altogether. That is, unless you want the actual counts of each group plotted as above. That may be a job for post-R vector image editing, but why do it by hand when you can automate it?
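A sketch of the venneuler route, assuming its data-frame input form where the first column holds elements and the second holds set names. The two sets are the ones from the gplots example. venneuler needs Java (via rJava), so the plotting step is skipped if the package can't be loaded.

```r
# Two example groups, reshaped into (element, set) pairs
sets  <- list(A = 1:10, B = 0:5)
pairs <- stack(sets)  # columns: values, ind
pairs <- data.frame(elements = as.character(pairs$values),
                    sets     = as.character(pairs$ind))

if (requireNamespace("venneuler", quietly = TRUE)) {
  v <- venneuler::venneuler(pairs)  # fits an area-proportional Euler diagram
  plot(v)
}
```

Unlike the gplots version, the circle sizes here reflect the set sizes, but the region counts aren't printed.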


November 2015 update:

There's also the Vennerable package. It's described here and doesn't appear to be on CRAN, but it can be downloaded from R-Forge. Vennerable can handle all kinds of exotic n-group Venns, with a maximum n of nine.

Vennerable depends on graph, a package available through Bioconductor.

I haven't used it yet but will likely do so soon. Example output will show up here. (Further update: Vennerable also needs RBGL from Bioconductor. I ran into some unresolvable dependency issues while testing Vennerable, so it will have to wait until I really need a nine-group diagram, but that may indicate larger problems.)

February 2016 update:

Vennerable is now on GitHub. You'll need the devtools package to install it that way.
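A minimal install sketch, assuming the GitHub repository is js229/Vennerable (the maintainer's account, as far as I know) and that BiocManager handles the Bioconductor dependencies mentioned above. BiocManager postdates this post; back then, biocLite() was the usual installer.

```r
# Bioconductor dependencies first: graph and RBGL are not on CRAN
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install(c("graph", "RBGL"))

# Then Vennerable itself from GitHub (repository name assumed)
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("js229/Vennerable")
```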

Vennerable has some neat features but lacks useful documentation. Here's a quick example.

Using the example data provided with the package, and cutting it down to just three groups:

library(Vennerable)
data(StemCell)  # example gene sets shipped with Vennerable
VennRaw <- Venn(StemCell)
Venn3 <- VennRaw[, c("OCT4", "SOX2", "NANOG")]
plot(Venn3, doWeights = FALSE)

and that gives you the basic diagram.

If you use the full set of four groups, Vennerable defaults to overlapping rectangles.

The type argument can be set to "circles", "squares", "triangles", or "AWFE" (that is, the kind of plot favored by the British statistician A. W. F. Edwards).
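A quick way to compare the layouts is to loop over the type values listed above. This sketch sticks to the three-group subset because some layouts may only support particular numbers of sets; I haven't checked every combination. It does nothing if Vennerable isn't installed.

```r
if (requireNamespace("Vennerable", quietly = TRUE)) {
  library(Vennerable)
  data(StemCell)
  Venn3 <- Venn(StemCell)[, c("OCT4", "SOX2", "NANOG")]

  # doWeights = FALSE draws schematic regions rather than area-proportional ones
  for (layout in c("circles", "squares", "triangles", "AWFE")) {
    plot(Venn3, type = layout, doWeights = FALSE)
  }

  # All four groups, letting Vennerable choose its default layout
  plot(Venn(StemCell), doWeights = FALSE)
}
```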


Give Vennerable a try if you don't mind figuring out all the other options on your own or waiting for me to post about it again.


A multitude of R plot examples

You don't have to do it all by hand anymore. Note: I don't work with this kind of data.

Here's what happens more often than it should when I sit down to make some figures:

  1. I start with R and remember seeing an example similar to what I'm trying to make
  2. Searching for the example leads me back to the helpful but limited ggplot2 docs or a Stack Overflow question
  3. I piece together what I need from what I've found, knowing all the while that a better example was out there somewhere

This kind of thing drives me crazy, so here's a short list of places where decent R graph examples can be found. Hopefully these sites help others find what they need. One caveat: just because R can spit out a particular figure doesn't mean that figure is appropriate or represents the data well. Your mileage may vary, etc.

  • R Graph Catalog. Intended to complement a guidebook, this set of examples covers a wide variety of presentations and audiences. It filters graph types based on whether they're recommended or not (but hey, you can still use the examples). All code is included right on the site and on GitHub.
  • Quick-R Graphs. This site covers the basics and includes some useful figures like a plotting symbol chart.  
  • Cookbook for R Graphs. Another book accompaniment.
  • Plotly R Library. Here's where things start to get exotic. Plotly isn't R-specific but supposedly plays nice with ggplot2. It could be worth using for interactive charts.
  • Gvis cookbook. A ggplot2 alternative. It also allows for interactive graphs.
  • Wikibooks R Programming Graphics. A few more examples, including some in 3D (those should probably be avoided, honestly).
  • R-bloggers. Not a list of examples as much as a source for examples, especially those of the bleeding-edge kind.
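On the Plotly point: the plotly package's ggplotly() function converts an ordinary ggplot object into an interactive widget. The data (the built-in mtcars set) and aesthetics below are only for illustration, and each step is skipped if its package isn't installed.

```r
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # A static ggplot2 scatterplot
  p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
    geom_point()

  # Convert it to an interactive HTML widget
  if (requireNamespace("plotly", quietly = TRUE)) {
    widget <- plotly::ggplotly(p)
  }
}
```

In an interactive session, printing widget should open it in the RStudio viewer or a browser.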

If you're getting tired of finding example graphs, there's also GrapheR, a GUI for producing R graphics.

Edit: wanted to add a postscript about some hacky ways to make ggplot2 display patterns.

Edit 2: I found one more resource in the ZevRoss ggplot2 cheatsheet. It's primarily a guide to themes in plots, one of the more fiddly aspects of plotting in R.

Another bit of R

R makes it easy to chop up and reassemble data frames, whether with the subset() function or with dplyr's filter() function. It isn't always obvious how to apply a condition across every column of a data frame at once, though. That's where apply() is useful.

For example, this keeps only the columns that contain at least one value less than one (drop = FALSE keeps the result a data frame even if only one column survives):

newdata.df <- data.df[, apply(data.df, MARGIN = 2, function(x) any(x < 1)), drop = FALSE]

any() can be exchanged for all() to restrict the selection to columns in which all values are less than one. Note that apply() converts the data frame to a matrix first, so this works best when every column is numeric.

The same code works for rows: just move the apply(...) in front of the comma, then change MARGIN to 1 instead of 2.
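Here's a self-contained version of both cases on a small made-up data frame, plus base R's Filter(), which handles the column case without apply() because a data frame is a list of columns.

```r
# Small all-numeric data frame with made-up values
data.df <- data.frame(a = c(0, 2, 3),
                      b = c(5, 6, 7),
                      c = c(-1, 0.5, 9))

# Columns (MARGIN = 2) containing at least one value below 1: a and c
cols_any <- data.df[, apply(data.df, MARGIN = 2, function(x) any(x < 1)), drop = FALSE]

# Rows (MARGIN = 1) containing at least one value below 1: rows 1 and 2
rows_any <- data.df[apply(data.df, MARGIN = 1, function(x) any(x < 1)), , drop = FALSE]

# Column case without apply(): Filter() keeps list elements (columns) where f is TRUE
cols_filter <- Filter(function(x) any(x < 1), data.df)
```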