A few bits to chew on — J. Harry Caufield

This Small Things Considered entry about shipworm microbiomes. A version of this post was in the most recent Microbe. It's certainly relevant to the whole subject of industrial-scale cellulose metabolism.

A shipworm. It's quite simple, physiologically. From Wikimedia Commons.

This example of unexpected Unicode effects. Unicode is a wonderful, evolving thing, but its brute-force approach to providing every possible character can lead to dangerous redundancies. Is it really Unicode's fault, though? Or is it the result of EAFP ("easier to ask forgiveness than permission") programming philosophies?

Most bioinformatics databases I know of don't even parse Unicode strings properly, though there's usually a workaround.
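To make the redundancy problem concrete, here is a minimal Python sketch using only the standard library's unicodedata module. The sample strings and the helper names (normalized_equal, to_ascii_eafp) are my own illustrations, not taken from the linked example. It shows two things: the same visible word can be stored as two different code point sequences, and normalization fixes that case but does nothing for lookalike characters from different scripts.

```python
import unicodedata

# Two spellings of the same visible word "café".
composed = "caf\u00e9"      # U+00E9, one precomposed code point
decomposed = "cafe\u0301"   # "e" plus U+0301 COMBINING ACUTE ACCENT


def normalized_equal(a: str, b: str) -> bool:
    """Compare two strings after NFC normalization (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def to_ascii_eafp(label: str):
    """EAFP style: attempt the encode and deal with failure afterwards."""
    try:
        return label.encode("ascii")
    except UnicodeEncodeError:
        return None


# Lookalikes from different scripts are not unified by normalization.
latin_a = "A"            # U+0041 LATIN CAPITAL LETTER A
greek_alpha = "\u0391"   # U+0391 GREEK CAPITAL LETTER ALPHA

print(composed == decomposed)                  # raw comparison
print(normalized_equal(composed, decomposed))  # comparison after NFC
print(normalized_equal(latin_a, greek_alpha))  # homoglyphs stay distinct
print(to_ascii_eafp(composed))                 # not representable in ASCII
```

The raw comparison fails even though both strings render identically, while the normalized comparison succeeds. The Latin and Greek capitals remain different after normalization, which is where the real danger sits: code that simply tries an operation and catches the exception will never raise on a lookalike, because nothing about it is invalid.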

Omictools.com - a database of bioinformatics databases. There are literally thousands of bioinformatics tools and resources for some applications. Even when there are only hundreds, it's difficult to tell which resource uses which kind of data, when it was last updated, or whether it's even being maintained anymore. A resource may have been released a year ago, but some funding-related disaster can quickly take it offline (or even worse, leave it in an undead state, still responding to users but providing malformed results). Omictools helps to identify the useful resources.

It's increasingly bothersome that these kinds of meta-databases are even necessary. I know that most scientists are spoiled for choice when it comes to existing data sets, but when there's more -omics data out there than any human could conceivably process, we've missed the point. Why should we keep all this data around without using it?*

A 2013 paper by Duck et al. found that most (more than 70 percent, in their data set) bioinformatics resources don't get mentioned more than once in the literature. That's quite intimidating! Is it a sign that too many bioinformatics projects are like hammers searching for nails, or are we, as human researchers, just limited in how many different resources we can use at once?

*A disclaimer: I don't think all that -omics data is going unused; I just think it's underutilized.
