Unlabeled, but not forgotten. — J. Harry Caufield

Today I learned about Positive-Unlabeled (PU) learning, a type of semi-supervised machine learning. This is the general problem: if you want a machine learning method to do binary classification, you need to start with examples of items that fit into one class or the other. This is much easier and more efficient when you can safely say that everything in Column A is not in Column B and vice versa. That isn't the case with some data. Rather, each item is either labeled (Column A) or unlabeled (maybe Column B, or maybe Column A but just missing its label).
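
To make the setup concrete, here is a minimal sketch of what a PU dataset looks like. Everything in it is an assumption made up for illustration: the function name `make_pu_dataset`, the toy rates, and the random data. It is not taken from any particular paper or library. It only shows that the unlabeled pile quietly contains real Column A items, which is why treating "unlabeled" as "negative" is risky.

```python
import random


def make_pu_dataset(n=1000, positive_rate=0.3, label_rate=0.5, seed=0):
    """Build a toy PU dataset as (true_label, observed_label) pairs.

    true_label: 1 for Column A, 0 for Column B (hidden in real data).
    observed_label: 1 if the item was labeled, None if it is unlabeled.
    All rates are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        true_label = 1 if rng.random() < positive_rate else 0
        # Only true positives can ever receive a label.
        labeled = true_label == 1 and rng.random() < label_rate
        rows.append((true_label, 1 if labeled else None))
    return rows


rows = make_pu_dataset()
labeled_count = sum(1 for _, observed in rows if observed == 1)
unlabeled = [true for true, observed in rows if observed is None]
hidden_positives = sum(unlabeled)

print(f"labeled positives: {labeled_count}")
print(f"unlabeled items: {len(unlabeled)}")
print(f"unlabeled items that are really positive: {hidden_positives}")
```

In real data the `true_label` column does not exist, of course. The point of the sketch is just that a classifier trained with every unlabeled item marked as negative would be learning from some wrong labels.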

PU learning can be used to define negative examples for protein function prediction. Citation below:
Youngs N, Penfold-Brown D, Bonneau R, Shasha D (2014) Negative Example Selection for Protein Function Prediction: The NoGO Database. PLoS Comput Biol 10(6): e1003644. doi:10.1371/journal.pcbi.1003644.

severalog

J. Harry Caufield

J. Harry Caufield