I’ve been reading and thinking lately about this paper by Gandhi and Lake on mutual exclusivity bias, or ME bias, especially in terms of what it means for understanding biomedical text and other communications. ME bias is the tendency of an individual or a model, given a set of objects with known names plus an unknown name and a novel object, to assign the new name to the new object. The bias rests on the assumption that every object has exactly one name. If that seems childlike, you’re right: this is one of the biases children use when they’re learning language. They don’t often grasp the complexity of hierarchical relationships while they’re still learning, but if you show them a novel object, they’ll readily attach a newly provided name to it.
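As a decision rule, ME bias is almost trivially simple. Here’s my own toy sketch of it (not anything from the paper, and the object and name strings are made up): given some objects whose names are already known, a new name gets paired with whichever object is still unnamed.

```python
# Toy illustration of the mutual exclusivity (ME) inference rule.
# "named" is the set of objects that already have known names; a new
# name is assigned to the first object without one.
def mutual_exclusivity(objects, named, new_name):
    """Pair new_name with an object lacking a known name, if any."""
    unnamed = [o for o in objects if o not in named]
    return unnamed[0] if unnamed else None

# The child sees a cup, a ball, and an unfamiliar object, then hears "dax".
print(mutual_exclusivity(["cup", "ball", "unfamiliar_thing"],
                         {"cup", "ball"}, "dax"))
# → "unfamiliar_thing"
```

The interesting question, of course, is whether a learned model picks up anything like this rule on its own.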
Gandhi and Lake were curious about whether neural networks (NNs) operate using the same bias. It would be convenient if they did, not only because it would allow them to learn relationships in a way mirroring that of humans, but because the data they may need to learn from is often replete with infrequently occurring concepts. This is, in fact, a known limitation of NNs: they often have difficulty assigning meaning to objects or sequences when few or zero training examples are available. The authors point to recent work by Cohn-Gordon and Goodman demonstrating how machine translation models often produce ambiguity through many-to-one semantic relationships (i.e., two sentences in a given language may be translated to the same output sentence even if they have different meanings), and how implementing a model that incorporates a bias resembling ME can preserve more of those direct, meaningful relationships.
Through experiments with synthetic data, the authors show that:
None of 400 different varieties of NN classification model demonstrates ME bias. In fact, they default to the opposite bias: “…trained models strongly predict that a novel input symbol will correspond to a known rather than unknown output symbol”.
This anti-ME bias holds regardless of the size of the training data.
The same appears to be true for sequence-to-sequence models: “The networks achieve a perfect score on the training set, but cannot extrapolate the one-to-one mappings to unseen symbols”.
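To make the anti-ME result concrete, here’s a minimal sketch of my own (not the authors’ code, and much smaller than their 400-network sweep): a plain softmax classifier trained on one-to-one symbol mappings, with two symbols held out. When queried with a held-out symbol, it puts nearly all its probability mass on the known outputs.

```python
import numpy as np

# Hypothetical miniature of the experimental setup: 10 symbols, identity
# mapping (symbol i -> label i), but only symbols 0..7 appear in training.
n_symbols, n_train = 10, 8
X = np.eye(n_symbols)[:n_train]   # one-hot inputs for the seen symbols
Y = np.eye(n_symbols)[:n_train]   # identity mapping as labels

W = np.zeros((n_symbols, n_symbols))
b = np.zeros(n_symbols)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

for _ in range(500):
    P = softmax(X @ W.T + b)      # forward pass
    G = P - Y                     # gradient of cross-entropy wrt logits
    W -= 0.5 * (G.T @ X) / n_train
    b -= 0.5 * G.mean(axis=0)

# Query a novel symbol (index 8, never seen during training).
p = softmax(np.eye(n_symbols)[8] @ W.T + b)
p_known, p_novel = p[:n_train].sum(), p[n_train:].sum()
print(p_known > p_novel)          # → True: the novel input maps to known outputs
```

The mechanism is easy to see here: the weights feeding from the unseen input symbol never receive a gradient, while the bias terms for the never-used output labels are pushed down on every step, so a novel input inherits the model’s learned preference for familiar outputs. That’s the anti-ME bias in miniature.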
As the authors concede, this tendency may hold for machine learning models of other architectures, not NNs alone. They discuss at length how including ME bias may improve applications of machine translation and image classification, with the caveat that continuing the metaphor of human-style learning may be untenable in machine learning. As humans, we need mechanisms for learning about novel phenomena throughout our lives, so we remain open to the idea that a newly encountered word or object may have a new meaning or name. Training machine learning models requires some degree of artificial limitation, however. That limitation does provide a level of control over learning that few actively learning children will ever experience (and, on the subject of active learning, children receive constant feedback from parents, teachers, and their environment; it’s challenging to give any machine model that amount of careful human guidance).
So what’s the relevance to understanding biomedical text? One of the challenges in understanding any experimental or clinical document is its vocabulary. We can expect that some words in the document will be novel due to some combination of not having encountered them before, having learned them in a different context (and perhaps even one with a slightly different meaning, like how a myocardial infarction and a cerebral infarction are physiologically similar but certainly not identical, not least because they occur in different organs), or authorial creativity. Here’s a recent paper with a novel title: “Barbie-cueing weight perception”. As a reader, I can parse that pun on “barbecue”, and that’s not even technical terminology. What would, say, a biomedical named entity recognition model do with it? I don’t think ME bias can solve pun recognition, but could it assist with recognizing when a term is genuinely new and meaningful?
Results by Gandhi and Lake suggest that, at least for machine translation models, a novel output should be expected given a novel input. In entity recognition, it’s trivial to have this expectation, but perhaps not useful to assume that all novel words or phrases are unique entities. Typing is the real challenge, especially if there are numerous possible types. Should all newly encountered words get added to new types, then processed further in some manner? Perhaps this would make the most sense in a continuous learning scenario where types are aligned to a fixed ontology but there is some room for ambiguity. I’m not sure a bias toward ambiguity is quite the same as ME bias, but it seems like half of the idea; the other half likely involves learning to learn. A model would need some ability to recognize contexts appropriate for assigning new or ambiguous relationships, much like children learn to connect a new object with a name when prompted.
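To make that speculation a little more concrete, here’s a hypothetical sketch of what an ME-inspired typing step might look like (entirely my own illustration, not a method from the paper; the type names and lexicon are invented). Known surface forms keep their ontology type; an unseen form gets a provisional NOVEL type along with the model’s ranked candidate types, preserving the ambiguity for later resolution rather than forcing the mention into the best-scoring known type.

```python
# Hypothetical lexicon mapping known surface forms to ontology types.
KNOWN_TYPES = {"myocardial infarction": "Disease",
               "aspirin": "Chemical"}

def type_entity(mention, candidate_scores):
    """Return (type, ranked_candidates) for a mention.

    candidate_scores: dict of ontology type -> model confidence.
    Unseen mentions get a provisional "NOVEL" type plus the ranked
    candidates, leaving room for ambiguity instead of forcing a
    known type (the ME-inspired part of the sketch).
    """
    if mention.lower() in KNOWN_TYPES:
        return KNOWN_TYPES[mention.lower()], []
    ranked = sorted(candidate_scores, key=candidate_scores.get, reverse=True)
    return "NOVEL", ranked

print(type_entity("cerebral infarction", {"Disease": 0.6, "Procedure": 0.3}))
# → ('NOVEL', ['Disease', 'Procedure'])
```

Whether the provisional types then get merged into the ontology, flagged for human review, or resolved by further context is exactly the continuous-learning question raised above.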