

Undiscovered ice floes

Harry Caufield

I've been reading some Don Swanson papers recently in an attempt to:

  1. Approach information retrieval (IR) from a philosophical perspective, as IR is a major part of what I do now but is far too broad a field to easily comprehend without years of experience
  2. Gain a historical perspective
  3. Remember material from my undergraduate Philosophy of Mind course (something about P-zombies...or I guess that was something else entirely)
Not this guy. No relation. Some relation?

My progress on those fronts continues, but in the meantime, I noticed an interesting point in the 1986 paper, "Undiscovered Public Knowledge":

To verify that all relevant pieces of recorded information do in fact fit the description specified by a given search function, one would have to examine directly every piece of information that has ever been published. Moreover, such a task would never end, for, during the time it would take to examine even a small fraction of the material, more information would have been created. The above-stated hypothesis about a search function, in short, can never be verified. In that sense, an information search is essentially incomplete, or, if it were complete, we could never know it. Information retrieval therefore is necessarily uncertain and forever open-ended.

OK, so that's less a single point and more a critical element of searching information: we can never know everything because we can never search everything. Swanson is specifically discussing scientific literature here, but even if he weren't, do we now have access to technology that renders this issue somewhat less of a concern? We can't search everything, but between heavily optimized database structures, carefully engineered indexing schemes, and deep learning approaches (though I'd rather not treat any type of machine learning as a universal hammer-and-nail solution), can't we get very close?
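To make the "carefully engineered indexing schemes" point concrete, here's a toy sketch of an inverted index, the basic structure behind most large-scale text search. This is purely illustrative (real systems add tokenization, compression, ranking, and much more), and the example documents are invented:

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, *terms):
    """Return IDs of documents containing every query term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

# Hypothetical mini-corpus for illustration
docs = {
    1: "undiscovered public knowledge",
    2: "information retrieval is open ended",
    3: "public information is searchable",
}
index = build_index(docs)
print(search(index, "public"))                 # -> {1, 3}
print(search(index, "information", "public"))  # -> {3}
```

The appeal of the structure is that a query touches only the postings for its terms, not the whole corpus - which is exactly why search scales, and also exactly why anything never indexed stays invisible.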

At the very least, focusing on scientific literature alone, the modern issue becomes less how rapidly new information becomes available than how rapidly it is lost. I suspect this is more of a problem for supplementary data than for manuscripts: data tables are much more difficult to index and are essentially useless without documentation, so every data set available only in a single supplementary Excel spreadsheet is potentially "lost" data. I'm curious how much of this information disappears every day, like melting glaciers or permafrost, never to be seen again except perhaps by luck or coincidence (for the scientific data, at least - that probably won't work for the ice).