Who is biomedical data mining for?

I’ve been thinking lately about the purpose of mining biomedical data. At the moment, my group is actively building ways to transform biomedical and clinical documents into structured data, but that’s largely a technical process. There’s always the larger question of who the resulting tools should be for. Will they be used by clinicians? Will data science folks use them as part of their own workflows? Will epidemiologists find these tools useful for understanding disease outbreaks?

Mining remains a great metaphor because mines are both beautiful and horrific.

I’d love it if all, or even one, of these questions were eventually answered with an unquestionable “Yes, all the time, and their lives are better for it”. It’s thrilling to see something you’ve poured innumerable hours into become both useful and actively used. When it comes to the broader impact of biomedical data analysis, though, I think there’s still an immense gap for one particular audience: patients.

It remains challenging for patients to obtain their own medical records, gain the contextual knowledge necessary to understand them, and draw meaningful conclusions from them. To be fair, the entire idea that a patient would be responsible for their own records is alien to most medical systems: most patients have not been to medical school, and even clinicians are specialized enough that we can’t expect them to know everything. That being said, could a platform for extracting knowledge from both the biomedical literature AND a patient’s medical records lead to better health outcomes and quality of life? At the very least, could it help patients understand what’s going on with their health?

Crowdsourcing has been a popular solution so far. PatientsLikeMe is a highly relevant player in this area; interestingly, it was recently absorbed by UnitedHealth, the largest health system in the world. There was also a site called CureTogether, but it was acquired by 23andMe back in 2012. The UK-focused HealthUnlocked claims to have the largest membership of any of these platforms (more than a million users, though it bills itself as a social network, so I imagine a large userbase is kind of the point) and is partially funded through sponsored surveys. There’s clearly interest in all that crowdsourced data, and I suspect at least one major reason is that crowdsourced medical conclusions are effectively already “digested”, having started from a raw diet of literature, professional diagnoses, and personal experience. That seems like an ideal prescription for rare disease cases, particularly those where an online community can also function as a support group. Could we automate part of that digestive process and save patients some of the time required to pore over countless documents? Furthermore, could we apply such a process to diseases that aren’t strictly rare or hard to diagnose, but are simply difficult for patients to understand?

I’ve heard, anecdotally, that the most active participants in biomedical crowdsourcing platforms tend to be retired nurses. This makes sense: they have both practical and general medical knowledge, coupled with the time necessary to contribute to group efforts. Perhaps they would be an ideal audience for evaluating non-crowdsourced, automatically generated knowledge collections.

Let’s keep track of every measurement, not just the obvious ones. How tall was your lunch, anyway?

Anyway, whoever figures out how to deliver structured biomedical knowledge to patients in a form they can apply directly to their health will have some great opportunities. They may also get gobbled up by Epic or something. In theory, the data is already accessible: millions of papers are available through PubMed (well, PMC) without a costly library subscription, and the majority of US health providers allow access to EHRs in some form. I’m admittedly suspicious of that second point, both for technical reasons (records aren’t consistent in format across locations, nor across health systems) and for practical ones (are the records complete, or have they already been processed in some way? What steps do I have to take to get my records?). I just tried to get my own records and ran into the following issues and points of confusion:

My insurance provider doesn’t give access to my records; they refer me to a site run by Optum (hey, they’re part of UnitedHealth).
The portal appears to expect me to keep my own records, like an online notebook.
I could see my claims data…but only the titles (e.g. “ENCOUNTER FOR IMMUNIZATION”) and dates.
There are a bunch of other features, like a forum and activity tracker, but nearly all of them seem to involve smoking cessation, pregnancy, or weight. Potentially useful if that’s what you’re looking for, but I’m not, so they’re distractions.

So that portal turns out to be what’s called a Personal Health Record (PHR) rather than a provider-created EHR. Back to the insurance provider’s site.

They have claims information, but only from the previous two years, and I was repeatedly told that no records were available.
I have Explanation of Benefits forms but I’ll avoid an extended essay here about why EOBs are functionally useless to patients (much of it goes like “big numbers are scary, especially when you can’t tell if you’re going to be expected to pay them”).
One page suggested that my medical group may have my records, but didn’t clearly state this was the case.

I know I’ve received diagnostic results through an online platform before, so I went there and obtained those with no issue. Great! Unfortunately, I’ve changed medical groups in the last year and don’t have a login for the new one yet. Once I do, my confidence that they will have access to my previous records (without my having to make several phone calls, at bare minimum) is quite low.

(Of relevance to this situation: an essay by Mandl and Kohane about data standards in EHRs and the struggle to make them both accessible and useful. The authors identify their own SMART on FHIR API as part of the solution.)

I’m in good health and haven’t needed much medical attention, so perhaps this process becomes easier for those who require more care. Still, it looks like there’s major progress to be made in joining biomedical knowledge with what patients already know.

J. Harry Caufield

severalog