Draw your own lines

Maps have always been interesting to me. It’s less about the aesthetics of maps or their level of detail (though those are interesting, and really require a staggering amount of coordinated effort) and more about how areas end up with consistent definitions. Where are the explicit and implicit borders between neighborhoods? How do I tell someone I live in a particular area? How does the language we use in describing geographic regions change depending on who we’re talking with?

I live in LA, a city rich in examples of these types of questions. It’s famously a patchwork of neighborhoods, independent cities, and other sociopolitical niches, some as small as a block or two. There’s often debate about where one neighborhood begins and another ends. Luckily, some map creators are willing to wade into that active debate.

This LAist article features a very detailed map by Eric Brightwell. You can find the map on its own here.

[Image: Eric Brightwell's map of Los Angeles neighborhoods (LA_map_1.png)]

There's also a previous, searchable version produced by the LA Times. I believe it first appeared in 2009 and has undergone changes since then. It's not bad, though then again, I didn't grow up in LA.

For context, here's a map of gentrification in the same area over the last few decades.

Your health data, rapidly disappearing into the distance on a hijacked stagecoach

When you generate health data, where does it go? I mean, if you visit the doctor, and they collect all the requisite information in their electronic health records, where do those records go? Who may access them? Despite all the regulations in place regarding patient privacy, these questions aren't easy to answer, especially where data breaches may have left sensitive data open to access by unintended parties. This is the ground covered by theDataMap, a project by Prof. Latanya Sweeney and Harvard's Data Privacy Lab.

The map itself.

The map is essentially an index of known data-sharing arrangements between parties, irrespective of whether any single person or group participates in those relationships. Most of its health data comes from state-level discharge records, i.e., partially structured records describing the details of an individual patient's hospital visit, including payment details. While these records don't include names or other personal identifiers, the project's creators note that they provide enough detail to link patients to news stories and thereby identify them. (In theory, some could be linked to clinical case reports as well.) These records also aren't subject to HIPAA standards, as they're governed by state regulations instead.

So, the answer to “where does my health data go” is essentially “to whoever buys it or finds it after a data breach”. Click on any of the nodes on the project site and you’ll get a list of organizations known to handle health data, along with any instances of data going missing. I think this is the most interesting aspect of the project: with a more comprehensive graph representation and/or a simple API, theDataMap could be a way to automatically trace paths between known data leaks and specific patient groups. If a Florida real estate company suffers a data breach and is known to have purchased discharge records, the impacted parties (i.e., patients of Florida hospitals) should know ASAP. Then again, sometimes it can take nearly a decade for health data breaches to become public.
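
If that kind of tracing were possible, it wouldn't take much code. Here's a minimal sketch of the idea, assuming the sharing arrangements were available as a directed graph; the organizations, edges, and breach below are all invented for illustration, and theDataMap doesn't currently offer an API like this.

from collections import deque

# Hypothetical directed graph: data flows from each source to its recipients.
shares = {
    "FL hospitals": ["FL state discharge records"],
    "FL state discharge records": ["analytics vendor", "real estate co."],
    "analytics vendor": [],
    "real estate co.": [],
}

def upstream_sources(graph, breached):
    """Return every node whose data could have reached the breached party."""
    reverse = {node: [] for node in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse[dst].append(src)
    seen, queue = set(), deque([breached])
    while queue:
        for src in reverse[queue.popleft()]:
            if src not in seen:
                seen.add(src)
                queue.append(src)
    return seen

print(upstream_sources(shares, "real estate co."))
# {'FL state discharge records', 'FL hospitals'}: the groups to notify

With real data behind it, the same reachability query run at breach-disclosure time could produce notification lists automatically.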

How to make straight lines and arrows in PowerPoint

A very short entry here for an ever-present issue with PowerPoint.

When you draw lines or arrows in the software, and particularly when you let them snap into position, they often aren't quite aligned right. They're visibly off-center. In short, they look terrible.

A slightly off-center arrow. Not the worst example.

This problem has existed for years. The solution is to ensure that either the width or the height (for vertical or horizontal lines, respectively) is exactly zero. In PowerPoint, right-click the offending line, select "Size and Position", then set the corresponding height or width value to zero.

[Image: the Format Shape pane with the Size fields visible (format_shape.png)]
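
If you'd rather fix a whole deck at once, the same idea works programmatically. This is a rough sketch, assuming the python-pptx library; the file names and the snapping tolerance are made up, and you'd want to check the results by eye.

from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE
from pptx.util import Emu

TOLERANCE = Emu(2)  # anything within 2 EMUs of straight gets snapped flat

prs = Presentation("deck.pptx")  # hypothetical input file
for slide in prs.slides:
    for shape in slide.shapes:
        if shape.shape_type != MSO_SHAPE_TYPE.LINE:
            continue  # only touch plain lines/connectors
        if 0 < shape.height <= TOLERANCE:
            shape.height = Emu(0)  # nearly horizontal: zero the height
        elif 0 < shape.width <= TOLERANCE:
            shape.width = Emu(0)   # nearly vertical: zero the width
prs.save("deck_fixed.pptx")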

I don’t like posting about Powerpoint problems and solutions, but these issues and the software itself are still all too common in academic circles.

Annotation and conversion with brat: a technical note

A quick technical fix if you're interested in trying out some of the tools developed for use with the brat annotation platform. I wanted to be able to convert brat annotations into BioC format. There's a tool developed by Antonio Jimeno Yepes et al. for that purpose, called Brat2BioC. It depends on brateval, developed by the same group. I tried installing brateval first via Maven as instructed, and it built just fine, but Brat2BioC refused to do so.

Image not explicitly related.

The solution? Turns out Brat2BioC is just looking for the wrong version of brateval. Edit pom.xml such that the line

<version>0.0.1-SNAPSHOT</version>

under

<artifactId>BRATEval</artifactId>

matches the actual version name of the brateval jar file. Then the build should work.
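
Assembled, the relevant dependency entry in pom.xml should look something like this; the groupId is a guess based on the tool's package name, and the version shown simply matches the jar my brateval build produced, so use whatever yours is named:

<dependency>
  <groupId>au.com.nicta</groupId>            <!-- assumed; keep whatever the file already has -->
  <artifactId>BRATEval</artifactId>
  <version>0.1.0-SNAPSHOT</version>          <!-- must match the built brateval jar -->
</dependency>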

But what about running the thing? I have a set of annotated documents in brat standoff format (i.e., a set of .txt docs and corresponding .ann files), now in their own folder named "input". After at least an hour of troubleshooting, I still couldn't get it to work. Part of the issue is Maven: it doesn't seem to like loading local jar packages anymore (see this Stack Overflow post). Even avoiding Maven doesn't seem to help, though: Java can just never seem to find the main class. That can happen for a variety of reasons, but in this case it just needed some very explicit CLASSPATH definitions. Having built BRATEval already as requested by the Brat2BioC README, I copied its jar into the Brat2BioC lib folder, then ran the following:

java -cp ./target/classes:BRAT2BioCConverter-0.0.1-SNAPSHOT.jar:./lib/BRATEval-0.1.0-SNAPSHOT.jar:./lib/bioc.jar:xstream-1.4.4.jar:xmlpull-1.1.3.1.jar:xpp3_min-1.1.4c.jar au.com.nicta.csp.bbc.BRAT2BioC input output

This works just fine.

Lessons learned: even relatively simple format-conversion tools can be a headache to get working when the troubleshooting comes down to mundane details like file locations and classpaths.

ME, ME, ME: mutual exclusivity in understanding biomedical text

I’ve been reading and thinking about this paper by Gandhi and Lake on mutual exclusivity bias, or ME bias, lately, especially in terms of what it means for understanding biomedical text and other communications. ME bias is the tendency of an individual or a model, when given a set of objects with known names plus a novel object and an unknown name, to assign the new name to the new object. The bias rests on the assumption that every object has exactly one name. If that seems childlike, you’re right: this is one of the biases children use when learning language. They don’t often grasp the complexity of hierarchical relationships while they’re still learning, but if you show them a novel object, they’ll readily attach a newly provided name to it.

What kind of bird is that? I’ve seen birds before, and could even tell you the species of some types of birds, but I couldn’t tell you what the species of this one is. If you told me it was a Green Violetear I would have no evidence to dispute the identification. Maybe it’s enough to just call it “bird”. Image credit: me.

Gandhi and Lake were curious about whether neural networks (NNs) operate using the same bias. It would be convenient if they did, not only because it would allow them to learn relationships in a way mirroring that of humans, but because the data they need to learn from is often replete with infrequently occurring concepts. This is, in fact, a known limitation of NNs: they often have difficulty assigning meaning to objects or sequences when few or zero training examples are available. The authors refer to recent work by Cohn-Gordon and Goodman demonstrating how machine translation models often produce ambiguity through many-to-one semantic relationships (i.e., two sentences in a given language may be translated to the same output sentence even if they have different meanings), but that implementing a model with a bias resembling ME can preserve more of those direct, meaningful relationships.

Through experiments with synthetic data (a toy version of the setup is sketched after this list), the authors show that:

  • None of 400 different varieties of NN classification model demonstrate ME bias. In fact, they default to the opposite bias: “…trained models strongly predict that a novel input symbol will correspond to a known rather than unknown output symbol”.

  • This anti-ME bias holds regardless of the size of the training data.

  • The same appears to be true for sequence-to-sequence models: “The networks achieve a perfect score on the training set, but cannot extrapolate the one-to-one mappings to unseen symbols”.
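
To make the classification result concrete, here's a minimal sketch of that kind of test, assuming a plain numpy softmax classifier stands in for the authors' much larger pool of networks. Five symbols are trained with a one-to-one mapping; a sixth input symbol and a sixth output class are never seen during training.

import numpy as np

rng = np.random.default_rng(0)
n = 6                             # six symbols; pair 5 is held out entirely
X = np.eye(n)[:5]                 # training inputs: one-hot symbols 0..4
T = np.eye(n)[np.arange(5)]       # targets: symbol i maps to label i

W = rng.normal(scale=0.1, size=(n, n))
b = np.zeros(n)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

for _ in range(500):              # gradient descent on cross-entropy
    p = softmax(X @ W + b)
    W -= 0.5 * (X.T @ (p - T))
    b -= 0.5 * (p - T).sum(axis=0)

probs = softmax(np.eye(n)[5] @ W + b)   # query with the never-seen symbol
print("P(known labels):", probs[:5].round(3))
print("P(unseen label):", probs[5].round(3))

Because the unseen class never appears as a target, its bias term is only ever pushed down, so the model ends up spreading the novel symbol's probability over the known labels; an ME-biased learner would do the opposite. That mirrors the anti-ME behavior the authors report, though a linear model is of course only a caricature of their experiments.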

This tendency may be true for machine learning models of other architectures and not NNs alone, as the authors concede. They extensively discuss how including ME bias may improve applications of machine translation and image classification, with the caveat that continuing the metaphor of human-style learning may be untenable in machine learning. As humans, we need mechanisms to learn about novel phenomena for our entire lives, so we remain open to the idea that a newly-encountered word or object may have a new meaning or name. Training machine learning models requires some degree of artificial limitation, however. It does provide a level of control over learning that few actively learning children will ever experience (and, on the subject of active learning, children receive constant feedback from parents, teachers, and their environment; it’s challenging to give any machine model that amount of careful human guidance).

So what’s the relevance to understanding biomedical text? One of the challenges in understanding any experimental or clinical document is its vocabulary. We can expect that some words in the document will be novel to us, whether because we haven’t encountered them before, because we learned them in a different context (perhaps even one with a slightly different meaning, the way a myocardial infarction and a cerebral infarction are physiologically similar but certainly not identical, not least because they occur in different organs), or because of authorial creativity. Here’s a recent paper with a novel title: “Barbie-cueing weight perception”. As a reader, I can parse that pun on “barbecue”, and that’s not even technical terminology. What would, say, a biomedical named entity recognition model do with it? I don’t think ME bias can solve pun recognition, but could it assist with recognizing when a term is genuinely new and meaningful?

Results by Gandhi and Lake suggest that, at least for machine translation models, a novel output should be expected given a novel input. In entity recognition, it’s trivial to have this expectation, but perhaps not useful to assume that all novel words or phrases are unique entities. Typing is the real challenge, especially when there are numerous possible types. Should all newly encountered words get added to new types, then processed further in some manner? Perhaps this would make the most sense in a continuous learning scenario where types are aligned to a fixed ontology but there is some room for ambiguity, as in the sketch below. I’m not sure a bias toward ambiguity is quite the same as ME bias, but it seems like half of the idea. There’s likely some of the idea of learning to learn involved as well: a model would need some ability to recognize contexts appropriate for assigning new or ambiguous relationships, much like how children learn about being prompted to connect a new object with a name.
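
Purely as a thought experiment, the ontology-plus-ambiguity idea could be as simple as routing unrecognized mentions to a provisional type rather than forcing them into an existing category. Everything here is invented for illustration: the ontology entries, the type names, and the mentions.

ONTOLOGY = {
    "myocardial infarction": "Disease",
    "cerebral infarction": "Disease",
    "aspirin": "Drug",
}
PROVISIONAL = "NovelOrAmbiguous"   # hypothetical holding type for later review

def assign_type(mention: str) -> str:
    """Known mentions keep their ontology type; unknown ones stay open."""
    return ONTOLOGY.get(mention.lower(), PROVISIONAL)

for m in ["Aspirin", "Barbie-cueing", "cerebral infarction"]:
    print(m, "->", assign_type(m))
# Aspirin -> Drug
# Barbie-cueing -> NovelOrAmbiguous
# cerebral infarction -> Disease

A continuous-learning system would then need the learning-to-learn piece: deciding, from context, when a provisional mention deserves promotion to a real type.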