ResearchKit - Tools, of course, can be the subtlest traps

If you were hanging around Twitter, and the tech blogosphere more generally, during and immediately after the Apple Spring Forward event yesterday, you will have found out a few things:

  1. That watch looks neat, and everyone knows which one they want now
  2. The MacBook is somewhat desirable, and all Mac bloggers are realising they need a new laptop
  3. Apple is about to save the world and cure all ills thanks to ResearchKit

Better people than me can deal with 1 and 2, but for the record: 42mm Stainless Steel with Milanese Loop, and this MacBook Pro suddenly looks quite old.

But ResearchKit... This is something I am vaguely qualified to have an opinion on.

Disclosure time: I'm a doctor; I try, where possible, to be a savvy consumer of evidence; I have done some time as a research interviewer for a study into the genetics of alcohol dependence; and back when I was in training I had the dubious distinction of scoring very highly on critical appraisal exams.

Now, what you need to understand about doctors is that despite not being statisticians, and despite us all admitting that statistics are complex, somehow all doctors convince themselves they can read papers and identify flaws in them. Or, as I tell my trainees: "No paper ever published is good enough to withstand detailed attention."

ResearchKit is designed to solve a number of problems. The first and most obvious is that it's hard to do medical research: specifically, it's hard to identify people who are happy to be experimented on. Then it's hard to do the experiments. Analysing the data, that's hard too. ResearchKit appears focussed on the "hard to get people" problem. It gives you access to 700,000,000 iPhones worldwide, which is around 10% of the world's population. By any reasonable judgment that's a lot of phones.

Of course, I own 8 of them. So I'm not sure that it's a strict 1:1 phones:people ratio. Still, even 1.25% of the world's population is a huge sample size.

Oh, wait, it's the population running iOS 8. Well, that's still significant.

Oh, wait, it's the population running iOS 8 who install the specific app for the research you're doing and who allow you access to the data you need. That's probably still far more than the few hundred to few thousand most studies get. I'm sure that's still wonderful.

But, by giving researchers access to these millions of people Apple may not be being quite as helpful as they'd like. The general comments I saw online about this were "Medical researchers say it's hard to get people for research, this makes it easy". As far as it goes that's true. It is not, however, the whole story. We don't need just anyone for research, we need specific people, and specific people with specific problems.

Here's a good example. When my dad had bowel cancer (see just about every entry on this blog for details - seriously, every story I have to share includes "when my dad had cancer") he took part in some genetic research; they took a sample of his blood. Then they asked my mum if she would mind helping out too, and she was of course delighted. The way they explained it to her was that they needed a family member who wasn't a blood relative. What they meant was that they needed an age, social class and lifestyle matched control. Since my mum lived with my dad, ate the same meals and was married to him, she was about his age, in his social class and had a broadly similar lifestyle. It was an elegant way to identify a control group.

You see this in research: at the start of a paper they list off the demographic details of their control and study groups and show that they are broadly similar across a range of demographic data. That is to say, their case and control groups are matched.

When we do research we need control groups. If we're looking at things that cause a disease then we need people without the disease who are broadly similar to people who have it - except in the specific exposures we want to study. If we're looking to see if a drug works then we need a group of people who take an inactive pill, to make sure that any benefit isn't just the placebo effect.

So, let's say we wanted to see if cigarettes cause lung cancer (Hint: They do. Stop smoking). We would want a group of smokers and a group of non-smokers. Then we would want the groups to match as far as possible: we would want the sex and age mixes in the groups to be similar, and ideally other lifestyle exposures such as diet too. If, for instance, we had a group of 20-something smokers and 60-something non-smokers then we mightn't find much of a difference, because 20-somethings who get lung cancer are thankfully rare.
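To see how badly an age mismatch can mislead, here's a toy simulation (every risk figure below is invented purely for illustration - this is not real epidemiology):

```python
import random

random.seed(0)

# Invented toy model: lung cancer risk rises steeply with age,
# and smoking multiplies whatever the baseline risk is.
def cancer_cases(n, age, smoker):
    base = 0.001 if age < 30 else 0.02   # older group: 20x the baseline risk
    risk = base * (5 if smoker else 1)   # smoking: 5x relative risk
    return sum(random.random() < risk for _ in range(n))

n = 100_000

# Badly matched study: 20-something smokers vs 60-something non-smokers.
unmatched = (cancer_cases(n, 25, True), cancer_cases(n, 65, False))

# Matched study: both groups the same age.
matched = (cancer_cases(n, 65, True), cancer_cases(n, 65, False))

print("unmatched (smokers, non-smokers):", unmatched)
print("matched   (smokers, non-smokers):", matched)
```

In the unmatched comparison the non-smokers actually rack up more cancers, purely because they're older; the matched comparison shows the real effect.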

In one way, then, ResearchKit is brilliant: assuming you can capture your data in an app, you have a huge source of control groups, and the groups will be so large it should be easy to match for age, sex, weight, even diet if you're pulling that in from HealthKit. I'm about to digress, but I will return to this point in a moment.

Here we meet what I see as the first potential pitfall with ResearchKit: you're assuming the data is good. Data entry is a skill. When I was doing the research interviews it was drummed into me: "Fill the form in this way", "We need this to be keyed reliably and quickly", "Here are the meanings behind these questions and why we ask them this way". Accuracy matters; inter-rater reliability matters. If we are going to be pulling data on dietary exposures from HealthKit, well, you'd better hope your study participants are entering what they're eating, and doing so reliably, correctly and fully. Otherwise they're not just cheating on their diets, they're cheating on the research too.
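Inter-rater reliability even has standard measures. As a sketch, here's Cohen's kappa - agreement between two raters, corrected for the agreement you'd expect by chance - hand-rolled for two interviewers coding the same ten answers (the data is made up):

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Chance-corrected agreement between two raters (Cohen's kappa)."""
    assert len(r1) == len(r2)
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement: probability both raters pick the same label at random,
    # given how often each rater uses each label.
    expected = sum((c1[l] / n) * (c2[l] / n) for l in set(r1) | set(r2))
    return (observed - expected) / (1 - expected)

# Two interviewers coding the same ten answers as yes/no:
a = ["y", "y", "n", "y", "n", "y", "y", "n", "n", "y"]
b = ["y", "n", "n", "y", "n", "y", "y", "n", "y", "y"]
print(round(cohens_kappa(a, b), 2))  # -> 0.58
```

Raw agreement here is 80%, but kappa knocks that down to about 0.58 once chance agreement is accounted for - which is why studies report it rather than simple percentage agreement.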

So, let's assume that everyone who is motivated enough to sign up for medical research apps is also motivated enough to enter all their data correctly and fully. They may well be; after all, they're motivated enough to sign up for medical research apps. This is actually problem number two for ResearchKit.

The people who are willing to participate in medical research using their phones are a self-selected population. They are interested in healthcare topics, they are using their phone to track their health, and they cared enough about your particular research topic to search for the app. They're not the general population: they may have a vested interest in the illness you're researching, and they are likely to be more health conscious than the average person. In short, they are likely to differ in certain ways from the norm, and so research using them may not be representative of the population as a whole.

Again, this is a problem that we tend to look for in research. We drum it into medical students. One question: "Is this also likely to be true in my population?"

This brings us back to a very subtle problem about cases and controls that I alluded to above. It's a variation of an old story. A lot of research in psychology is done on university undergraduates. This means it's WEIRD research. Which is to say, research on Western, Educated, Industrialized, Rich, and Democratic people. Their psychological biases and processes may vary systematically from those of people who aren't university undergraduates; without broader-based research it's very hard to tell. Yet a lot of research is done on this population because it's so easy to get.

Likewise, ResearchKit gives access to a lot of iPhone users, but it is possible, and indeed likely, that iPhone users as a whole will differ as a class from non-iPhone users. They are likely to be more affluent, more middle-class, and so on. A lot of good demographic digging can probably be done to identify exactly how this population varies from the general population, and some statistical massaging can be done to correct for it.

If you were to study lung cancer but somehow managed to build a study population with no smokers, you would miss the smoking gun entirely. It is possible that the population of iPhone users, whilst huge on a scale that can scarcely be imagined, is still systematically different from the population at large. For example, iPhone owners, and especially Apple Watch owners, may well be more affluent. The sample may represent San Francisco very well, but do less well globally. Or it may look like a pretty good match, but have some subtle but important (and statistically significant) differences which could skew research findings.
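Here's a crude sketch of that kind of skew (again, every number is invented): a trait is commoner among the affluent, the affluent are far more likely to join the study, and the study duly overestimates the trait.

```python
import random

random.seed(1)

# Invented population: 30% affluent, and the affluent are more
# likely to be physically active (the trait we want to measure).
population = [{"affluent": random.random() < 0.3} for _ in range(100_000)]
for p in population:
    p["active"] = random.random() < (0.6 if p["affluent"] else 0.3)

def activity_rate(people):
    return sum(p["active"] for p in people) / len(people)

# Self-selected sample: affluent people are ten times more likely to
# install the study app (50% vs 5% join rates, both made up).
sample = [p for p in population
          if random.random() < (0.5 if p["affluent"] else 0.05)]

print(f"true rate:   {activity_rate(population):.2f}")
print(f"sample rate: {activity_rate(sample):.2f}")  # noticeably higher
```

The sample's activity rate comes out well above the true population rate, not because anyone lied, but because of who chose to take part. Demographic weighting can partly correct for this, but only for differences you know to measure.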

Apple are building a massive potential resource, and it'll be a great resource for research into the illnesses of people who track their health using Apple products and who don't mind sharing that information with researchers. Care will, however, need to be taken to make sure that the results are generalisable.

The next problem I can foresee is identifying cases. When I was doing research interviews for the alcohol dependence genetics study we spent 90 minutes or more doing very detailed diagnostic interviews to identify whether the study participants had alcohol dependence, substance dependence, depression, nicotine dependence and so on. Then we took their blood. The blood was the important part, yet we still spent huge amounts of time interviewing. The reason: we needed to be very sure that participants met the tight criteria being used to define what we were interested in. Sometimes diagnoses are wrong, and research should confirm them. ResearchKit allows studies for people with pre-existing diagnoses, but who is confirming those diagnoses?

The apps that are out there actually do some great work within the limits that ResearchKit has. It is likely to be most useful for large-scale population studies, for recruiting huge and hopefully relatively broad cohorts of both cases and controls, and for looking at common illnesses. The apps look at things which your iPhone can measure, or which sensors connected to it can measure. The asthma one also gives some basic advice, and that could probably be studied as a low-grade intervention.

Despite the many paragraphs of cold water poured up there, I am cautiously excited about ResearchKit. It gives access to massive numbers of people willing to share information and help with medical research, and a lot of the information is automatically gathered and so probably accurate. I just think that suggestions that it can fix all research overnight are a little optimistic, and it's worth considering that it may carry some traps around population bias and the like.

ResearchKit is a powerful tool, and hopefully one which will enable larger scale, more accurate, better studies. But not all studies will be a good fit. In the end the most important thing will be the quality of the study design, not the bells and whistles. As long as the tool is used carefully and appropriately, it is likely to be useful.