Misinterpreting This Depression Study May Lead Doctors to Treat the Wrong People


Welcome to Impact Factor, your weekly dose of commentary on a new medical study. I’m Dr. F. Perry Wilson.

It’s a new year, and after a little holiday break I’m back and, frankly, a bit cranky as I peruse the recently published medical literature. So today I’m focusing on a rather small study, but one that hits a pet peeve of mine, and I’m going to channel my inner Andy Rooney here and gripe for a bit.

Appearing in JAMA Network Open we have this
article with the compelling title “Use of Machine Learning for Predicting Escitalopram
Treatment Outcome From EEG Recordings in Adults with Depression”.

I like to know what I’m getting into when I read a title. And this title promises quite a bit. To me, it reads like researchers used an EEG and some fancy machine-learning stuff to predict which patients with depression would benefit from escitalopram treatment.

That idea, using a machine learning model to choose the best psychiatric treatment, is holy grail-level personalized medicine stuff. See, when confronted with major depressive disorder, docs often try medication after medication to see what sticks – anything to lessen that trial-and-error approach would save tons of time, not to mention lives.

But that is not what this study is about. Walk with me through the methods and you’ll see what I mean.

Researchers from British Columbia analyzed
EEG data from 122 adult patients with major depression who were initiated on escitalopram
therapy. As you know, an EEG outputs a ton of data – multiple electrodes, thousands of measurements. This is actually an ideal place to use machine learning tools to squeeze all that data into a single number, and the authors do an exemplary job of using a well-established machine learning algorithm called a support vector machine to take those gobs of data and turn them into a prediction.

But what exactly are they predicting? They are predicting whether the patient will have remission of depression in 8 weeks. They are NOT predicting whether escitalopram was good for the patient, and that difference is huge.

This study had no control group. All 122 patients were treated with escitalopram. We therefore have no way to know if the machine
learning model identified individuals more likely to achieve remission regardless of
therapy (let’s remember that depression spontaneously remits in around 20% of cases)
or those who truly benefit from escitalopram.

See, every patient with depression has four potential destinies with regard to escitalopram. Some will have remission with or without the drug. Some will never have remission regardless of treatment. Some will ONLY experience remission if they get the drug, and others, presumably, will fail to experience remission ONLY if they get the drug.

It’s really the last two categories we care about in terms of deciding on treatment, but ironically the first two categories are the easiest to predict – because in the end the biggest predictor of whether you get remission from depression is NOT whether you get a drug, but how severe your depression is in the first place.

This is a huge difference in terms of a prediction problem and one that can actually lead to patient harm. Let me give an example.

Imagine we built a model predicting who is
least likely to have a heart attack among a population receiving simvastatin. Without a comparator group, we’d find that individuals with lower LDL, more physical activity, and no diabetes would have the best outcomes. If we then argue that these are the types of people who should receive statins, we’d be doing a huge disservice to the people with more severe disease at baseline. Our model doesn’t tell us who should get the drug; it only tells us who was better off in the first place.

We need models that can target therapies to the right patients, regardless of how sick they are at baseline, or else we’ll always choose the least sick to get treatment. Sure, that will make the success rate of therapies look awesome, but it’s not how I want to practice medicine.

OK, back to escitalopram. What this paper shows us is that the authors
built a model, based on EEG data, that shows who is likely to have remission of depression. You could in fact argue that the model has nothing to do with escitalopram. The model may predict outcomes equally well among patients on any antidepressant, or even no antidepressant at all. In other words, we’re no closer to the dream of strapping an EEG on someone’s head and knowing what drug to give them than we were before.

But studies like this get reported inaccurately ALL THE TIME, suggesting that we have some new tool in our personalized medicine toolbox. My biggest fear is that these models get commercialized as some sort of “use this to decide who to treat” black box, which, as we now all understand, is biased against those who are sicker at baseline, even if they would respond well to therapy.

The second sentence of the conclusion of this
paper reads: “Developed into a proper clinical application, such a pipeline may provide a valuable treatment planning tool”. Not really – not unless you want to reserve treatment for the least sick individuals.

Could the researchers prove that their model is not simply identifying less severe depression as opposed to escitalopram response? Well, they could show how their model correlates with baseline depression scores or other baseline factors – my bet is that we’d find that mostly the model just identifies those with less severe depression at baseline – but that data is not presented.

And let’s remember that although it’s very cool to get data about how severe your depression is just from an EEG – I mean, that’s Star Trek-y and I love it – we have plenty of tools already available to assess depression severity.

So the next time we see a study, using machine learning or otherwise, that claims to “predict response to therapy”, the very next question we have to ask is “how do we know the model isn’t simply identifying less severe disease at baseline?”

Happy new year.
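
P.S. For anyone who wants to see the four destinies play out in numbers, here is a minimal simulation sketch in Python. To be clear, nothing in it comes from the paper: the probabilities in `draw_destiny`, the uniform severity score, and the function names are all made up for illustration, and I have assumed (as in the statin example) that sicker patients are the ones with more to gain from the drug. It assigns each simulated patient one of the four destinies, then compares the milder and sicker halves of the cohort.

```python
import random


def draw_destiny(severity, rng):
    """Assign one of the four destinies. All probabilities are hypothetical."""
    p_always = 0.5 * (1 - severity)  # remits with or without the drug
    p_drug_only = 0.3 * severity     # remits ONLY if given the drug (assumed to rise with severity)
    p_harmed = 0.02                  # remits ONLY if NOT given the drug
    u = rng.random()
    if u < p_always:
        return "always"
    if u < p_always + p_drug_only:
        return "drug_only"
    if u < p_always + p_drug_only + p_harmed:
        return "harmed"
    return "never"


def simulate(n=20000, seed=0):
    """Return a list of (baseline severity in [0, 1), destiny) pairs."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        severity = rng.random()
        patients.append((severity, draw_destiny(severity, rng)))
    return patients


def summarize(group):
    """Remission rate on the drug, and the true benefit (on-drug minus off-drug)."""
    n = len(group)
    remit_on_drug = sum(d in ("always", "drug_only") for _, d in group) / n
    remit_off_drug = sum(d in ("always", "harmed") for _, d in group) / n
    return {"remit_on_drug": remit_on_drug,
            "benefit": remit_on_drug - remit_off_drug}


patients = simulate()
milder = summarize([p for p in patients if p[0] < 0.5])
sicker = summarize([p for p in patients if p[0] >= 0.5])
print("milder half:", milder)
print("sicker half:", sicker)
```

Under these made-up assumptions, the milder half should show the higher remission rate on the drug while the sicker half shows the larger true benefit. A cohort in which everyone is treated only ever observes `remit_on_drug`, so a model trained on it would point you toward the milder half.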
