The Human Element in Medicine: Part I–Dr. Chatbot Loves You, So Please Eat Your Rocks

If you’re a human on the internet–and these days I sincerely hope you are–you may have run across the many articles that sprang from a study touting ChatGPT as superior to doctors at answering patient questions. Something about the whole thing didn’t quite pass the sniff test for me. But I’ve got stuff to do, a ferret to feed, medical reports to edit, so I sat on it for a couple of years. Now that I’ve had time to mull it over, I’d like to share some of those sour grapes that have hopefully fermented into a fine whine.

The Study: Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum

This study drew 195 exchanges at random from Reddit’s r/AskDocs, each one a public question answered by a verified physician. The researchers then typed each original question into a fresh ChatGPT session to generate the bot’s response. A team of licensed healthcare professionals evaluated both responses, judging which one was better, the quality of the information provided, and the empathy or bedside manner shown.

The authors of the study–six of whom disclosed conflicts of interest directly related to AI medical tools–concluded that:

  • evaluators preferred chatbot responses to physician responses in 78.6% of 585 evaluations
  • physician responses were significantly shorter than chatbot responses (52 vs 211 words on average)
  • chatbot responses were rated significantly higher in quality than physician responses
  • chatbot responses were also rated significantly more empathetic than physician responses
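
For those keeping score at home, 585 evaluations across 195 exchanges works out to three evaluators scoring each one. Here’s a quick back-of-the-envelope check in Python, using nothing but the figures reported above (the ~460 count is my own rounding, not a number from the paper):

    # Sanity check on the study's headline numbers.
    exchanges = 195        # physician-answered questions drawn from r/AskDocs
    evaluations = 585      # total evaluations reported by the study
    evaluators_per_exchange = evaluations // exchanges   # = 3

    chatbot_preferred = round(evaluations * 0.786)       # ~460 of 585
    print(f"{evaluators_per_exchange} evaluators per exchange; "
          f"~{chatbot_preferred} of {evaluations} evaluations preferred the chatbot")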

I have a few problems with this…

Quality–Please eat your rocks, human

As someone whose job it is to teach my incompetent AI replacement that it’s COVID-19, not covert 19, I’m horrified by the idea of a chatbot answering medical questions. Let’s not forget that Google’s AI Overview told humans to eat rocks because it sourced an article from The Onion.

But I’m also wondering who exactly was on this “team of licensed healthcare professionals.” That someone went to school for medicine or nursing does not mean they’re qualified to judge the accuracy of another doctor’s diagnosis. We have specialization in medicine for a reason.

For example, Dr. Oz has made so many false claims on his show that they merit their own Wikipedia page. He’s an important object lesson for us all. A scalpel jockey–even one who cuts open that all-important organ we romanticize–isn’t qualified to give advice outside the field of cardiothoracic surgery. He’s not a virologist, gastroenterologist, or psychologist. Yet he has parroted harmful and unverified claims in those fields and more.

And if a human doctor who went to med school can regurgitate crap advice from bad sources, what can we expect from a bot? An AI can absorb in minutes all the information it took a doctor seven years of medical training to learn. But it doesn’t digest any of it before spitting it back out at you.

I’d like a second opinion, Dr. Bot, but you didn’t give me a first one…

The thing about ChatGPT is that it’s not very good at giving you an opinion. Because it doesn’t have any. And like it or not, that’s what a diagnosis is. An opinion. A very educated one, hopefully with sources and data to back it up. But someone with a brain had to look over those sources, studies, and data and come to some judgement about what the findings mean and how they apply to their patients.

Your doctor isn’t a mechanic. They can’t pop your hood open and find out what’s been making that noise. They run tests and diagnostics. They ask you questions. They rule out other possibilities until they come to the most likely conclusion. And then they tell you what you don’t want to hear.

And that’s really the crux of the issue, folks. Doctors are here to tell you what you don’t want to hear. Sometimes it’s the worst news of your life. Sometimes it’s that you were actually making a big deal out of nothing. Most of the time we walk in with expectations that are the opposite of what we end up hearing. Because deep down we know two truths:

There has to be something wrong with me. I’m a piece of crap.

There can’t be anything wrong with me. I’m too young/scared/pretty to die.

How in the world do you expect a chatbot to deal with human insecurity and existential dread? We need a human doctor, someone who hates themselves enough to memorize the Krebs cycle and fears death enough to learn how all the systems keeping us alive work. They know those two fundamental truths, and they’re here to tell us which one we’re facing.

We are crap. We smoke, drink, get high, sleep with the wrong people, eat too much, watch too much TV, sit on our asses too long. And yes, we’re going to die. When that happens, I hope to have human doctors looking over my file (free of dumbass errors like the AI fails I’ve edited) and telling me what’s wrong with me and how long I have. 

But doctors are becoming a scarce resource, and if we’re going to automate them away, we need to poke holes in the value of the human element in medicine. The problem is that no matter how accurate the AI gets, we still want a human in the room, because we’re social creatures who want empathy when we’re scared and in pain. So can we design a study to make it look like doctors don’t actually have any?

Empathy–Dr. Chatbot loves you more than doctors on reddit

So the evaluators found the answers generated by ChatGPT, which they anthropomorphized with the cute name “Chatbot,” more empathetic than doctors redditing on their breaks. Colour me shocked. Don’t get me wrong; I love reddit. Scribbles and I scroll through all the cat-sneks on r/ferrets and laugh at posts on r/AIfails all the time. It’s great.

But we’re also well aware that the internet is a cesspool of hot takes, shitposting, and rage bait. If the medium is the message, then reddit is a hot-shit-bait pool to fish “quality and empathetic responses to patient questions” from. The tone is more professional on r/AskDocs, but it’s still a subreddit, where short and funny rise to the top. Knowing that, people tend to craft responses that are shorter, funnier, or even surprising. Did they ask Chatbot to craft a reddit response? Had they done so, I suspect the tone, length, and content would have been very different.
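
If you’re curious how much the framing alone matters, here’s a hypothetical sketch using the OpenAI Python client. The model name, prompts, and question are all mine, not anything the study’s authors ran–it’s just to show that telling the model it’s a redditor on a break changes the tone and the word count:

    # Hypothetical sketch: same question, two framings. Assumes the openai
    # package (v1+) is installed and OPENAI_API_KEY is set in the environment.
    from openai import OpenAI

    client = OpenAI()

    question = ("I whacked my head on a cabinet door. No blood, but it hurts. "
                "Should I worry?")

    framings = {
        "study-style (unconstrained)": "You are a helpful assistant.",
        "reddit-style (constrained)": (
            "You are a physician answering on a subreddit during a short break. "
            "Reply in two sentences or fewer, in a casual tone."
        ),
    }

    for label, system_prompt in framings.items():
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # stand-in model; the study used ChatGPT circa late 2022
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": question},
            ],
        )
        answer = response.choices[0].message.content
        print(f"--- {label} ({len(answer.split())} words) ---\n{answer}\n")

My bet: the reddit-style answer lands a lot closer to 52 words than to 211.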

But that’s the trick, isn’t it? Comparing a doctor on reddit–typing on mobile from the same porcelain seat most of us do–to a large language model generating responses as if it’s sitting in a physician’s office isn’t fair. But this isn’t about fairness. It’s about putting the idea in your head that doctors aren’t as empathetic, as if they would act the same in their practice with a real human patient as they would in a reddit post.

Liability–Dr. Chatbot, to the witness stand, please?

My job as a medical editor is, among other things, what you’d call a cover-your-ass policy for doctors. Why? Because human doctors are liable–that is, legally responsible–for the medical advice they give. If a patient sues for malpractice, all the notes the doctors have diligently been taking and I’ve diligently been editing get dragged into court. Automation may generate those notes faster, but it also introduces errors that would land any and all of my doctors in hot water if someone like me didn’t remove them.

Someone needs to be held accountable when things go wrong. We all laughed when Google’s AI Overview recommended eating rocks, but it got that result because it doesn’t know The Onion is a humor site. When I reference something, I have a hierarchy of sources I use. University medical centers like Johns Hopkins or Harvard Health sit at the top, while Medical News Today gets ignored. Because I need to cover my ass if my boss asks me why I changed, say, glycoside to gliclazide on a patient’s medication list.
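
As a toy illustration–my own made-up ranking, not any official standard–that hierarchy might look something like this in Python:

    # Toy source hierarchy: lower rank = more trustworthy. The domains and
    # numbers are illustrative, not an endorsement list from anywhere official.
    SOURCE_RANK = {
        "hopkinsmedicine.org": 1,    # university medical centers at the top
        "health.harvard.edu": 1,
        "medicalnewstoday.com": 99,  # effectively ignored
    }

    def trust_level(domain: str) -> int:
        """Lower is better; unknown domains land in the middle."""
        return SOURCE_RANK.get(domain, 50)

    sources = ["medicalnewstoday.com", "hopkinsmedicine.org"]
    print("Cite:", min(sources, key=trust_level))  # -> hopkinsmedicine.org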

But the AI overview and chatbots aren’t liable to anyone or for anything. And that’s the problem. Why should you trust an overview that has no oversight? Why would you believe a chatbot that’s tricking you into thinking it cares by crafting wordier responses? And do you honestly think that the people who created these A-but-not-so-much-I tools are going to choose them over a human doctor?

Don’t fool yourself: Chatbot doesn’t care about you, and no one can hold it accountable if and when things go wrong. Don’t eat its rocks.

In Part II of The Human Element in Medicine, I’d like to take a look at the heuristics–the shortcuts and rules of thumb that simplify decision-making–used by AI software.