This study, published yesterday in The Lancet, is getting linked all over the place with one takeaway: AI makes you dumb. As The Verge's headline puts it, "Some doctors got worse at detecting cancer after relying on AI". The conclusion is clear, right? Using ChatGPT and other LLMs makes you dumb.

Well, one thing stood out to me in the summary:

Between Sept 8, 2021, and March 9, 2022, 1443 patients underwent…

2021? ChatGPT hit the scene in late 2022, so it couldn't be that. What "AI" are they talking about?

Well, I had to know, so I dropped $35 to get access to the study itself with all the details, and of course, it's more complicated than it appears. I'm not permitted to redistribute the study itself, so I'll just quote some of the relevant things here.

First up, what "AI" are they talking about?

Colonoscopies were done with high-definition endoscopy processors (EVIS X1 CV 1500, Olympus Medical Systems, Tokyo, Japan) with compatible endoscopes (CF-H190L, CF-HQ190I, CF-HQ1100DI, PCF-H190I; Olympus Medical Systems, Tokyo). The AI system used was ENDO-AID CADe (OIP-1, Olympus Medical Systems, Tokyo).

This is the ENDO-AID CADe, and while Olympus does say it's powered by AI, it's not using LLMs. It's something closer to what we used to call "machine learning": computer-vision tech much more like the face and object detection in Apple/Google Photos than generative AI. If you take one thing away from this post, make it that this study was not about LLMs. Still, there are some good lessons in here, so let's keep going.

In the observed period between Sept 8, 2021, and March 9, 2022, after excluding the 100 colonoscopies conducted after the introduction of AI and those done in participants who met prespecified exclusion criteria, a total of 2177 colonoscopies were conducted, including 1443 without AI use and 734 with AI. Our analysis focuses on the 1443 patients who underwent standard, non-AI assisted colonoscopies before (n=795) and after (n=648) AI introduction.

Basically, there were three similarly sized sets of data:

  1. Colonoscopies done before using an AI tool
  2. Colonoscopies done with the AI tool
  3. Colonoscopies done 3 months after AI tools were introduced, but not using AI this time

Here's the big finding:

ADR at standard, non-AI assisted colonoscopies decreased significantly from 28·4% (226 of 795) before AI exposure to 22·4% (145 of 648) after AI exposure, corresponding to an absolute difference of –6·0%

A higher ADR (adenoma detection rate, the share of colonoscopies that find at least one precancerous adenoma) is better, and there's a clear drop in the group after AI was introduced into their workflow.
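Just to make the arithmetic concrete, here's a quick sketch in Python (the counts are pulled straight from the quote above) of how those ADR figures and the absolute difference fall out:

```python
# ADR = colonoscopies where at least one adenoma was found,
# divided by all colonoscopies in that group.
# Counts are the ones reported in the study.
before = {"detected": 226, "total": 795}  # non-AI, before AI introduction
after = {"detected": 145, "total": 648}   # non-AI, after AI introduction

adr_before = before["detected"] / before["total"]
adr_after = after["detected"] / after["total"]

print(f"ADR before AI exposure: {adr_before:.1%}")               # 28.4%
print(f"ADR after AI exposure:  {adr_after:.1%}")                # 22.4%
print(f"Absolute difference:    {adr_after - adr_before:+.1%}")  # -6.0%
```

But what about those other 734 colonoscopies that did use AI? They weren't a primary focus of this study, so they're relegated to a supplementary analysis near the end.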

In the supplementary analysis of patients who did have AI-assisted colonoscopy, the ADR was 25·3% (186 of 734). The multivariable logistic regression analysis in all 2177 colonoscopies (including those that used AI) showed that AI exposure in those without AI-assisted colonoscopy (as shown in the main analysis), female sex of the patient, and patient age younger than 60 years were independent factors that were significantly associated with ADR, when adjusted for endoscopist as a random effect. Compared with colonoscopies before the introduction of AI, use of AI was not significantly associated with a change in ADR (OR 0·80 [95% CI 0·63–1·02]; appendix p 6).

Put another way: performance dipped slightly even when using the AI tool, but the confidence interval on that odds ratio crosses 1, so the drop doesn't clear the bar for statistical significance.
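If you're wondering how a figure like that gets computed: the study's 0·80 comes from a multivariable, mixed-effects logistic regression that I can't reproduce here, but a crude (unadjusted) odds ratio with a standard Wald confidence interval falls straight out of the raw counts. Here's a minimal Python sketch; expect it to land near, but not exactly on, the published number:

```python
import math

# Raw counts from the study. a/c = adenoma detected, b/d = not detected.
a, b = 186, 734 - 186   # AI-assisted colonoscopies
c, d = 226, 795 - 226   # colonoscopies before AI introduction

# Crude odds ratio, with a 95% Wald CI built on the log-odds scale.
odds_ratio = (a / b) / (c / d)
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"Crude OR: {odds_ratio:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
# -> about 0.85 (0.68-1.07). The paper's adjusted estimate is
#    0.80 (0.63-1.02), but the takeaway is the same: the CI
#    crosses 1, so no statistically significant difference.
```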

Finally, here's the line all the AI haters are looking for:

Another challenge is to understand why the detrimental effect occurred. We assume that continuous exposure to decision support systems such as AI might lead to the natural human tendency to over-rely on their recommendations, leading to clinicians becoming less motivated, less focused, and less responsible when making cognitive decisions without AI assistance.

My take

First off, let me shout it again from the rooftops, this study was not about LLMs. Got it? Good.

That said, it is about technology that tries to assist humans with challenging tasks. This is but one study, and the replication crisis demands we treat the findings of single studies as data, not absolute proofs about the nature of the universe. Still, it's interesting data that in this case very plainly shows:

  1. Doctors had a baseline rate of detecting adenomas in colonoscopies
  2. A tool was added to their workflow that slightly reduced their performance
  3. When the tool was taken away, their performance dropped even further

My immediate takeaway is that I'd have buyer's remorse if I'd spent a lot of money on that ENDO-AID CADe thing, but my broader one is similar to what I think about AI tools in general: they don't make you smarter, but they can make you faster.

Let's look at a different hypothetical. What's 137 × 46? Can you do it in your head? Do you think having a calculator with you every moment of your life makes you worse at mental math, since you never have to do it unless you really want to? I know I'm not as good as I was in middle school, when I had to do this without a calculator.
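(And yes, I reached for a calculator too. For the record, here's the answer next to the break-it-into-pieces trick I'd have used back in middle school:)

```python
# Middle-school mental math: split one factor by place value.
print(137 * 40 + 137 * 6)  # 5480 + 822 = 6302
print(137 * 46)            # 6302, the calculator way
```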

Or how about my camera-loving friends out there? Do you think you're better or worse at film photography today, now that your workflow has been totally digital for probably 20 years? And how about this: are your digital photos better than your film ones? Maybe? Yes? Tomato/tomato?

How about reading a physical map?

The list goes on, and I think it's typical for people to get less good at things once they have technology that does those things for them. I would be better at mental math if I needed to do it all the time, but I have a computer with me that can do it, so it's a muscle I can let relax. Strip away the mysticism and ire that surround LLMs right now, and we should recognize they're just another tool. Tools can be used for good or bad, they can be helpful or not, and they can be useful in some situations but not others. I think it's fair to say LLMs are a significant tool, and therefore all of these things are heightened, but they're still tools at the end of the day. If a tool helps us be better, even marginally, at something, then we're probably going to get less proficient at doing that thing without its help.

I think the important thing is to use the tool to relax the muscles we can afford to relax, and not the ones we shouldn't. This is not an easy thing to get right; in fact, I'd say the balance is as challenging as it's ever been (members post with a similar thought). In the case of this ENDO-AID CADe, I'd say that if this data is widely applicable, it's not a good device and hospitals should spend their money on something else. Oh, and don't take a study that doesn't even involve LLMs as your proof that LLMs are bad for people (it makes it look like you used AI to summarize it 😉).