Commentary

When Hallucinations Come For Medical Research

When people find they've used a fact or quote that AI hallucinated, the rational question is, shouldn’t they have been more careful? More on that later. For now, let me share the story of Max Topaz. 

Max is an associate professor at Columbia University’s School of Nursing. He’s the first to admit he uses AI. He used it for basic things, such as grammar and formatting, to smooth out scientific papers. But after he shared his latest research, the journal responded with questions about a reference. Topaz found that the AI tool he used had surreptitiously inserted a fabricated source into his work.

The irony here is that Topaz is an AI developer at Columbia.  “I felt deeply embarrassed,” Topaz told Fortune.  “I’m an AI researcher. I know about hallucinations. If this is happening to me, an AI expert, what happens to other people?”

That sent Topaz on a mission: How many other experts were having their critical research undermined by hallucinations? He set out to do the research. 

Topaz reviewed almost 2.5 million biomedical papers and 97 million citations using a database known as PubMed Central. The results were profoundly disturbing, turning up more than 4,000 fake references in nearly 3,000 papers. It’s impossible to know how many came from AI, as humans make things up, too. But the massive increase showed up after 2024 as AI became a core part of medical research. The findings were published in the medical journal The Lancet.

As Fortune reports: “Over the past three years, the rate of fabricated references in biomedical literature has grown more than 12-fold. In 2023, one in 2,828 papers contained at least one fake reference, a rate that had risen to one in 458 by last year. Over the first seven weeks of 2026, the researchers found, one in 277 papers had at least one nonexistent reference.”  “I’m thinking this is just the tip of the iceberg,” Topaz told Fortune.

The journal Nature reported that “The American Association for Cancer Research (AACR) found that 23% of abstracts in manuscripts and 5% of peer-review reports submitted to its journals in 2024 contained text that was probably generated by large language models (LLMs),” further confirming Topaz’s findings.

Hallucinations are often more annoying than harmful, but fabricated information in medical research papers undermines the basis of medicine itself. In my own experience, a recent annual visit to my GP produced a medical transcript that was dramatically inaccurate.

A recent report by the American Medical Association found over 80% of physicians now use AI professionally to summarize research and prepare clinical documentation, a share that has more than doubled since 2023.

AI hallucinations don’t just snag users who are making things up. They look and feel real, complete with links, sources, and DOI numbers. The more critical the practice -- medicine or law -- the more the danger grows.

Scientific American reported that the Alabama Supreme Court sanctioned an attorney who had filed legal briefs riddled with hallucinations, including references to cases that didn’t exist.

A database at the Paris School of Advanced Business Studies catalogs more than 1,400 cases -- in just the past three years -- in which AI errors appeared in court filings. “Courtroom proceedings are public, and lawyers face sanctions for false claims, making such errors comparatively easy to track,” reported Scientific American. “Humans essentially have a tendency to believe that machines have more knowledge than they do, don’t break and are infallible,” Alan Wagner, an associate professor of aerospace engineering at Pennsylvania State University, told Scientific American.

In the real world, AI danger competes with the media’s drive to cover AI’s power and value. And, no doubt, employers are pushing for AI adoption, even as more and more digital tools push AI results to the top of the stack.

So, where does this take us? Some call the unfolding issue science’s “reproducibility crisis,” now compounded in the age of AI by a growing flood of hallucinated, AI-generated content filling academic and medical literature.

Perhaps the largest question is, do the platforms see hallucinations as a bug or a feature? While there is of course discussion of “solving” the problem, AI is quick to warn users not to trust its outputs, even as it presents misinformation with absolute confidence and certainty. Even if users instruct AI to put only verifiable content inside quotation marks, the charming chatbots return made-up information in confident quotation marks. When asked, they apologize profusely.

Will a platform decide that “truth” needs to be a core product deliverable? That doesn’t appear to be on the horizon.

1 comment about "When Hallucinations Come For Medical Research".
  1. Michael Tivon from maxico, June 3, 2026 at 8:45 a.m.

    Mr. Rosenbaum has raised an extremely important problem: AI hallucinations are not merely technical errors, but breaches in the trust systems on which media, science, business, and public life depend.

    The first emotional reaction to AI hallucinations is often rejection: perhaps the only safe response is not to use AI at all. That reaction is understandable. When AI fabricates a quote, a citation, a source, or a chain of evidence, it does not merely make a mistake. It damages trust.

    But refusal cannot be the long-term answer. AI is already entering journalism, publishing, marketing, science, medicine, education, law, and corporate communication. The practical question is no longer whether AI will be used. It is how to prevent AI-generated errors from contaminating the systems that depend on credibility.

    The better metaphor may be immunity.

    A healthy body does not survive by denying the existence of viruses. It survives through detection, response, memory, and defense. Information systems now need something similar: an immune layer that can identify fabricated references, unverifiable claims, false quotations, and weak evidence before they are published, cited, or amplified.

    That is why the next stage should not be anti-AI. It should be pro-verification.

    One important example is the work of Professor Maxim Topaz of Columbia University, whose recent research helped expose the scale of fabricated citations in medical literature. His team has developed Citadel, an AI-focused verification tool that can be understood as a kind of immune filter for texts, claims, citations, and evidence.

    This kind of work matters far beyond medicine. Media, publishing, marketing, public communication, education, and business will all need trustworthy verification layers as AI becomes part of everyday knowledge production.

    The opportunity here is not only academic or technical. It is also institutional and commercial. People with experience in media, communication, reputation, markets, and scale — including voices like Mr. Rosenbaum, who are already bringing public attention to the issue — could play a decisive role in promoting such verification systems and turning them into practical infrastructure.

    If this conversation reaches people who are interested in building that kind of AI-era immune system — publishers, media leaders, investors, platforms, agencies, or technology partners — this may be the moment to engage.

    The future is probably not “AI or no AI.” It is AI plus responsibility, AI plus verification, AI plus trust architecture.
