A study analysing five artificial-intelligence chatbots’ responses to health and medical queries found that almost half the answers were inaccurate or medically misleading, even as a survey by Gallup found Americans are increasingly turning to AI for health research.
The study’s findings, published in BMJ Open, an open-access journal from the publisher of the British Medical Journal, also showed that nearly half of the AI responses were problematic in ways such as presenting a false balance between science-based and non-science-based claims.
A problematic response was defined as one that could plausibly lead lay users to pursue ineffective treatment or to come to harm if the advice were followed without professional guidance.
The researchers, including those from The Lundquist Institute for Biomedical Innovation at Harbor-University of California Los Angeles (UCLA) Medical Center in the US, said that even as generative AI chatbots are being rapidly adopted across research, marketing and medicine -- with people also using them as search engines -- a continued deployment without public education and oversight risks amplifying misinformation.
Five publicly available and widely used generative AI chatbots -- Google's Gemini, High-Flyer's DeepSeek, Meta's Meta AI, OpenAI's ChatGPT and xAI's Grok -- were each prompted with 10 open-ended and closed questions in each of five categories: cancer, vaccines, stem cells, nutrition and athletic performance.
The prompts were designed to resemble common information-seeking health and medical queries, as well as language used in online misinformation and in academic discourse.
The prompts were also used to stress-test the models and expose behavioural vulnerabilities by 'straining' them towards misinformation or contraindicated advice.
The chatbots' responses were categorised as non-problematic, somewhat problematic or highly problematic, using objective, pre-defined criteria.
The information in the responses was scored for accuracy and completeness, with particular attention to whether a chatbot presented a false balance between science-based and non-science-based claims, regardless of the strength of the evidence.
"The audited chatbots performed poorly when answering questions in misinformation-prone health and medical fields," the authors wrote.
"Nearly half (49.6 per cent) of responses were problematic: 30 per cent somewhat problematic and 19.6 per cent highly problematic," they said.
Grok was found to generate "significantly more highly problematic responses" than would be expected, the researchers said.
The chatbots' performance was found to be strongest on the topics of cancer and vaccines, and weakest on stem cells, athletic performance and nutrition.
Responses were consistently presented with confidence and certainty, with few caveats or disclaimers, the study found.
Reference quality was noted to be poor, with an average completeness score of 40 per cent. Chatbot hallucinations -- creating false information and presenting it as fact -- and fabricated citations meant that no chatbot provided a fully accurate reference list, the researchers said.
"Our findings regarding scientific accuracy, reference quality, and response readability highlight important behavioural limitations and the need to re-evaluate how AI chatbots are deployed in public-facing health and medical communication," the authors said.
"By default, chatbots do not access real-time data but instead generate outputs by inferring statistical patterns from their training data and predicting likely word sequences. They do not reason or weigh evidence, nor are they able to make ethical or value-based judgements," they said.
Why Americans are turning to AI for health
Turning to AI tools for health advice has become a habit for many Americans, according to the Gallup poll published Wednesday.
The poll, conducted in late 2025 and backed up by at least three other recent surveys with similar findings, found that roughly one-quarter of US adults had used an AI tool for health information or advice in the past 30 days.
For example, when Tiffany Davis has a question about a symptom from the weight-loss injections she's taking, she doesn't call her doctor. She pulls out her phone and consults ChatGPT.
"I'll just basically let ChatGPT know my status, how I'm feeling," said the 42-year-old in Mesquite, Texas. "I use it for anything that I'm experiencing."
Dr Karandeep Singh, chief health AI officer at the University of California San Diego Health, said AI tools, many of which now incorporate web search, are an upgraded version of Google health searches that Americans have been doing for decades.
"I almost view it like a better entry portal into web search," he said. "Instead of someone having to comb through the top, you know, 10, 20, 30 links in a web search, they can now have an executive summary."
Most Americans using AI tools for health purposes say they want immediate answers. In some cases, it helps them evaluate what kind of medical attention they need.
The Gallup survey found about seven in 10 US adults who have used AI for health research in the past 30 days say they wanted quick answers, additional information or were simply curious. Majorities used it for research before seeing a doctor or after an appointment.
The Gallup survey findings suggest that the rise of AI tools hasn't stopped people from seeking professional medical care.
About eight in 10 US adults say they have sought out a doctor or other healthcare professional for health information in the past year, while about three in 10 say the same about AI tools and chatbots, according to a KFF poll conducted in late February.
Similarly, a Pew Research Center survey conducted in October found that about two in 10 US adults say they get health information at least sometimes from AI chatbots, while about 85 per cent said the same about health care providers.
A small but significant share of respondents in the Gallup study say they used AI because accessing health care was too expensive or inconvenient. About four in 10 wanted help outside of normal business hours, while about three in 10 did not want to pay for a doctor's visit.
Roughly two in 10 did not have time to make an appointment, had felt ignored or dismissed by a provider in the past or were too embarrassed to talk to a person.
The KFF survey found that younger adults and lower-income people were more likely to say they used an AI tool or chatbot for health information because they could not afford the cost of seeing a provider or were having trouble accessing health care.
Dr Bobby Mukkamala, an ear, nose and throat doctor and the president of the American Medical Association, said he loves when patients come in and have "more evolved questions than they used to have" because they used AI for research.
But he said AI should be considered a tool and not a stand-in for medical care.
"It is an assistant but not an expert, and that's why physicians need to be involved in that care," he said.
There are also concerns about privacy, according to KFF. About three-quarters of US adults said they are "very concerned" or "somewhat concerned" about the privacy of personal medical or health information that people provide to AI tools or chatbots.
Singh, of UC San Diego Health, said most AI tools have settings users can toggle to prevent their data from being used to train future models. But that requires vigilance on the user's part -- and not being careful can have consequences.
Last summer, for example, internet sleuths using Google discovered private ChatGPT conversations that had been indexed on a public website without the users realising it.