The promise of artificial intelligence revolutionizing healthcare hit a sobering note this week with the release of a modern study detailing significant safety concerns surrounding OpenAI’s ChatGPT Health. Launched in January 2026, the AI-powered tool—used by an estimated 40 million US adults daily for health advice—demonstrated a troubling tendency to miss critical medical emergencies more than half the time, raising questions about the readiness of such technology for widespread consumer use. The findings, published in the February edition of Nature Medicine, underscore the need for rigorous testing and careful consideration of the potential risks before integrating AI into sensitive areas like medical triage.
Researchers at the Icahn School of Medicine at Mount Sinai created 60 realistic patient scenarios, ranging from minor ailments to life-threatening conditions. These scenarios were then evaluated by three independent physicians to establish a gold standard for appropriate care. The AI’s responses were then compared to these expert assessments, revealing a pattern of concerning inaccuracies. The study focused on assessing whether ChatGPT Health could accurately identify situations requiring immediate emergency department attention.
AI Struggles with Nuance in Emergency Situations
The study revealed that ChatGPT Health under-triaged 52% of ‘gold-standard emergencies,’ meaning it failed to recommend emergency care when it was medically necessary. Specifically, the AI directed patients presenting with conditions like diabetic ketoacidosis and impending respiratory failure to a 24–48 hour evaluation window, rather than immediate hospitalization. This misdirection could have dire consequences, potentially delaying critical treatment and endangering lives. Dr. Ashwin Ramaswamy, lead author of the study, explained that while the AI performed well in straightforward emergencies like stroke or severe allergic reactions, it “struggled in more nuanced situations where the danger is not immediately obvious, and those are often the cases where clinical judgment matters most.”
In a particularly alarming simulation, researchers found that in 84% of attempts, ChatGPT Health advised a woman experiencing symptoms of suffocation to schedule a future appointment—an appointment she likely wouldn’t survive to attend. This highlights the potential for the AI to provide dangerously inadequate advice in critical situations. However, the AI also exhibited a tendency to overreact in less urgent cases, incorrectly advising 64.8% of healthy individuals to seek immediate medical attention.
Concerns Extend to Mental Health Crisis Detection
Beyond physical health emergencies, the study also raised concerns about ChatGPT Health’s ability to accurately identify and respond to suicidal ideation. The AI is designed to direct users expressing thoughts of self-harm to a suicide crisis line, but researchers found the system’s responses were inconsistent and, at times, counterproductive. The system’s alerts were “inverted relative to clinical risk,” appearing more reliably in lower-risk scenarios than when users described specific plans for self-harm. Researchers noted that in real-life clinical settings, a detailed plan for self-harm is a sign of more immediate danger, not less.
The study also found that the AI’s triage recommendations were susceptible to bias. When family members or friends minimized a patient’s symptoms, the AI was more likely to suggest less urgent care. This underscores the importance of considering the influence of external factors on AI-driven medical assessments.
Expert Reaction and OpenAI’s Response
The findings have prompted concern among medical professionals. Alex Ruani, a doctoral researcher in health misinformation mitigation at University College London, described the results as “unbelievably dangerous,” warning that the “false sense of security” created by these systems could have fatal consequences. “If someone is told to wait 48 hours during an asthma attack or diabetic crisis, that reassurance could cost them their life,” she said.
OpenAI acknowledged the study but stated that it does not reflect how people typically use ChatGPT Health or how the product is designed to function in real-world health scenarios. The company did not provide specific details on how it plans to address the identified shortcomings.
The Future of AI in Healthcare: A Call for Critical Evaluation
Despite the concerns raised by this study, researchers emphasize that the findings do not necessarily warrant abandoning AI health tools altogether. Alvira Tyagi, a medical student and second author of the study, argued that “these systems are changing quickly, so part of our training now must consider learning how to understand their outputs critically, identify where they fall short, and use them in ways that protect patients.” The key, experts say, is to approach these tools with a healthy dose of skepticism and to recognize their limitations.
As AI continues to evolve and become increasingly integrated into healthcare, ongoing research and rigorous safety evaluations will be crucial to ensure that these technologies enhance, rather than compromise, patient care. The next step will be prospective validation of these findings in real-world settings, as highlighted in the Nature Medicine study.
Have thoughts on the role of AI in healthcare? Share your comments below, and please share this article with your network.
