ChatGPT Health: AI Triage Risks & Safety Concerns

by Grace Chen

The rise of artificial intelligence in healthcare promises faster diagnoses and more accessible care, but a new study reveals significant safety concerns surrounding AI-powered triage systems. Researchers found that OpenAI’s ChatGPT Health, launched in January 2026 and already used by millions, demonstrates a troubling pattern of errors, particularly at the two extremes of urgency: true emergencies and non-urgent complaints. The findings, published this month, underscore the need for rigorous testing and validation before widespread adoption of these technologies.

The study, a “structured stress test” of ChatGPT Health’s triage recommendations, involved 60 clinician-authored patient scenarios spanning 21 clinical domains. By varying the conditions of each scenario to simulate real-world complexity, researchers presented the AI with 960 cases in total, an average of 16 variations per scenario. The results showed an “inverted U-shaped pattern” of performance, meaning the system struggled most with cases at both ends of the spectrum – those requiring immediate emergency care and those deemed non-urgent. This raises questions about the reliability of AI in accurately prioritizing patients and allocating resources.

AI Under-Triaged Over Half of Emergency Cases

Perhaps the most alarming finding is that ChatGPT Health under-triaged 52% of “gold-standard” emergency cases. This means the AI recommended a 24- to 48-hour evaluation for patients who, according to medical experts, required immediate attention in the emergency department. Specifically, the system misclassified cases of diabetic ketoacidosis and impending respiratory failure as less urgent, potentially delaying critical care. However, the AI correctly identified classic emergencies such as stroke and anaphylaxis, demonstrating a degree of competence in certain scenarios.

The study also highlighted the impact of external factors on the AI’s assessments. When presented with scenarios where family or friends downplayed a patient’s symptoms – a phenomenon known as “anchoring bias” – the triage recommendations shifted significantly toward less urgent care. Researchers found an odds ratio of 11.7 (95% confidence interval 3.7-36.6): the odds of a less urgent recommendation were roughly 12 times higher when symptoms were downplayed, although the wide interval leaves the exact size of the effect uncertain. This suggests that AI triage systems are vulnerable to the same cognitive biases that can affect human clinicians, and may amplify them.
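
For readers who want to see how an odds ratio and its confidence interval relate, the short Python sketch below may help. It is an illustration only, not the researchers' method: the article does not say which statistical model produced the figures, so the sketch assumes a simple Wald interval, which is symmetric on the log scale. The function names and the 2x2 counts are hypothetical; only the values 11.7, 3.7 and 36.6 come from the study as reported.

```python
import math


def wald_odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI for a 2x2 table (illustrative only).

    a, b: exposed group with / without the outcome
    c, d: unexposed group with / without the outcome
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def log_scale_midpoint(lower, upper):
    """Geometric mean of the bounds; equals the estimate if the CI is log-symmetric."""
    return math.sqrt(lower * upper)


# Figures reported in the article.
reported_or, reported_lo, reported_hi = 11.7, 3.7, 36.6

# If the interval is log-symmetric, its geometric midpoint should sit close to 11.7.
midpoint = log_scale_midpoint(reported_lo, reported_hi)

# Implied standard error of log(OR), ASSUMING a Wald-type interval with z = 1.96.
implied_se = (math.log(reported_hi) - math.log(reported_lo)) / (2 * 1.96)

print(f"geometric midpoint of CI: {midpoint:.2f}")
print(f"implied SE of log(OR):    {implied_se:.2f}")

# HYPOTHETICAL counts, not taken from the study, to show the forward calculation.
example_or, example_lo, example_hi = wald_odds_ratio(30, 10, 10, 30)
print(f"example OR {example_or:.1f} (95% CI {example_lo:.1f}-{example_hi:.1f})")
```

The geometric midpoint of the reported bounds lands close to the reported estimate, which is what one would expect from an interval built on the log-odds scale. The width of the interval reflects how much uncertainty remains around the point estimate.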

Crisis Intervention Responses Were Inconsistent

The evaluation also revealed inconsistencies in ChatGPT Health’s response to patients expressing suicidal ideation. Crisis intervention messages were activated unpredictably, sometimes firing when patients described no specific method of self-harm and remaining silent when a clear plan was articulated. This erratic behavior raises serious concerns about the AI’s ability to provide appropriate support during mental health crises. The research, published in Nature, points to a critical need for refinement in how AI systems handle sensitive mental health issues.

Interestingly, the study found no significant effects related to patient race, gender, or barriers to care, although the researchers noted that the confidence intervals did not entirely rule out clinically meaningful differences. This suggests that, at least in this controlled test, the AI did not exhibit overt biases based on these demographic factors. However, the authors caution that further investigation is needed to fully assess potential disparities.

Mount Sinai researchers are continuing to evaluate updated versions of ChatGPT Health and other consumer-facing AI tools, with plans to expand future research into areas such as pediatric care, medication safety, and non-English-language use. The team’s findings emphasize the importance of ongoing monitoring and improvement as AI becomes increasingly integrated into healthcare.

The implications of these findings extend beyond ChatGPT Health. As more AI-powered triage tools enter the market, the need for independent evaluation and rigorous validation becomes paramount. The study’s authors emphasize that prospective validation is essential before these systems are deployed on a large scale. The potential for missed emergencies and inconsistent crisis safeguards demands a cautious and evidence-based approach to AI in healthcare.

Medical Xpress reported on the study, raising further questions about the safety of AI triage systems. Their coverage highlights the urgency of addressing these vulnerabilities to ensure patient safety.

Disclaimer: This article is for general informational purposes only and does not constitute medical advice. It is essential to consult a qualified healthcare professional about any health concerns or before making any decisions related to your health or treatment.

The future of AI in healthcare hinges on addressing these critical safety concerns. Researchers will continue to refine these systems, and regulatory bodies will likely play a larger role in overseeing their deployment. The next step in evaluating ChatGPT Health will be to assess the impact of these findings on OpenAI’s development process and to monitor the performance of updated versions of the tool.

