Evaluation of large language model chatbot responses to psychotic prompts: numerical ratings of prompt-response pairs

Source: DataCite Commons. Updated: 2026-01-29. Indexed: 2026-04-25.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.x0k6djj00
Description:
The large language model (LLM) "chatbot" product ChatGPT has accumulated 800 million weekly users since its 2022 launch. In 2025, several media outlets reported on individuals in whom apparent psychotic symptoms emerged or worsened in the context of using ChatGPT. Because LLM chatbots are trained to align with user input and generate encouraging responses, they may have difficulty responding appropriately to psychotic content. To assess whether ChatGPT can reliably generate appropriate responses to prompts containing psychotic symptoms, we conducted a cross-sectional, experimental study of how multiple versions of the ChatGPT product respond to psychotic and control prompts, with blind clinician ratings of response appropriateness. We found that all three tested versions of ChatGPT were much more likely to generate inappropriate responses to psychotic prompts than to control prompts, with the "Free" product showing the poorest performance. In an exploratory analysis, prompts reflecting grandiosity or disorganized communication were more likely to elicit inappropriate responses than those reflecting delusions.
Provider:
Dryad
Created:
2025-11-19