Evaluation of large language model chatbot responses to psychotic prompts: numerical ratings of prompt-response pairs
Source: DataCite Commons. Updated: 2026-01-29. Indexed: 2026-04-25.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.x0k6djj00
Description:
The large language model (LLM) "chatbot" product ChatGPT has
accumulated 800 million weekly users since its 2022 launch. In 2025,
several media outlets reported on individuals in whom apparent psychotic
symptoms emerged or worsened in the context of using ChatGPT. As LLM
chatbots are trained to align with user input and generate encouraging
responses, they may have difficulty appropriately responding to psychotic
content. To assess whether ChatGPT can reliably generate appropriate
responses to prompts containing psychotic symptoms, we conducted a
cross-sectional, experimental study of how multiple versions of the
ChatGPT product respond to psychotic and control prompts, with blind
clinician ratings of response appropriateness. We found that all three
tested versions of ChatGPT were much more likely to generate inappropriate
responses to psychotic prompts than to control prompts, with the "Free"
product showing the poorest performance. In an exploratory
analysis, prompts reflecting grandiosity or disorganized communication
were more likely to elicit inappropriate responses than those reflecting
delusions.
Provider:
Dryad
Created:
2025-11-19



