Evaluation of large language model chatbot responses to psychotic prompts: numerical ratings of prompt-response pairs

Creator: Dryad
Published: 2026-01-29 01:32:12
License: No description available

DataCite Commons, updated 2026-01-29, indexed 2026-04-25

Download link:

https://datadryad.org/dataset/doi:10.5061/dryad.x0k6djj00

Summary:

The large language model (LLM) chatbot product ChatGPT has accumulated 800 million weekly users since its launch in 2022. In 2025, several media outlets reported on individuals in whom apparent psychotic symptoms emerged or worsened in the context of using ChatGPT. Because LLM chatbots are trained to align with user input and to generate encouraging responses, they may have difficulty responding appropriately to psychotic content. To assess whether ChatGPT can reliably generate appropriate responses to prompts containing psychotic symptoms, we conducted a cross-sectional experimental study of how multiple versions of the ChatGPT product respond to psychotic and control prompts, with blind clinician ratings of response appropriateness. We found that all three tested versions of ChatGPT were much more likely to generate inappropriate responses to psychotic prompts than to control prompts, with the "Free" product performing worst. In an exploratory analysis, prompts reflecting grandiosity or disorganized communication were more likely to elicit inappropriate responses than prompts reflecting delusions.

Providing institution:

Dryad

Created:

2025-11-19