Development, system design, safety, and performance metrics of a conversational agent for reducing depressive and anxious symptoms: The MHAI Study
Source: DataCite Commons. Updated: 2025-07-21. Indexed: 2025-09-08.
Download link:
https://figshare.com/articles/dataset/Development_system_design_safety_and_performance_metrics_of_a_conversational_agent_for_reducing_depressive_and_anxious_symptoms_The_MHAI_Study/29606618/1
Description:
Background: Conversational agents based on large language models (LLMs) have shown moderate efficacy in reducing depressive and anxiety symptoms. However, most existing evaluations lack methodological transparency, rely on closed-source models, and show limited standardization in performance and safety assessment.

Objective: This study had two objectives: (1) to develop an LLM-based conversational agent through system design analysis and initial functionality testing, and (2) to evaluate the safety and performance of the agent with two LLMs (GPT-4o and Llama 3.1-8B) through standardized assessment in controlled simulated interactions focused on depression and anxiety.

Methods: We conducted a cross-sectional study in two phases. First, we developed a mental health platform integrating a conversational agent with functionalities including personalized context, pretrained therapeutic modules, self-assessment tools, and an emergency alert system. Second, for each LLM, we evaluated the agent’s responses in simulated interactions based on predefined user personas. Four expert raters assessed 816 interaction pairs on a 5-point Likert scale covering tone, clarity, domain accuracy (correctness), robustness, completeness, boundaries, target language, and safety. In addition, we recorded quantitative performance metrics such as cost, response length, and number of tokens. Multiple linear regression models were used to compare LLM performance and assess metric interrelations.

Results: First, we developed a web-based mental health platform using a user-centered design, structured into frontend, backend, and database layers. The system integrates therapeutic chat (GPT-4o and Llama 3.1-8B), psychological assessments (PHQ-9, GAD-7), CBT-based tasks, and an emergency alert system. The platform supports secure user authentication, data encryption, multilingual access, and session tracking. Second, GPT-4o outperformed Llama 3.1-8B in both quantitative and qualitative metrics: it generated longer and more lexically diverse responses, used more tokens, and scored higher in clarity, robustness, completeness, boundaries, and target language. However, it incurred higher costs, and the two models did not differ significantly in tone, accuracy, or safety.

Conclusion: Our study presents a conversational agent with multiple functionalities and shows that GPT-4o outperforms Llama 3.1-8B in performance, although at a higher cost. This platform could be used in future clinical trials or real-world implementation studies.
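
The description lists PHQ-9 and GAD-7 among the platform's psychological assessments but does not say how the platform scores them. As a rough illustration only, the sketch below sums the item responses (each rated 0 to 3) and maps the total to the commonly published severity bands for each instrument. The names `score_assessment` and `AssessmentResult` are hypothetical and are not taken from the MHAI codebase.

```python
from dataclasses import dataclass

# Commonly published severity bands: (minimum total score, label).
# Assumption: the MHAI platform may use different labels or cutoffs.
BANDS = {
    "PHQ-9": [(0, "minimal"), (5, "mild"), (10, "moderate"),
              (15, "moderately severe"), (20, "severe")],
    "GAD-7": [(0, "minimal"), (5, "mild"), (10, "moderate"), (15, "severe")],
}
ITEM_COUNT = {"PHQ-9": 9, "GAD-7": 7}


@dataclass(frozen=True)
class AssessmentResult:
    """Hypothetical container for one scored questionnaire."""
    instrument: str
    total: int
    severity: str


def score_assessment(instrument: str, answers: list[int]) -> AssessmentResult:
    """Sum item responses (each 0-3) and look up the severity band."""
    if instrument not in BANDS:
        raise ValueError(f"unknown instrument: {instrument}")
    if len(answers) != ITEM_COUNT[instrument]:
        raise ValueError(f"{instrument} expects {ITEM_COUNT[instrument]} answers")
    if any(a not in (0, 1, 2, 3) for a in answers):
        raise ValueError("each answer must be an integer from 0 to 3")

    total = sum(answers)
    severity = BANDS[instrument][0][1]
    # Bands are sorted by minimum score, so the last match is the right one.
    for minimum, label in BANDS[instrument]:
        if total >= minimum:
            severity = label
    return AssessmentResult(instrument, total, severity)


if __name__ == "__main__":
    print(score_assessment("PHQ-9", [1, 1, 1, 1, 1, 1, 1, 1, 1]))
    print(score_assessment("GAD-7", [3, 3, 3, 3, 3, 3, 3]))
```

Keeping the cutoffs in a lookup table rather than in branching logic makes it straightforward to swap in whatever thresholds a given deployment actually uses.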
Provider:
figshare
Created:
2025-07-21



