AdversaLLC/MentalBench
Source: Hugging Face. Updated 2026-04-14; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/AdversaLLC/MentalBench
---
task_categories:
- question-answering
language:
- en
tags:
- medical
pretty_name: MentalBench
size_categories:
- 10K<n<100K
---
# MentalBench: A Benchmark for Evaluating Psychiatric Diagnostic Capability of Large Language Models
## 🌟 Overview
**MentalBench** is a benchmark for evaluating the psychiatric diagnostic capabilities of large language models (LLMs). As LLM use in healthcare expands, their reliability in sensitive domains such as psychiatry becomes crucial. MentalBench provides an evaluation framework grounded in real-world psychiatric knowledge. To support deeper reasoning and grounded evaluation, the benchmark is integrated with MentalKG, a knowledge graph that structures psychiatric domain knowledge.
## 🎯 Question Types
| Type | Description | Difficulty | Number of Samples |
|------|-------------|------------|-------------------|
| **Type 1** | Medical Chart → Single Answer | Low | 1,725 |
| **Type 2** | Patient Self-Report → Single Answer | Medium | 3,450 |
| **Type 3** | Ambiguous Type → Multiple Answer | High | 6,525 |
| **Type 4** | Clear Type → Single Answer | High | 13,050 |
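As a quick sanity check on the table, the per-type counts can be summed and compared against the `size_categories` value declared in the card metadata. The sketch below hard-codes the four counts from the table; the names `SAMPLES_PER_TYPE` and `summarize` are illustrative and not part of the dataset or any library.

```python
# Sample counts per question type, copied from the table above.
SAMPLES_PER_TYPE = {
    "Type 1": 1_725,   # Medical Chart -> Single Answer
    "Type 2": 3_450,   # Patient Self-Report -> Single Answer
    "Type 3": 6_525,   # Ambiguous Type -> Multiple Answer
    "Type 4": 13_050,  # Clear Type -> Single Answer
}


def summarize(counts: dict[str, int]) -> dict[str, float]:
    """Return each type's share of the total as a percentage (one decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


total = sum(SAMPLES_PER_TYPE.values())
shares = summarize(SAMPLES_PER_TYPE)

# The card metadata declares size_categories: 10K<n<100K.
assert 10_000 < total < 100_000

print(f"total samples: {total}")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

Under these counts the total is 24,750 samples, and the two high-difficulty types (Type 3 and Type 4) together make up roughly four fifths of the benchmark.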
## 📝 Citation
If you find MentalBench and MentalKG useful for your research, please cite our paper:
```bibtex
@article{song2026mentalbench,
title={MentalBench: A Benchmark for Evaluating Psychiatric Diagnostic Capability of Large Language Models},
author={Song, Hoyun and Kang, Migyeong and Shin, Jisu and Kim, Jihyun and Park, Chanbi and Yoo, Hangyeol and An, Jihyun and Oh, Alice and Han, Jinyoung and Lim, KyungTae},
journal={arXiv preprint arXiv:2602.12871},
year={2026}
}
```
Provider: AdversaLLC



