
hysong/MentalBench

Source: Hugging Face. Updated 2026-04-06. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/hysong/MentalBench
Description:
---
task_categories:
- question-answering
language:
- en
tags:
- medical
pretty_name: MentalBench
size_categories:
- 10K<n<100K
---

# MentalBench: A Benchmark for Evaluating Psychiatric Diagnostic Capability of Large Language Models

## 🌟 Overview

**MentalBench** is a comprehensive benchmark for evaluating the psychiatric diagnostic capabilities of large language models (LLMs). As the use of LLMs in healthcare expands, ensuring their reliability in sensitive domains such as psychiatry is crucial. MentalBench provides a robust evaluation framework grounded in real-world psychiatric knowledge. To facilitate deeper reasoning and grounded evaluation, this benchmark is integrated with MentalKG, a specialized knowledge graph structured for psychiatric domain knowledge.

## 🎯 Question Types

| Type | Description | Difficulty | Number of Samples |
|------|-------------|------------|-------------------|
| **Type 1** | Medical Chart → Single Answer | Low | 1,725 |
| **Type 2** | Patient Self-Report → Single Answer | Medium | 3,450 |
| **Type 3** | Ambiguous Type → Multiple Answer | High | 6,525 |
| **Type 4** | Clear Type → Single Answer | High | 13,050 |

## 📝 Citation

If you find MentalBench and MentalKG useful for your research, please cite our paper:

```bibtex
@article{song2026mentalbench,
  title={MentalBench: A Benchmark for Evaluating Psychiatric Diagnostic Capability of Large Language Models},
  author={Song, Hoyun and Kang, Migyeong and Shin, Jisu and Kim, Jihyun and Park, Chanbi and Yoo, Hangyeol and An, Jihyun and Oh, Alice and Han, Jinyoung and Lim, KyungTae},
  journal={arXiv preprint arXiv:2602.12871},
  year={2026}
}
```
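As a quick sanity check on the Question Types table, the sketch below sums the four sample counts and compares the total with the `size_categories` tag (`10K<n<100K`) in the card metadata. It assumes the four types are disjoint and that the table lists every sample, which the card does not state explicitly. The names `SAMPLE_COUNTS` and `summarize` are illustrative and not part of the dataset.

```python
# Sample counts per question type, copied from the Question Types table.
SAMPLE_COUNTS = {
    "Type 1": 1_725,
    "Type 2": 3_450,
    "Type 3": 6_525,
    "Type 4": 13_050,
}


def summarize(counts):
    """Return (total, {type: share of total}) for a mapping of sample counts."""
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.items()}
    return total, shares


# Assumption: the types do not overlap, so a plain sum gives the dataset size.
total, shares = summarize(SAMPLE_COUNTS)

# The card's size_categories tag declares 10K < n < 100K.
in_declared_range = 10_000 < total < 100_000

if __name__ == "__main__":
    print(f"total samples: {total}")
    for name, share in shares.items():
        print(f"{name}: {share:.1%}")
    print(f"within 10K<n<100K: {in_declared_range}")
```

Under these assumptions the total is 24,750 samples, with the two high-difficulty types (Type 3 and Type 4) making up roughly four fifths of the benchmark.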
Provider:
hysong