
fuyyckwhy/HS-Bench-results

Source: Hugging Face. Updated 2026-02-03. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/fuyyckwhy/HS-Bench-results
Resource description:
---
license: mit
task_categories:
- other
size_categories:
- 1M<n<10M
language:
- en
pretty_name: HS-Bench Results
tags:
- human-subject simulation
- benchmark
- evaluation
- PAS
- ECS
---

# HS-Bench Results

Benchmark results for **HumanStudy-Bench (HS-Bench)**: evaluation outputs from running AI agents through reconstructed human-subject experiments.

## Dataset description

This dataset contains:

- **12 studies** (`study_001`–`study_012`): replicated experiments from published human-subject research (cognition, strategic interaction, social psychology).
- **Multiple model × agent-design runs per study**: e.g. different LLMs (Mistral, GPT, Claude, Gemini, etc.) and presets (`v1-empty`, `v2-human`, `v3-human-plus-demo`, `v4-background`).
- **Per-run artifacts**:
  - `full_benchmark.json` – full trial-level and aggregate results
  - `evaluation_results.json` – PAS/ECS and related metrics
  - `raw_responses.json` / `raw_responses.jsonl` – model outputs
  - `detailed_stats.csv` – detailed statistics

## Metrics

- **PAS (Probability Alignment Score)**: Whether agents reach the same scientific conclusions as humans at the phenomenon level.
- **ECS (Effect Consistency Score)**: How closely agents reproduce the magnitude and pattern of human behavioral effects.

## Structure

```
study_001/
  <model>_<preset>/
    full_benchmark.json
    evaluation_results.json
    raw_responses.json
    detailed_stats.csv
  ...
study_002/
  ...
...
```

## Citation

If you use this dataset or HumanStudy-Bench, please cite:

```bibtex
@misc{liu2026humanstudybenchaiagentdesign,
  title={HumanStudy-Bench: Towards AI Agent Design for Participant Simulation},
  author={Xuan Liu and Haoyang Shang and Zizhang Liu and Xinyan Liu and Yunze Xiao and Yiwen Tu and Haojian Jin},
  year={2026},
  eprint={2602.00685},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2602.00685},
}
```

**Paper**: [arXiv:2602.00685](https://arxiv.org/abs/2602.00685)

## Source

Generated by [HumanStudy-Bench](https://github.com/XuanL17/HumanStudy-Bench/) (HS-Bench). Use this dataset to compare agent designs, reproduce results, or run further analysis.
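
## Example: walking the run directories

The layout described under "Structure" can be traversed with a short script. The following is a minimal sketch and not part of HS-Bench itself. It assumes the dataset has already been downloaded to a local directory, and that each run directory name splits into model and preset at the last underscore (an assumption based on the `<model>_<preset>` pattern and the hyphenated preset names). It makes no assumption about the keys inside `evaluation_results.json`. The helper names `split_run_name`, `iter_runs`, and `load_evaluation` are hypothetical.

```python
import json
from pathlib import Path


def split_run_name(run_name: str) -> tuple[str, str]:
    """Split '<model>_<preset>' at the LAST underscore.

    Assumption: presets such as 'v2-human' contain no underscore,
    while model names may contain underscores.
    """
    model, sep, preset = run_name.rpartition("_")
    if not sep:
        raise ValueError(f"no '_' separator in run name: {run_name!r}")
    return model, preset


def iter_runs(root):
    """Yield (study, model, preset, run_dir) for every run under root."""
    for study_dir in sorted(Path(root).glob("study_*")):
        if not study_dir.is_dir():
            continue
        for run_dir in sorted(p for p in study_dir.iterdir() if p.is_dir()):
            model, preset = split_run_name(run_dir.name)
            yield study_dir.name, model, preset, run_dir


def load_evaluation(run_dir):
    """Return the parsed evaluation_results.json, or None if it is absent."""
    path = Path(run_dir) / "evaluation_results.json"
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Calling `iter_runs` on the local copy of the dataset and passing each `run_dir` to `load_evaluation` gives one parsed result per model and preset combination, which can then be grouped by study or by preset. Inspect one loaded file before relying on specific metric keys, since the file schema is not documented in this card.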
Provided by:
fuyyckwhy