fuyyckwhy/HS-Bench-results

Name: fuyyckwhy/HS-Bench-results
Creator: fuyyckwhy
Published: 2026-02-03 07:43:57
License: No description available

Source: Hugging Face. Updated 2026-02-03; indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/fuyyckwhy/HS-Bench-results
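Individual files can be fetched directly instead of downloading the whole repository. The Python sketch below builds a direct-download URL. It assumes the standard Hugging Face `resolve` URL scheme (which hf-mirror.com replicates) and the default `main` branch. `resolve_url` is a hypothetical helper, and any `<model>` name in a path you pass must come from the actual repository listing.

```python
from urllib.parse import quote

# Repository id taken from the download link above.
REPO_ID = "fuyyckwhy/HS-Bench-results"


def resolve_url(path_in_repo: str,
                endpoint: str = "https://hf-mirror.com",
                revision: str = "main") -> str:
    """Return a direct-download URL for one file in the dataset repository.

    Assumes the Hugging Face "resolve" scheme, which hf-mirror.com mirrors:
    <endpoint>/datasets/<repo_id>/resolve/<revision>/<path_in_repo>
    """
    base = endpoint.rstrip("/")
    # quote() keeps "/" unescaped by default, so nested paths stay intact.
    return (f"{base}/datasets/{REPO_ID}/resolve/"
            f"{quote(revision, safe='')}/{quote(path_in_repo)}")
```

To fetch the full repository, the `huggingface_hub` library's `snapshot_download(repo_id="fuyyckwhy/HS-Bench-results", repo_type="dataset")` is an alternative. Setting the `HF_ENDPOINT` environment variable to `https://hf-mirror.com` routes it through the mirror.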

Resource summary:

---
license: mit
task_categories:
- other
size_categories:
- 1M<n<10M
language:
- en
pretty_name: HS-Bench Results
tags:
- human-subject simulation
- benchmark
- evaluation
- PAS
- ECS
---

# HS-Bench Results

Benchmark results for **HumanStudy-Bench (HS-Bench)**: evaluation outputs from running AI agents through reconstructed human-subject experiments.

## Dataset description

This dataset contains:

- **12 studies** (`study_001`–`study_012`): replicated experiments from published human-subject research (cognition, strategic interaction, social psychology).
- **Multiple model × agent-design runs per study**: e.g. different LLMs (Mistral, GPT, Claude, Gemini, etc.) combined with agent-design presets (`v1-empty`, `v2-human`, `v3-human-plus-demo`, `v4-background`).
- **Per-run artifacts**:
  - `full_benchmark.json` – full trial-level and aggregate results
  - `evaluation_results.json` – PAS/ECS and related metrics
  - `raw_responses.json` / `raw_responses.jsonl` – model outputs
  - `detailed_stats.csv` – detailed statistics

## Metrics

- **PAS (Probability Alignment Score)**: measures whether agents reach the same scientific conclusions as human participants at the phenomenon level.
- **ECS (Effect Consistency Score)**: measures how closely agents reproduce the magnitude and pattern of human behavioral effects.

## Structure

```
study_001/
  <model>_<preset>/
    full_benchmark.json
    evaluation_results.json
    raw_responses.json
    detailed_stats.csv
    ...
study_002/
  ...
...
```

## Citation

If you use this dataset or HumanStudy-Bench, please cite:

```bibtex
@misc{liu2026humanstudybenchaiagentdesign,
  title={HumanStudy-Bench: Towards AI Agent Design for Participant Simulation},
  author={Xuan Liu and Haoyang Shang and Zizhang Liu and Xinyan Liu and Yunze Xiao and Yiwen Tu and Haojian Jin},
  year={2026},
  eprint={2602.00685},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2602.00685},
}
```

**Paper**: [arXiv:2602.00685](https://arxiv.org/abs/2602.00685)

## Source

Generated by [HumanStudy-Bench](https://github.com/XuanL17/HumanStudy-Bench/) (HS-Bench). Use this dataset to compare agent designs, reproduce results, or run further analyses.
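The card's `study_NNN/<model>_<preset>/` layout can be turned into a small local index of runs. The Python sketch below assumes you already have a local copy of the repository. It also assumes that preset names never contain underscores, which holds for the four presets listed, so the last `_` in a run directory name separates model from preset. The helper names (`parse_run_dir`, `index_runs`, `parse_jsonl`, `load_jsonl`) are hypothetical. The card does not document the JSON schemas, so the sketch only locates files and decodes JSON Lines without interpreting any fields.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

# Presets named on the dataset card; other presets may exist.
KNOWN_PRESETS = {"v1-empty", "v2-human", "v3-human-plus-demo", "v4-background"}


def parse_run_dir(name: str) -> Tuple[str, str]:
    """Split a '<model>_<preset>' directory name into (model, preset).

    Assumption: presets contain no underscores, so split at the LAST '_'.
    """
    model, sep, preset = name.rpartition("_")
    if not sep or not model or not preset:
        raise ValueError(f"not a <model>_<preset> name: {name!r}")
    return model, preset


def index_runs(root: Path) -> List[Dict]:
    """Return one record per run directory found under root/study_*/."""
    runs: List[Dict] = []
    if not root.is_dir():
        return runs
    for study_dir in sorted(root.glob("study_*")):
        if not study_dir.is_dir():
            continue
        for run_dir in sorted(p for p in study_dir.iterdir() if p.is_dir()):
            try:
                model, preset = parse_run_dir(run_dir.name)
            except ValueError:
                continue  # skip directories that don't follow the naming scheme
            runs.append({
                "study": study_dir.name,
                "model": model,
                "preset": preset,
                "known_preset": preset in KNOWN_PRESETS,
                "evaluation_file": run_dir / "evaluation_results.json",
            })
    return runs


def parse_jsonl(lines: Iterable[str]) -> List:
    """Decode JSON Lines text (one JSON value per line), skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def load_jsonl(path: Path) -> List:
    """Read a JSON Lines file such as raw_responses.jsonl."""
    with path.open(encoding="utf-8") as f:
        return parse_jsonl(f)
```

For a hypothetical run directory named `gpt-4o_v2-human`, `parse_run_dir` returns `("gpt-4o", "v2-human")`. Splitting from the right keeps model identifiers that themselves contain underscores intact.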

Provided by:

fuyyckwhy
