
PaulR11/niah-realism

Source: Hugging Face. Updated: 2026-04-24. Indexed: 2026-04-26.
Download link:
https://hf-mirror.com/datasets/PaulR11/niah-realism
Description:
This dataset is part of the data release for the ICLR 2026 paper "Training Alignment Auditors via Reinforcement Learning". It contains the full outputs of the Needle-in-a-Haystack (NIAH) realism evaluation (Cell G of the 4-bucket suite). In this evaluation, a synthetic audit transcript generated by the auditor model is compared pairwise against a real WildChat conversation, and an Opus 4.6 judge decides which of the two is more realistic. The dataset is organized into subdirectories and files that include results, logs, and intermediate conversion outputs. Each run's JSON schema contains the configuration, win rate, accuracy, and other metrics. The dataset is for research use only and contains no API keys or PII.
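As a rough illustration of how a win rate and an accuracy figure could be derived from pairwise judge verdicts, here is a minimal Python sketch. The record layout (`PairVerdict`, `judge_choice`), the function name `summarize`, and the two metric definitions are assumptions made for illustration only; the listing does not specify the dataset's actual JSON field names or how the paper defines these metrics. Here "synthetic win rate" is taken to mean the fraction of pairs in which the judge picked the synthetic transcript as more realistic, and "judge accuracy" the fraction in which it picked the real WildChat conversation.

```python
from dataclasses import dataclass


@dataclass
class PairVerdict:
    """Hypothetical record for one pairwise comparison (not the dataset's real schema)."""
    pair_id: str
    judge_choice: str  # "synthetic" or "real": the transcript the judge called more realistic


def summarize(verdicts):
    """Compute assumed summary metrics over a list of PairVerdict records."""
    if not verdicts:
        raise ValueError("no verdicts to summarize")
    for v in verdicts:
        if v.judge_choice not in ("synthetic", "real"):
            raise ValueError(f"unexpected judge_choice: {v.judge_choice!r}")
    n = len(verdicts)
    synthetic_wins = sum(1 for v in verdicts if v.judge_choice == "synthetic")
    return {
        "n_pairs": n,
        # Assumed definition: share of pairs where the synthetic transcript fooled the judge.
        "synthetic_win_rate": synthetic_wins / n,
        # Assumed definition: share of pairs where the judge picked the real conversation.
        "judge_accuracy": (n - synthetic_wins) / n,
    }


if __name__ == "__main__":
    # Made-up verdicts, purely to exercise the function.
    example = [
        PairVerdict("p1", "real"),
        PairVerdict("p2", "synthetic"),
        PairVerdict("p3", "real"),
        PairVerdict("p4", "real"),
    ]
    print(summarize(example))
```

Under these assumed definitions the two numbers always sum to 1, so a synthetic win rate near 0.5 would mean the judge cannot reliably tell synthetic transcripts from real ones.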
Provider:
PaulR11