BixBench
Source: ModelScope community. Updated: 2025-12-05. Indexed: 2025-06-14.
Download link:
https://modelscope.cn/datasets/futurehouse/BixBench
Resource description:
# BixBench Dataset
This dataset contains the file `BixBench.jsonl` and, for each entry, the corresponding data capsule as a `.zip` file. Capsules are named `CapsuleFolder-{uuid}.zip`.
IMPORTANT UPDATE 2025/09/23:
In ongoing work, we found that many questions in the original dataset had insufficient detail to be answerable, especially in the preferred open-answer setting. To address this, we have extensively re-reviewed and revised a substantial portion of the benchmark. We have also changed the dataset format by flattening it to one question per row. This is a significant change, and you should expect performance to change (generally to improve). We will soon update our preprint with more details and our own updated results, and will update the public evaluation harness on GitHub shortly. The original question set remains accessible under the `v1.0` tag. The current updated set is at `main` and `v1.5`.
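As a rough illustration of the layout described above (one question per row in `BixBench.jsonl`, with capsules named `CapsuleFolder-{uuid}.zip`), the following Python sketch reads JSONL rows and groups them by the capsule archive they would refer to. The field names `capsule_uuid` and `question` are assumptions made for this example and are not confirmed by the dataset description; check the actual schema before relying on them.

```python
import io
import json


def capsule_filename(capsule_uuid: str) -> str:
    """Build an archive name following the CapsuleFolder-{uuid}.zip pattern."""
    return f"CapsuleFolder-{capsule_uuid}.zip"


def read_jsonl(stream):
    """Yield one dict per non-empty line of a JSONL stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def group_by_capsule(rows, uuid_field="capsule_uuid"):
    """Map each capsule archive name to the rows that reference it.

    `uuid_field` is a hypothetical column name used only for illustration.
    """
    groups = {}
    for row in rows:
        groups.setdefault(capsule_filename(row[uuid_field]), []).append(row)
    return groups


# Toy rows standing in for BixBench.jsonl; the keys and values are made up.
sample = io.StringIO(
    '{"capsule_uuid": "abc-123", "question": "Q1"}\n'
    '{"capsule_uuid": "abc-123", "question": "Q2"}\n'
    '{"capsule_uuid": "def-456", "question": "Q3"}\n'
)
groups = group_by_capsule(read_jsonl(sample))
```

To use this with the real file, replace the toy `io.StringIO` stream with `open("BixBench.jsonl", encoding="utf-8")` and set `uuid_field` to whichever column actually carries the capsule identifier.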
Provider:
maas
Created:
2025-06-11



