dia-bench/DIA-Bench
Source: Hugging Face. Updated 2025-01-17; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/dia-bench/DIA-Bench
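Resource overview:
The Dynamic Intelligence Assessment Dataset (DIA dataset) is designed to test the problem-solving abilities of large language models (LLMs) through dynamically generated challenges. It contains 150 dynamic question generators, focused mainly on CTF (Capture the Flag) style challenges spanning mathematics, cryptography, cybersecurity, and computer science. Industry experts developed the dataset by hand, and multiple testers worked through it to uncover errors and edge cases. It includes multiple generated test instances to improve measurement accuracy. The data is stored as JSON files, and each question instance has several fields, such as the challenge description, instructions, and solution. The dataset was created because static question-answer pairs can be memorized or guessed by models; to address this, it introduces dynamic question templates and improved evaluation metrics.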
The Dynamic Intelligence Assessment Dataset (DIA) aims to test the problem-solving abilities of large language models (LLMs) through dynamically generated challenges that are difficult to guess. The dataset primarily focuses on CTF-style challenges, requiring knowledge from mathematics, cryptography, cybersecurity, and computer science. The dataset includes question and answer pairs, with various instances generated from the same test type to increase measurement accuracy. The dataset is curated by a team of industry experts and has been tested to ensure quality and accuracy. The README also discusses the evaluation process, dataset structure, fields, creation rationale, source data, and potential biases and limitations.
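To illustrate the idea of a dynamic question generator described above, here is a minimal Python sketch. It is not taken from the DIA-Bench code: the function names `generate_modexp_instance` and `is_correct`, the modular-exponentiation task, and the JSON field names `challenge`, `instructions`, and `solution` are all assumptions chosen for illustration. The point is only that one template plus a seed yields many concrete instances, each with its own computed answer, so a memorized answer to one instance does not carry over to the next.

```python
import json
import random


def generate_modexp_instance(seed: int) -> dict:
    """Hypothetical generator: one seed yields one concrete question instance."""
    rng = random.Random(seed)  # seeded, so each instance is reproducible
    base = rng.randint(2, 50)
    exponent = rng.randint(100, 999)
    modulus = rng.randint(1000, 9999)
    return {
        # Field names are illustrative, not the dataset's actual schema.
        "challenge": f"Compute {base}^{exponent} mod {modulus}.",
        "instructions": "Reply with the integer result only.",
        "solution": str(pow(base, exponent, modulus)),
    }


def is_correct(instance: dict, answer: str) -> bool:
    """Exact-match check of a model's answer against the stored solution."""
    return answer.strip() == instance["solution"]


# Several instances from the same template, as a stand-in for repeated testing.
instances = [generate_modexp_instance(seed) for seed in range(5)]
print(json.dumps(instances[0], indent=2))
```

Scoring a model over all five instances rather than one gives a rough sense of whether it solves the task reliably or only got a single instance right by chance.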
Provided by:
dia-bench



