
DensingLaw-ScalingBench

ModelScope Community, updated 2025-12-05, indexed 2025-10-04
Download link:
https://modelscope.cn/datasets/OpenBMB/DensingLaw-ScalingBench
Resource description:
# DensingLaw-ScalingBench

This dataset was created to enable more accurate estimation of performance scaling laws for large language models (LLMs). It is released as part of our paper, **`Densing Law of LLMs`**.

<div align="center">

[📜 Paper](https://arxiv.org/pdf/2412.04315)

</div>

## 💡 Overview

This repository contains the open-source dataset used to calculate conditional loss in our LLM density evaluation framework.

LLM density is defined as the ratio of a model's effective parameter size to its actual parameter size. The effective parameter size is the number of parameters a reference model would need to achieve equivalent performance. This dataset enables researchers to replicate our density calculations and apply the methodology to their own models.

## Dataset Description

The dataset contains test instances designed specifically for calculating the conditional loss $\mathcal{L} = -\log P(\text{answer} \mid \text{instruction})$ on downstream tasks. Traditional scaling-law approaches focus on the language modeling loss over the whole sequence. This dataset instead emphasizes the probability of the output answer given the input instruction.

### Data Format

The dataset includes two main types of tasks:

#### 1. Multiple-Choice Problems

- **Input**: The problem statement concatenated with the answer options
- **Output**: An analysis of the problem, followed by the final answer label

#### 2. Complex Reasoning Problems

- **Input**: A problem statement requiring multi-step reasoning (e.g., a mathematical question)
- **Output**: The complete reasoning steps, followed by the correct answer

## Usage

This dataset is designed to be used with a two-step estimation approach:

1. **Loss Estimation**: Fit the relationship between parameter size and conditional loss using the scaling law

   ```
   L = a * N^(-α) + b * D^(-β)
   ```

   where `N` is the parameter size, `D` is the amount of training data (tokens), and `a`, `α`, `b`, `β` are fitted constants.
2. **Performance Estimation**: Map the conditional loss to downstream task performance using a sigmoid function

   ```
   S = c / (1 + e^(-γ(L-l))) + d
   ```

   where `S` is the downstream performance and `c`, `γ`, `l`, `d` are fitted constants.

## ⚠️ Disclaimer

* The reasoning steps in this dataset were generated automatically by **GPT-4o**. We have made efforts to ensure their quality, but we cannot guarantee that every reasoning process is entirely correct.
* For any given problem, the solution provided by GPT-4o is only one of many possible reasoning paths. It should not be considered the only "correct" method.
* We encourage users to treat these reasoning steps as "soft" labels or references for evaluating a model's logical capabilities, rather than as absolute ground truth.

## 📜 License

This dataset is released under the `Apache 2.0` license.

## 📚 Citation

If you use this dataset in your research, please cite our paper:

```bibtex
@misc{xiao2024densinglawllms,
      title={Densing Law of LLMs},
      author={Chaojun Xiao and Jie Cai and Weilin Zhao and Guoyang Zeng and Biyuan Lin and Jie Zhou and Zhi Zheng and Xu Han and Zhiyuan Liu and Maosong Sun},
      year={2024},
      eprint={2412.04315},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2412.04315},
}
```
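The following is a minimal sketch of the conditional loss from the Dataset Description section. It assumes that per-token log-probabilities for the concatenated instruction and answer have already been obtained from the model under evaluation. By the chain rule, $-\log P(\text{answer} \mid \text{instruction})$ is the negated sum of the answer tokens' log-probabilities, so the instruction tokens are masked out. The helper name `conditional_loss`, the boolean `is_answer` mask, and the optional `per_token` averaging are illustrative assumptions. They are not part of the released dataset or the paper's code.

```python
import math
from typing import Sequence


def conditional_loss(
    token_logprobs: Sequence[float],
    is_answer: Sequence[bool],
    per_token: bool = False,
) -> float:
    """Hypothetical helper: -log P(answer | instruction) from per-token log-probs.

    token_logprobs[i] is assumed to be log P(token_i | tokens_<i) over the
    concatenated instruction + answer sequence; is_answer[i] marks answer tokens.
    """
    if len(token_logprobs) != len(is_answer):
        raise ValueError("token_logprobs and is_answer must have the same length")
    # Only answer tokens contribute; instruction tokens serve as conditioning context.
    answer_lps = [lp for lp, ans in zip(token_logprobs, is_answer) if ans]
    if not answer_lps:
        raise ValueError("no answer tokens are marked")
    # Chain rule: log P(answer | instruction) = sum of the answer-token log-probs.
    loss = -sum(answer_lps)
    # Optional length normalization (an assumption, not stated in the README).
    return loss / len(answer_lps) if per_token else loss


# Toy example: 3 instruction tokens, then 2 answer tokens with probabilities 0.5 and 0.25.
lps = [math.log(0.9), math.log(0.8), math.log(0.7), math.log(0.5), math.log(0.25)]
mask = [False, False, False, True, True]
print(conditional_loss(lps, mask))  # equals -log(0.5 * 0.25) = log(8), about 2.079
```

Whether losses are summed or length-normalized changes the scale of the fitted constants. The same convention should therefore be used for every model being compared.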

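Step 1 of the Usage section fits `L = a * N^(-α) + b * D^(-β)`. The sketch below is a dependency-free simplification that assumes every reference model is trained on the same amount of data `D`. Under that assumption, `b * D^(-β)` collapses into a single constant `E`, and the model becomes `L = a * N^(-α) + E`. For each candidate `α` on a grid, the model is linear in `N^(-α)`, so `a` and `E` follow from ordinary least squares. The fit keeps the `α` with the smallest squared error.

The function name, the grid range, and the fixed-`D` assumption are all hypothetical. Fitting the full two-variable law would typically call for a nonlinear optimizer such as SciPy's `curve_fit`.

```python
from typing import Sequence, Tuple


def fit_loss_curve(
    n_params: Sequence[float],
    losses: Sequence[float],
    alpha_grid: Sequence[float] = tuple(i / 100 for i in range(5, 101)),
) -> Tuple[float, float, float]:
    """Hypothetical helper: fit L = a * N**(-alpha) + E by grid search over alpha.

    Assumes a fixed training-data size D, so E absorbs the b * D**(-beta) term.
    Returns (a, alpha, E) with the lowest sum of squared errors.
    """
    if len(n_params) != len(losses) or len(losses) < 3:
        raise ValueError("need at least 3 (N, L) pairs of equal length")
    best = None
    for alpha in alpha_grid:
        # For a fixed alpha the model is linear in x = N**(-alpha): L = a * x + E.
        xs = [n ** (-alpha) for n in n_params]
        mx = sum(xs) / len(xs)
        my = sum(losses) / len(losses)
        sxx = sum((x - mx) ** 2 for x in xs)
        if sxx == 0:
            continue  # all N identical: slope is undetermined
        a = sum((x - mx) * (y - my) for x, y in zip(xs, losses)) / sxx
        intercept = my - a * mx
        sse = sum((a * x + intercept - y) ** 2 for x, y in zip(xs, losses))
        if best is None or sse < best[0]:
            best = (sse, a, alpha, intercept)
    if best is None:
        raise ValueError("degenerate input: all N values are identical")
    _, a, alpha, intercept = best
    return a, alpha, intercept


# Synthetic check with made-up values a=400, alpha=0.3, E=1.8.
sizes = [1e8, 3e8, 1e9, 3e9, 1e10]
obs = [400 * n ** (-0.3) + 1.8 for n in sizes]
print(fit_loss_curve(sizes, obs))  # should recover approximately (400, 0.3, 1.8)
```

The grid step limits the precision of the recovered `α`. A finer grid, or a local refinement around the best grid point, would tighten it.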
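Step 2 of the Usage section maps loss to performance with `S = c / (1 + e^(-γ(L-l))) + d`. To obtain an effective parameter size, and from it a density, both fitted curves can be inverted:

- Solving the sigmoid for the loss gives `L = l - ln(c / (S - d) - 1) / γ`.
- Solving `L = a * N^(-α) + E` for the parameter size gives `N = ((L - E) / a)^(-1/α)`. Here `E` stands in for the `b * D^(-β)` term, under the assumption that the reference models' data size `D` is fixed.

Following the Overview's definition, density is then the effective size divided by the model's actual size. This is a sketch under those assumptions and not necessarily the paper's exact procedure. All helper names and parameter values below are made up for illustration.

```python
import math
from typing import Tuple


def sigmoid_performance(loss: float, c: float, gamma: float, l: float, d: float) -> float:
    """Step 2 as written in the README: S = c / (1 + exp(-gamma * (loss - l))) + d."""
    return c / (1.0 + math.exp(-gamma * (loss - l))) + d


def loss_from_performance(score: float, c: float, gamma: float, l: float, d: float) -> float:
    """Inverse of step 2: L = l - ln(c / (S - d) - 1) / gamma."""
    if score == d:
        raise ValueError("score equals the sigmoid's asymptote d")
    ratio = c / (score - d) - 1.0
    # ratio <= 0 means the score lies outside the sigmoid's open output range.
    if ratio <= 0:
        raise ValueError("score outside the fitted sigmoid's range")
    return l - math.log(ratio) / gamma


def effective_params(loss: float, a: float, alpha: float, E: float) -> float:
    """Inverse of L = a * N**(-alpha) + E (E assumed to absorb b * D**(-beta) at fixed D)."""
    if loss <= E:
        raise ValueError("loss must exceed E for a finite parameter size")
    return ((loss - E) / a) ** (-1.0 / alpha)


def density(
    score: float,
    actual_params: float,
    loss_fit: Tuple[float, float, float],
    perf_fit: Tuple[float, float, float, float],
) -> float:
    """Hypothetical helper: effective parameter size divided by actual parameter size."""
    loss = loss_from_performance(score, *perf_fit)
    return effective_params(loss, *loss_fit) / actual_params


# Made-up fitted parameters, for illustration only.
loss_fit = (400.0, 0.3, 1.8)      # (a, alpha, E)
perf_fit = (0.8, -4.0, 2.5, 0.2)  # (c, gamma, l, d); gamma < 0, so the score falls as loss rises
# Score that a 2e9-parameter reference model would reach under these fits...
target = sigmoid_performance(400.0 * 2e9 ** -0.3 + 1.8, *perf_fit)
# ...so a 1e9-parameter model matching it should have a density of about 2.
print(density(target, 1e9, loss_fit, perf_fit))
```

Because both steps are monotonic over the fitted range, each score maps back to a single loss and then to a single effective parameter size.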
Provider:
maas
Created:
2025-08-04