CL-bench

Name: CL-bench
Creator: maas
Published: 2026-05-15 16:56:10
License: No description available

ModelScope community, updated 2026-05-15, indexed 2026-02-07

Download link:

https://modelscope.cn/datasets/tencent-community/CL-bench

Resource overview:

# CL-bench: A Benchmark for Context Learning

## Dataset Description

**CL-bench** is a benchmark for evaluating language models' context learning abilities. Resolving tasks in CL-bench requires models to learn from the provided context, ranging from new domain-specific knowledge, rule systems, and complex procedures to laws derived from empirical data, rather than only relying on pre-trained knowledge.

### Dataset Statistics

- **Total Samples**: 1,899 tasks
- **Format**: JSONL (one JSON object per line)
- **Context Categories**: 4 main categories with 18 sub-categories
- **Average Rubrics**: 63.2 per context
- **Average Tasks**: 3.8 per context

### Leaderboard

Visit [www.clbench.com](https://www.clbench.com) for the full leaderboard and latest results!

## Dataset Structure

### Data Fields

Each sample in the dataset contains the following fields:

| Field | Type | Description |
|-------|------|-------------|
| `messages` | list | Multi-turn conversation in OpenAI chat format |
| `rubrics` | list | List of evaluation criteria (strings) |
| `metadata` | dict | Contains `task_id`, `context_id`, `context_category`, `sub_category` |

#### `messages` Field

The `messages` field follows the standard OpenAI chat format:

```json
[
  {"role": "system", "content": "system prompt"},
  {"role": "user", "content": "context and task"}
]
```

#### `rubrics` Field

A list of strings, each describing a specific evaluation rubric.

#### `metadata` Field

```json
{
  "task_id": "unique identifier for task",
  "context_id": "unique identifier for context",
  "context_category": "Rule System Application",
  "sub_category": "Game Mechanics"
}
```

- **task_id**: Unique identifier for the task
- **context_id**: Unique identifier for the context
- **context_category**: One of the 4 main categories
- **sub_category**: Fine-grained classification (18 sub-categories total)

## Usage

Please see our **GitHub repository**: [github.com/Tencent-Hunyuan/CL-bench](https://github.com/Tencent-Hunyuan/CL-bench)

## License

CL-Bench is released under a **custom evaluation-only license**.

Permission is hereby granted, free of charge, to any person obtaining a copy of this dataset and associated documentation files (the "Dataset"), to use, copy, modify, merge, publish, and distribute the Dataset **solely for the purposes of evaluation, testing, and benchmarking of models**.

The Dataset (or any portion thereof) **must not** be used for training, fine-tuning, calibrating, distilling, adapting, or any form of parameter updating.

Please refer to the LICENSE file for the full license text.

## Citation

If you find our work useful, please cite it as follows:

```bibtex
@misc{dou2026clbenchbenchmarkcontextlearning,
  title={CL-bench: A Benchmark for Context Learning},
  author={Shihan Dou and Ming Zhang and Zhangyue Yin and Chenhao Huang and Yujiong Shen and Junzhe Wang and Jiayi Chen and Yuchen Ni and Junjie Ye and Cheng Zhang and Huaibing Xie and Jianglu Hu and Shaolei Wang and Weichao Wang and Yanling Xiao and Yiting Liu and Zenan Xu and Zhen Guo and Pluto Zhou and Tao Gui and Zuxuan Wu and Xipeng Qiu and Qi Zhang and Xuanjing Huang and Yu-Gang Jiang and Di Wang and Shunyu Yao},
  year={2026},
  eprint={2602.03587},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2602.03587},
}
```
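As an illustration of the record layout described under Data Fields, the following is a minimal sketch of parsing JSONL lines and checking them against the three documented fields (`messages`, `rubrics`, `metadata`). It is not the official loader; see the GitHub repository for that. The helper names `parse_record`, `load_records`, and `tasks_per_category`, and every value in the sample record, are hypothetical placeholders introduced only for this example.

```python
import json
from collections import Counter

# Hypothetical sample record that mirrors the documented schema.
# All values are placeholders, not real CL-bench data.
SAMPLE_LINE = json.dumps({
    "messages": [
        {"role": "system", "content": "system prompt"},
        {"role": "user", "content": "context and task"},
    ],
    "rubrics": ["first rubric", "second rubric"],
    "metadata": {
        "task_id": "task-0",
        "context_id": "context-0",
        "context_category": "Rule System Application",
        "sub_category": "Game Mechanics",
    },
})

# Keys the dataset card lists for the `metadata` dict.
METADATA_KEYS = {"task_id", "context_id", "context_category", "sub_category"}


def parse_record(line):
    """Parse one JSONL line and check it against the documented fields."""
    record = json.loads(line)
    if not isinstance(record["messages"], list):
        raise ValueError("messages must be a list")
    for message in record["messages"]:
        if not {"role", "content"} <= message.keys():
            raise ValueError("each message needs role and content")
    if not all(isinstance(rubric, str) for rubric in record["rubrics"]):
        raise ValueError("rubrics must be a list of strings")
    missing = METADATA_KEYS - record["metadata"].keys()
    if missing:
        raise ValueError(f"metadata is missing keys: {sorted(missing)}")
    return record


def load_records(lines):
    """Parse an iterable of JSONL lines, skipping blank ones."""
    return [parse_record(line) for line in lines if line.strip()]


def tasks_per_category(records):
    """Count tasks for each value of metadata.context_category."""
    return Counter(r["metadata"]["context_category"] for r in records)


# Two copies of the sample line plus a blank line stand in for a file.
records = load_records([SAMPLE_LINE, "", SAMPLE_LINE])
counts = tasks_per_category(records)
print(counts)
```

Because `load_records` accepts any iterable of lines, an open file handle for the downloaded JSONL file can be passed in directly. Grouping on `context_id` instead of `context_category` would give the number of tasks per context.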

Provider:

maas

Created:

2026-02-04
