
code_generation_lite

ModelScope community, updated 2026-05-16, indexed 2025-03-15
Download link:
https://modelscope.cn/datasets/livecodebench/code_generation_lite
Resource description:
## LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

<p align="center">
<a href="https://livecodebench.github.io/">🏠 Home Page</a> •
<a href="https://github.com/LiveCodeBench/LiveCodeBench">💻 GitHub Repository </a> •
<a href="https://livecodebench.github.io/leaderboard.html">🏆 Leaderboard</a> •
<a href="https://arxiv.org/abs/2403.07974">📄 Paper </a>
</p>

![LiveCodeBench](images/lcb.png)

## Change Log

Since LiveCodeBench is a continuously updated benchmark, the dataset is provided in several versions:

- `release_v1`: The initial release, with 400 problems released between May 2023 and Mar 2024.
- `release_v2`: An updated release, with 511 problems released between May 2023 and May 2024.
- `release_v3`: An updated release, with 612 problems released between May 2023 and Jul 2024.
- `release_v4`: An updated release, with 713 problems released between May 2023 and Sep 2024.
- `release_v5`: An updated release, with 880 problems released between May 2023 and Jan 2025.

You can use the `version_tag` argument to load the desired version of the dataset. Additionally, you can use version tags like `v1`, `v2`, `v1_v3`, `v4_v5` to get the problems released in a specific version.

## Dataset Description

LiveCodeBench is a "live" updating benchmark for holistically evaluating the code-related capabilities of LLMs. In particular, it evaluates LLMs across a range of capabilities including code generation, self-repair, test output prediction, and code execution. This is the code generation scenario of LiveCodeBench. It is also used for evaluating self-repair using test case feedback.
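As a small worked example of the change log above: every release covers a date range starting in May 2023, which suggests the listed counts are cumulative. Under that assumption, the number of problems each release adds can be derived by subtraction. The dictionary and function names below are illustrative and not part of LiveCodeBench.

```python
# Cumulative problem counts per release, copied from the change log.
RELEASE_SIZES = {
    "release_v1": 400,
    "release_v2": 511,
    "release_v3": 612,
    "release_v4": 713,
    "release_v5": 880,
}


def problems_added_per_release(sizes):
    """Return how many problems each release adds over the previous one.

    Assumes `sizes` is ordered from oldest to newest release and that
    the counts are cumulative.
    """
    added = {}
    previous_total = 0
    for tag, total in sizes.items():
        added[tag] = total - previous_total
        previous_total = total
    return added


added = problems_added_per_release(RELEASE_SIZES)
print(added)
# {'release_v1': 400, 'release_v2': 111, 'release_v3': 101, 'release_v4': 101, 'release_v5': 167}
```

If this reading is right, a tag such as `v2` would be expected to select on the order of 111 problems, though the exact tag semantics should be checked against the loader itself.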
LiveCodeBench problems are collected from competitive programming websites, with a particular focus on maintaining problem quality, test case quality, and problem difficulty diversity. This scenario currently hosts over 500 problems from LeetCode, AtCoder, and Codeforces. Each problem instance consists of a problem description, input/output examples, and hidden test cases. Additionally, every problem is tagged with its difficulty level and release date, which allows measuring model performance across different time windows. The goal is to generate a correct and efficient solution for each problem instance.

The initial code_generation dataset included a larger number of test cases, which led to a substantially larger dataset size. This (lite) version has pruned and sampled tests while trying to ensure performance similar to that measured on the original dataset. Going forward, LiveCodeBench will be using this lite version for code generation evaluations.

## Usage

You can use the dataset by loading it with the Hugging Face datasets library. The `version_tag` argument specifies the (temporal) version of the dataset: `release_v1` corresponds to the initial release and `release_v2` to the second version.

```python
from datasets import load_dataset

# Load the second release of the code generation (lite) scenario.
lcb_codegen = load_dataset("livecodebench/code_generation_lite", version_tag="release_v2")
```
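Because every problem carries a release date, performance can be measured on a chosen time window, for instance only on problems released after a model's training cutoff. The following is a minimal sketch of that filtering step on in-memory records. The field names (`title`, `difficulty`, `release_date`), the sample records, and the cutoff date are hypothetical and are not the dataset's confirmed schema.

```python
from datetime import date

# Hypothetical records: field names and values are illustrative assumptions,
# not the dataset's confirmed schema.
problems = [
    {"title": "problem-a", "difficulty": "easy", "release_date": date(2023, 6, 10)},
    {"title": "problem-b", "difficulty": "hard", "release_date": date(2024, 2, 3)},
    {"title": "problem-c", "difficulty": "medium", "release_date": date(2024, 8, 21)},
]


def released_in_window(records, start, end):
    """Keep records whose release date falls within [start, end], inclusive."""
    return [r for r in records if start <= r["release_date"] <= end]


# Assumed training cutoff, chosen only for illustration.
cutoff = date(2024, 1, 1)
recent = released_in_window(problems, cutoff, date.max)
print([r["title"] for r in recent])
# ['problem-b', 'problem-c']
```

With the real dataset, the same idea would apply to whatever date field the loaded records expose, after parsing it into a comparable date type.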

Provider:
maas
Created:
2025-05-26