
code_generation

ModelScope community · updated 2026-01-02 · indexed 2025-05-31
Download link:
https://modelscope.cn/datasets/livecodebench/code_generation
Description:
## LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

<p align="center">
<a href="https://livecodebench.github.io/">🏠 Home Page</a> • <a href="https://github.com/LiveCodeBench/LiveCodeBench">💻 GitHub Repository</a> • <a href="https://livecodebench.github.io/leaderboard.html">🏆 Leaderboard</a>
</p>

![LiveCodeBench](images/lcb.png)

LiveCodeBench is a "live", continuously updated benchmark for holistically evaluating the code-related capabilities of LLMs. In particular, it evaluates LLMs across a range of capabilities, including code generation, self-repair, test output prediction, and code execution. This is the code generation scenario of LiveCodeBench. It is also used for evaluating self-repair using test case feedback.

LiveCodeBench problems are collected from competitive programming websites, with a particular focus on maintaining problem quality, test case quality, and diversity of problem difficulty. This scenario currently hosts 400 problems from LeetCode, AtCoder, and Codeforces. Each problem instance consists of a problem description, input/output examples, and hidden test cases (more than 59 on average). Additionally, every problem is tagged with its difficulty level and release date, which allows model performance to be measured across different time windows. The goal is to generate a correct and efficient solution for each problem instance.
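Because every problem carries a difficulty tag and a release date, results can be sliced by time window, for example to look only at problems released after a model's training cutoff. The following is a minimal sketch of that idea, not the benchmark's own evaluation code. The `Result` record, its field names, the difficulty labels, and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    """One graded problem. All field names here are illustrative assumptions."""
    problem_id: str
    difficulty: str       # hypothetical label, e.g. "easy" or "hard"
    release_date: date    # date the problem was published
    passed: bool          # True if the generated solution passed every hidden test


def accuracy_by_difficulty(results, start, end):
    """Return the fraction of solved problems per difficulty level,
    counting only problems released in the inclusive window [start, end]."""
    solved = defaultdict(int)
    total = defaultdict(int)
    for r in results:
        if start <= r.release_date <= end:
            total[r.difficulty] += 1
            solved[r.difficulty] += int(r.passed)
    return {level: solved[level] / total[level] for level in total}


# Made-up sample results, used only to exercise the function.
results = [
    Result("p1", "easy", date(2023, 6, 1), True),
    Result("p2", "easy", date(2023, 9, 15), False),
    Result("p3", "hard", date(2023, 10, 2), True),
    Result("p4", "hard", date(2024, 1, 20), False),
]

window = accuracy_by_difficulty(results, date(2023, 9, 1), date(2024, 3, 1))
print(window)
```

Comparing the same model on a window before and after its cutoff date is one way to spot possible contamination: a sharp drop on newer problems suggests the older ones may have been seen during training.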

Provider: maas
Created: 2025-05-26