test_generation

Name: test_generation
Creator: maas
Published: 2026-01-02 16:35:09
License: No description available

ModelScope community, updated 2026-01-02, indexed 2025-05-31

Download link:

https://modelscope.cn/datasets/livecodebench/test_generation

Resource description:

## LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

<p align="center"> <a href="https://livecodebench.github.io/">🏠 Home Page</a> • <a href="https://github.com/LiveCodeBench/LiveCodeBench">💻 GitHub Repository</a> • <a href="https://livecodebench.github.io/leaderboard.html">🏆 Leaderboard</a> </p>

![LiveCodeBench](images/lcb.png)

LiveCodeBench is a continuously updated ("live") benchmark for holistically evaluating the code-related capabilities of LLMs. In particular, it evaluates LLMs across a range of capabilities, including code generation, self-repair, test output prediction, and code execution. This is the code generation scenario of LiveCodeBench. It is also used for evaluating self-repair using test case feedback.

LiveCodeBench problems are collected from competitive programming websites, with a particular focus on maintaining problem quality, test case quality, and problem difficulty diversity. This scenario currently hosts 442 instances sampled from 185 LeetCode problems, each comprising a natural language problem description; the goal is to predict the output for a given input.
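The "predict the output for a given input" task can be scored by comparing each predicted output with the expected one. The following is a minimal sketch of such a check, not the official LiveCodeBench evaluator. The field names `question`, `test_input`, and `expected_output`, the toy instances, and the `toy_predict` stand-in for a model call are all hypothetical and do not come from the dataset card.

```python
import ast
from typing import Callable, Iterable


def outputs_match(predicted: str, expected: str) -> bool:
    """Compare two outputs as Python literals when both parse, else as stripped strings."""
    try:
        return ast.literal_eval(predicted.strip()) == ast.literal_eval(expected.strip())
    except (ValueError, SyntaxError, TypeError):
        # At least one side is not a plain Python literal; fall back to text comparison.
        return predicted.strip() == expected.strip()


def accuracy(instances: Iterable[dict], predict: Callable[[dict], str]) -> float:
    """Fraction of instances whose predicted output matches the expected output."""
    instances = list(instances)
    if not instances:
        return 0.0
    hits = sum(outputs_match(predict(inst), inst["expected_output"]) for inst in instances)
    return hits / len(instances)


# Hypothetical toy instances; the field names are assumptions for illustration only.
toy_instances = [
    {"question": "Return the sum of the list.", "test_input": "[1, 2, 3]", "expected_output": "6"},
    {"question": "Return the list reversed.", "test_input": "[1, 2, 3]", "expected_output": "[3, 2, 1]"},
]


def toy_predict(instance: dict) -> str:
    """Stand-in for a model call: always answers with the sum of the input list."""
    values = ast.literal_eval(instance["test_input"])
    return str(sum(values))


score = accuracy(toy_instances, toy_predict)
print(f"accuracy = {score:.2f}")
```

Parsing both sides as literals means that `[0,1]` and `[0, 1]` count as the same answer, which a plain string comparison would reject.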

Provider: maas

Created: 2025-05-26