
LiveCodeBench-CPP

Source: ModelScope community, updated 2026-05-16, indexed 2025-05-17
Download link:
https://modelscope.cn/datasets/nv-community/LiveCodeBench-CPP
# LiveCodeBench-CPP: An Extension of LiveCodeBench for Contamination-Free Evaluation in C++

## Overview

**LiveCodeBench-CPP** includes 454 problems from `release_v6` of LiveCodeBench, covering the period from October 2024 to May 2025. The problems are sourced from AtCoder (287 problems) and LeetCode (167 problems).

- **AtCoder Problems:** These require generated solutions to read inputs from standard input (`stdin`) and write outputs to standard output (`stdout`). For unit testing, the generated C++ solutions are compiled and executed with test cases provided via `stdin`.
- **LeetCode Problems:** These require solutions to be written against a predefined function signature provided in the starter code. For unit testing, the generated C++ solutions are compiled, the target function is invoked directly with the input parameters, and the returned values are compared against the expected outputs.

To learn more about LiveCodeBench, see https://huggingface.co/datasets/livecodebench/code_generation_lite.

This dataset is ready for commercial/non-commercial use.

## Data Distribution

The problems are sourced from https://huggingface.co/datasets/livecodebench/code_generation_lite.

| Source | # Samples |
|:-------------|:---------|
| AtCoder | 287 |
| LeetCode | 167 |
| Total | 454 |

## Data Fields

|Field|Type|Description|
|:---|:---|:---|
|question_title|string|A title for the programming question|
|question_content|string|The content of the programming question|
|platform|string|Indicates either AtCoder or LeetCode|
|question_id|string|A unique id for each question|
|contest_id|string|A unique id for the contest|
|contest_date|string|Contest date|
|starter_code|string|Starter code for the given question|
|difficulty|string|A label indicating difficulty (easy, medium, or hard)|
|public_test_cases|string|A JSON string for the public test cases|
|private_test_cases|string|A JSON string for the private test cases|
|meta_data|string|A dict object (either empty or including `func_name`)|

## How to Use It

```python
from datasets import load_dataset

dataset = load_dataset("nvidia/LiveCodeBench-CPP", split="v6_2408_2505")
```

## Dataset Quantification

- Record Count
  - 454 samples
- Download Size
  - 1.9GB

## Dataset Owner(s)

NVIDIA Corporation

## Dataset Creation Date

April 2025 - May 2025

## Dataset Characterization

**Data Collection Method**<br>
* [Hybrid: Automated, Synthetic] <br>

**Labeling Method**<br>
* [Hybrid: Automated, Synthetic] <br>

## License/Terms of Use

GOVERNING TERMS: This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0) available at https://creativecommons.org/licenses/by/4.0/legalcode.

**Data Developer:** NVIDIA

### Use Case: <br>
Developers training Large Language Models (LLMs) to specialize in code generation and code critique. <br>

### Release Date: <br>
05/11/2025 <br>

## Data Version

1.0 (05/11/2025)

## Intended Usage

The LiveCodeBench-CPP Dataset is intended to be used by the community to continue to improve open models. The data may be freely used to evaluate models. **However, the user is responsible for checking if the dataset license is fit for the intended purpose**.

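The stdin/stdout testing flow described in the Overview and the JSON-encoded fields listed under Data Fields can be sketched roughly as follows. This is a minimal illustration, not the official evaluation harness. It assumes each public test case is a JSON object with `input` and `output` keys, that `meta_data` can be decoded as JSON, and that outputs are compared token by token after splitting on whitespace. The helper names `parse_public_tests`, `get_func_name`, and `run_stdin_test` are hypothetical.

```python
import json
import subprocess


def parse_public_tests(raw: str) -> list:
    """Decode the JSON string stored in the `public_test_cases` field."""
    tests = json.loads(raw)
    if not isinstance(tests, list):
        raise ValueError("expected a JSON list of test cases")
    return tests


def get_func_name(meta_data: str):
    """Return `func_name` from `meta_data`, or None when it is absent.

    Assumption: `meta_data` is a JSON-encoded dict; an empty string is
    treated as an empty dict.
    """
    meta = json.loads(meta_data) if meta_data.strip() else {}
    return meta.get("func_name")


def run_stdin_test(cmd: list, test: dict, timeout: float = 5.0) -> bool:
    """Run `cmd`, feed test["input"] on stdin, and compare stdout.

    Assumption: the test case has "input" and "output" keys, and a
    whitespace-insensitive token comparison is an acceptable check.
    """
    try:
        result = subprocess.run(
            cmd,
            input=test["input"],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # treat a timeout as a failed test
    if result.returncode != 0:
        return False  # a crash or non-zero exit also counts as a failure
    return result.stdout.split() == test["output"].split()
```

For an AtCoder-style problem, one would first compile the generated solution, for example with `g++ -O2 -std=c++17 solution.cpp -o solution` (the file names and flags here are assumptions, not taken from the dataset card), and then call `run_stdin_test(["./solution"], test)` for each decoded test case. LeetCode-style problems additionally need a C++ driver that calls the function named by `func_name`, which this sketch does not cover.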
## Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).

Provider: maas
Created: 2025-05-15