LeetCode-O

Name: LeetCode-O
Creator: maas
Published: 2025-12-05 16:33:25
License: No description available

ModelScope community. Updated 2025-12-05; indexed 2025-06-14.

Download link:

https://modelscope.cn/datasets/hkust-nlp/LeetCode-O


Resource description:

This is the LeetCode-O benchmark proposed in the CodeI/O paper (arXiv 2502.07316). The data file is `leetcode.jsonl`, and we provide an example prediction file (gpt-4.1-nano) in `prediction.jsonl`.

To evaluate your model on this benchmark, prepare your outputs in the format of `prediction.jsonl`: add an `output` field to each line of `leetcode.jsonl`, containing the LLM's output for that line's `messages`.

To calculate the scores, follow `evaluate.py`. Evaluating the example predictions gives the following:

```
{'Difficulty_Easy_Example_Acc': 0.8931972789115646,
 'Difficulty_Easy_Question_Acc': 0.7046979865771812,
 'Difficulty_Hard_Example_Acc': 0.6502695417789758,
 'Difficulty_Hard_Question_Acc': 0.2857142857142857,
 'Difficulty_Medium_Example_Acc': 0.7582191780821917,
 'Difficulty_Medium_Question_Acc': 0.46179401993355484,
 'Lang_EN_Example_Acc': 0.7933846850928863,
 'Lang_EN_Question_Acc': 0.6311111111111111,
 'Lang_ZH_Example_Acc': 0.7403715450838242,
 'Lang_ZH_Question_Acc': 0.56,
 'No_Answer': 0.001359311282283643,
 'Overall_Example_Acc': 0.7668781150883552,
 'Overall_Question_Acc': 0.48333333333333334}
```

The main metric of this benchmark is `Overall_Question_Acc`.
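The prediction-file format described above can be sketched in Python roughly as follows. This is a minimal illustration, not the benchmark's own tooling: `add_outputs` and `placeholder_model` are hypothetical names, and the code assumes each line of `leetcode.jsonl` is a JSON object whose `messages` value can be passed directly to your model. Replace `placeholder_model` with a real call to the model you want to evaluate.

```python
import json
from typing import Callable


def add_outputs(src_path: str, dst_path: str, generate: Callable[[list], str]) -> int:
    """Copy every record from src_path to dst_path, adding an `output` field.

    `generate` is a hypothetical callable that takes the record's `messages`
    and returns the model's reply as a string.
    Returns the number of records written.
    """
    count = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            # Assumption: `messages` holds the prompt for the model.
            record["output"] = generate(record["messages"])
            # ensure_ascii=False keeps Chinese problem text readable.
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


def placeholder_model(messages: list) -> str:
    """Stand-in for a real LLM call; always returns the same string."""
    return "placeholder answer"
```

Calling `add_outputs("leetcode.jsonl", "prediction.jsonl", placeholder_model)` would then write a file in the expected shape, which can be scored by following `evaluate.py`.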


Provider:

maas

Created:

2025-05-06
