LeetCode-O
ModelScope Community · Updated 2025-12-05 · Added 2025-06-14
Download link:
https://modelscope.cn/datasets/hkust-nlp/LeetCode-O
Description:
This is the LeetCode-O benchmark proposed in the CodeI/O paper (arXiv 2502.07316).
The data are stored in `leetcode.jsonl`. We also provide an example prediction file, `prediction.jsonl`, generated with gpt-4.1-nano.
To evaluate your model on this benchmark, prepare your outputs in the same format as `prediction.jsonl`: for each line of `leetcode.jsonl`, add an `output` field containing the LLM's response to that line's `messages`.
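As a rough illustration of that format, the Python sketch below copies every record of `leetcode.jsonl` and attaches an `output` field. It assumes each line is a JSON object whose `messages` field can be passed directly to your model; `write_predictions` and the `generate` callback are hypothetical names, with `generate` standing in for whatever model call you use. Compare the result with the provided `prediction.jsonl` before scoring.

```python
import json
from typing import Callable


def write_predictions(data_path: str, pred_path: str,
                      generate: Callable[[list], str]) -> int:
    """Copy each record from data_path to pred_path, adding an `output` field.

    `generate` is a hypothetical stand-in for your model call: it receives
    the record's `messages` and returns the model's text response.
    Returns the number of records written.
    """
    count = 0
    with open(data_path, encoding="utf-8") as src, \
         open(pred_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines in the input file
            record = json.loads(line)
            # Keep every original field and add the model's response.
            record["output"] = generate(record["messages"])
            # ensure_ascii=False keeps non-ASCII (e.g. Chinese) text readable.
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

A call such as `write_predictions("leetcode.jsonl", "prediction.jsonl", my_model)`, where `my_model` is your own function, produces a file in the expected shape.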
To compute the scores, run `evaluate.py`. On the example predictions, it reports the following:
```
{'Difficulty_Easy_Example_Acc': 0.8931972789115646,
'Difficulty_Easy_Question_Acc': 0.7046979865771812,
'Difficulty_Hard_Example_Acc': 0.6502695417789758,
'Difficulty_Hard_Question_Acc': 0.2857142857142857,
'Difficulty_Medium_Example_Acc': 0.7582191780821917,
'Difficulty_Medium_Question_Acc': 0.46179401993355484,
'Lang_EN_Example_Acc': 0.7933846850928863,
'Lang_EN_Question_Acc': 0.6311111111111111,
'Lang_ZH_Example_Acc': 0.7403715450838242,
'Lang_ZH_Question_Acc': 0.56,
'No_Answer': 0.001359311282283643,
'Overall_Example_Acc': 0.7668781150883552,
'Overall_Question_Acc': 0.48333333333333334}
```
The main metric of this benchmark is `Overall_Question_Acc`.
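The metric names suggest two granularities: `*_Example_Acc` scored per test example and `*_Question_Acc` scored per problem. The exact rules are defined in `evaluate.py`; the sketch below only illustrates one plausible aggregation, assuming that a question counts as correct only when every one of its examples is answered correctly. The function name `aggregate` and the `(question_id, is_correct)` input format are hypothetical, and `No_Answer` is not modeled.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute example-level and question-level accuracy.

    results: one (question_id, is_correct) pair per test example.
    ASSUMPTION: a question is solved only if all of its examples are correct.
    """
    per_question = defaultdict(list)
    for qid, ok in results:
        per_question[qid].append(ok)

    n_examples = sum(len(v) for v in per_question.values())
    if n_examples == 0:
        return {"Example_Acc": 0.0, "Question_Acc": 0.0}

    # Fraction of individual examples answered correctly.
    example_acc = sum(sum(v) for v in per_question.values()) / n_examples
    # Fraction of questions whose examples are all correct.
    question_acc = sum(all(v) for v in per_question.values()) / len(per_question)
    return {"Example_Acc": example_acc, "Question_Acc": question_acc}
```

For instance, `aggregate([("q1", True), ("q1", True), ("q2", True), ("q2", False)])` returns an example accuracy of 0.75 and a question accuracy of 0.5, since `q2` has one wrong example.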
Provided by:
maas
Created:
2025-05-06



