
LeetCode-O

ModelScope Community | Updated 2025-12-05 | Added 2025-06-14
Download link:
https://modelscope.cn/datasets/hkust-nlp/LeetCode-O
Description:
This is the LeetCode-O benchmark proposed in the CodeI/O paper (arXiv 2502.07316). The data file is `leetcode.jsonl`, and we provide an example prediction file (from gpt-4.1-nano) in `prediction.jsonl`.

To evaluate your model on this benchmark, prepare your outputs in the same format as `prediction.jsonl`: add an `output` field to each line of `leetcode.jsonl` containing the LLM's response to that line's `messages`.

To calculate the scores, use `evaluate.py`. Evaluating the example predictions should produce the following:

```
{'Difficulty_Easy_Example_Acc': 0.8931972789115646,
 'Difficulty_Easy_Question_Acc': 0.7046979865771812,
 'Difficulty_Hard_Example_Acc': 0.6502695417789758,
 'Difficulty_Hard_Question_Acc': 0.2857142857142857,
 'Difficulty_Medium_Example_Acc': 0.7582191780821917,
 'Difficulty_Medium_Question_Acc': 0.46179401993355484,
 'Lang_EN_Example_Acc': 0.7933846850928863,
 'Lang_EN_Question_Acc': 0.6311111111111111,
 'Lang_ZH_Example_Acc': 0.7403715450838242,
 'Lang_ZH_Question_Acc': 0.56,
 'No_Answer': 0.001359311282283643,
 'Overall_Example_Acc': 0.7668781150883552,
 'Overall_Question_Acc': 0.48333333333333334}
```

The main metric of this benchmark is `Overall_Question_Acc`.
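The preparation step can be sketched in Python as shown below. The function copies each record of `leetcode.jsonl` and adds an `output` field. It assumes each line is a JSON object with a `messages` key. The function name `add_outputs` and the `generate` callable (your own model call, which takes a record's `messages` and returns the response text) are hypothetical and are not part of the benchmark's code.

```python
import json
from typing import Any, Callable

# Hypothetical signature for your model's inference function: it receives the
# `messages` value of one record and returns the model's text response.
GenerateFn = Callable[[Any], str]


def add_outputs(in_path: str, out_path: str, generate: GenerateFn) -> int:
    """Copy every record of `in_path` to `out_path`, adding an `output` field.

    Assumes one JSON object per line, each with a `messages` key.
    Records keep their original order. Returns the number of records written.
    """
    count = 0
    with open(in_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines, e.g. a trailing newline
            record = json.loads(line)
            # Keep all original fields and only add the model's response.
            record["output"] = generate(record["messages"])
            # ensure_ascii=False keeps non-ASCII (e.g. Chinese) text readable.
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count
```

For example, `add_outputs("leetcode.jsonl", "my_prediction.jsonl", my_generate)` would write a prediction file to pass to `evaluate.py`. Here `my_prediction.jsonl` and `my_generate` are placeholder names. Passing the model call in as a function keeps the file handling separate from whichever inference API you use.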

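The page does not define how the `*_Example_Acc` and `*_Question_Acc` metrics differ. This sketch assumes one plausible reading, which is consistent with question accuracy being lower than example accuracy in every category of the results above. Under that reading, each question has several test examples. Example accuracy is the fraction of examples answered correctly, and a question counts as correct only if all of its examples are. The hypothetical `aggregate` function below illustrates that reading. It is not taken from `evaluate.py`.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aggregate(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute example- and question-level accuracy.

    `results` holds one (question_id, example_correct) pair per test example.
    ASSUMPTION: a question is correct only if every one of its examples is.
    """
    per_question: Dict[str, List[bool]] = defaultdict(list)
    for qid, ok in results:
        per_question[qid].append(ok)
    if not per_question:
        raise ValueError("no results to aggregate")

    n_examples = sum(len(oks) for oks in per_question.values())
    n_correct_examples = sum(sum(oks) for oks in per_question.values())
    n_correct_questions = sum(all(oks) for oks in per_question.values())
    return {
        "Example_Acc": n_correct_examples / n_examples,
        "Question_Acc": n_correct_questions / len(per_question),
    }
```

With two questions of two examples each, where only one example of the second question fails, this gives an example accuracy of 3/4 but a question accuracy of 1/2. Under this reading, that gap explains why the question-level numbers are the stricter measure.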
Provided by:
maas
Created:
2025-05-06