t2ance/coderm-ef-trajectories-o4-mini-qwen3-30b

Source: Hugging Face. Updated 2025-12-17; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/t2ance/coderm-ef-trajectories-o4-mini-qwen3-30b
Description:
This dataset contains trajectories of an LLM judge verifying code solutions to programming problems. Each trajectory captures the complete evaluation process: the problem, the candidate solution, the judge's reasoning, the predicted correctness score, and the ground-truth execution results. The dataset is suited to training outcome reward models, best-of-N selection, calibration analysis, error analysis, and verifier ensembling. It contains 1292 trajectories in total. The judge model is Qwen/Qwen3-Coder-30B-A3B-Instruct, the problems come from atcoder and leetcode, and the difficulty distribution is 316 easy, 408 medium, and 568 hard. The dataset structure section of the card describes the meaning and usage of each field.
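Two of the use cases named above, best-of-N selection and calibration analysis, can be sketched in a few lines. This is only an illustration under stated assumptions: the field names `problem_id`, `solution_id`, `judge_score`, and `passed` are hypothetical stand-ins (the actual schema is not reproduced on this page), the records are made-up toy values, and the predicted score is assumed to lie in [0, 1].

```python
# Toy records with HYPOTHETICAL field names; these are not the dataset's real schema.
# judge_score: the judge's predicted correctness (assumed to be in [0, 1]).
# passed: ground-truth execution result.
trajectories = [
    {"problem_id": "p1", "solution_id": "a", "judge_score": 0.9, "passed": True},
    {"problem_id": "p1", "solution_id": "b", "judge_score": 0.4, "passed": False},
    {"problem_id": "p2", "solution_id": "c", "judge_score": 0.7, "passed": False},
    {"problem_id": "p2", "solution_id": "d", "judge_score": 0.2, "passed": True},
]


def best_of_n(records):
    """For each problem, keep the candidate with the highest judge score."""
    best = {}
    for r in records:
        current = best.get(r["problem_id"])
        if current is None or r["judge_score"] > current["judge_score"]:
            best[r["problem_id"]] = r
    return best


def brier_score(records):
    """Mean squared gap between the predicted score and the 0/1 outcome."""
    total = sum((r["judge_score"] - float(r["passed"])) ** 2 for r in records)
    return total / len(records)


selected = best_of_n(trajectories)
# Fraction of problems where the top-scored candidate actually passed.
selection_accuracy = sum(r["passed"] for r in selected.values()) / len(selected)
calibration_error = brier_score(trajectories)
```

A lower Brier score means the judge's scores track execution results more closely, while selection accuracy measures how often picking the top-scored candidate yields a passing solution.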
Provider:
t2ance