Arioron/vex-python-DPO-Beta-1

Name: Arioron/vex-python-DPO-Beta-1
Creator: Arioron
Published: 2025-11-13 16:26:26
License: not specified

Source: Hugging Face (updated 2025-11-13, indexed 2025-11-15)

Download link:

https://hf-mirror.com/datasets/Arioron/vex-python-DPO-Beta-1

Description:

This dataset is designed for Direct Preference Optimization (DPO) fine-tuning, specifically targeting Python code generation and reasoning tasks. Each sample includes three fields: prompt (instruction or question related to Python code), chosen (preferred high-quality solution), and rejected (lower-quality or less optimal solution). It is suitable for fine-tuning large language models for Python programming tasks, conducting preference-based reinforcement learning experiments, and training models to understand and evaluate code quality.
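The three-field record layout described above can be sketched as a small validation helper. This is a minimal illustration, assuming each record is a plain dictionary of strings; the function name `validate_pair` and the example record are hypothetical and are not taken from the dataset itself.

```python
from typing import Dict, List

# Field names as given in the dataset description.
REQUIRED_FIELDS = ("prompt", "chosen", "rejected")


def validate_pair(sample: Dict[str, str]) -> List[str]:
    """Return a list of problems found in one preference record (empty if OK)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = sample.get(field)
        # Each field must be present and contain non-whitespace text.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")
    # A preference pair carries no signal if both answers are the same.
    if not problems and sample["chosen"] == sample["rejected"]:
        problems.append("chosen and rejected are identical")
    return problems


# Hypothetical record, invented for illustration only.
example = {
    "prompt": "Write a function that returns the sum of a list of integers.",
    "chosen": "def total(xs):\n    return sum(xs)",
    "rejected": (
        "def total(xs):\n"
        "    t = 0\n"
        "    for i in range(len(xs)):\n"
        "        t = t + xs[i]\n"
        "    return t"
    ),
}

print(validate_pair(example))
```

A check like this could be run over every record before passing the data to a DPO trainer. The actual records would typically be loaded with the Hugging Face `datasets` library using the repository name from this page; the available split names are not stated here and would need to be confirmed on the dataset card.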

Provider:

Arioron
