fblgit/simple-math-DPO
收藏Hugging Face2024-01-27 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/fblgit/simple-math-DPO
下载链接
链接失效反馈官方服务:
资源简介:
---
dataset_info:
features:
- name: chosen
list:
- name: content
dtype: string
- name: role
dtype: string
- name: messages
list:
- name: content
dtype: string
- name: role
dtype: string
- name: prompt
dtype: string
- name: rejected
list:
- name: content
dtype: string
- name: role
dtype: string
splits:
- name: train
num_bytes: 313485868.75
num_examples: 760000
- name: test
num_bytes: 16499256.25
num_examples: 40000
download_size: 101158122
dataset_size: 329985125.0
license: cc-by-nc-nd-4.0
task_categories:
- conversational
- reinforcement-learning
tags:
- math
- simple-math
pretty_name: Simple Math (DPO)
size_categories:
- 100K<n<1M
---
# Simple Math: 2+2=4 -1=3 (LoLo: Learning Only Logical Operations) DPO Pairs
Just like my teacher gave me homework, i thought maybe we can also add some of these basics on the trainings of our models.
It was created with very simple code that is in the repo, if you add more complex operations and so.. **please share the code** :D thank you
Current Code Version: 20240127.fblgit (A modification over @win10 for progressive and DPO operation)

## Versions
```
27.01.24 First DPO Generator
```
## Citations
If you use Simple Math o train your model, please cite on the modelcard or the paper.
```
@misc{simplemath,
title={Simple-Math: 2+2=4 4-1=3},
author={Xavier Murias},
year={2024},
publisher = {Juanako.AI},
journal = {HuggingFace repository},
howpublished = {\url{https://huggingface.co/datasets/fblgit/simple-math}},
}
```
提供机构:
fblgit
原始信息汇总
数据集概述
本数据集详情页面未提供具体的数据集信息,仅包含一段关于模型训练基础的思考。



