Reverse-Text-RL
Source: ModelScope community. Updated 2025-12-05; indexed 2025-12-06.
Download link:
https://modelscope.cn/datasets/PrimeIntellect/Reverse-Text-RL
Resource description:
# Reverse-Text-RL
A small, scrappy RL dataset used in [prime-rl](https://github.com/PrimeIntellect-ai/prime-rl)'s CI to debug RL training. The task asks a model to reverse short sentences character by character. It follows the general format of [PrimeIntellect/Reverse-Text-SFT](https://huggingface.co/datasets/PrimeIntellect/Reverse-Text-SFT).
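To make the task concrete, here is a minimal sketch of how a model completion could be scored against the ground-truth reversal. The `<reversed_text>` tag name comes from the prompt string in the generation script on this card; the helper names `extract_answer` and `reverse_reward`, the use of `difflib.SequenceMatcher` as the similarity measure, and the whitespace stripping are all assumptions made for illustration, not necessarily what prime-rl uses.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Tag name taken from the dataset's prompt; everything else here is hypothetical.
TAG_PATTERN = re.compile(r"<reversed_text>(.*?)</reversed_text>", re.DOTALL)


def extract_answer(completion: str) -> Optional[str]:
    """Return the text inside the last <reversed_text> block, or None if absent."""
    matches = TAG_PATTERN.findall(completion)
    return matches[-1] if matches else None


def reverse_reward(sentence: str, completion: str) -> float:
    """Similarity in [0, 1] between the parsed answer and the true reversal."""
    answer = extract_answer(completion)
    if answer is None:
        return 0.0  # no parsable answer
    target = sentence[::-1]  # ground truth: character-by-character reversal
    # Assumption: ignore surrounding whitespace on both sides before comparing.
    return SequenceMatcher(None, answer.strip(), target.strip()).ratio()


sentence = "The quick brown fox jumps"
perfect = "<reversed_text>spmuj xof nworb kciuq ehT</reversed_text>"
partial = "<reversed_text>spmuj xof</reversed_text>"
untagged = "spmuj xof nworb kciuq ehT"

print(reverse_reward(sentence, perfect))   # exact match
print(reverse_reward(sentence, partial))   # partial credit
print(reverse_reward(sentence, untagged))  # missing tags
```

A graded similarity rather than exact match gives partially correct reversals a non-zero signal, which can be useful when debugging a training loop on a small model.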
The following script was used to generate the dataset.
```python
from datasets import Dataset, load_dataset

# Source paragraphs to mine for short sentences
dataset = load_dataset("willcb/R1-reverse-wikipedia-paragraphs-v1-1000", split="train")
# Task instruction (defined here but not stored in the examples below)
prompt = "Reverse the text character-by-character. Put your answer in <reversed_text> tags."

# Split each paragraph on ". " and keep pieces with 5 to 20 space-separated words
sentences_list = dataset.map(lambda example: {"sentences": [s for s in example["prompt"][1]["content"].split(". ") if 5 <= len(s.split(" ")) <= 20]})["sentences"]
sentences = [sentence for sentences in sentences_list for sentence in sentences]  # Flatten
completions = [s[::-1] for s in sentences]  # Reverse to get ground truth

# Each example stores only the sentence; the completion is not saved
examples = []
for sentence, completion in zip(sentences, completions):
    examples.append({"prompt": sentence})

# Keep the 1,000 examples at indices 1000 to 1999
small_rl = Dataset.from_list(examples).select(range(1000, 2000))
```
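The sentence filter in the script can be checked offline on a toy paragraph, without downloading the source dataset. The sketch below applies the same split-and-filter expression; the paragraph text and the helper name `extract_sentences` are made up for illustration.

```python
from typing import List

# Made-up paragraph standing in for example["prompt"][1]["content"].
paragraph = (
    "Paris is the capital of France. It is big. "
    "The city has long been a major centre of finance and commerce. "
    "Fine"
)


def extract_sentences(text: str) -> List[str]:
    """Split on '. ' and keep pieces with 5 to 20 space-separated words."""
    return [s for s in text.split(". ") if 5 <= len(s.split(" ")) <= 20]


kept = extract_sentences(paragraph)
targets = [s[::-1] for s in kept]  # ground-truth reversals

for sentence, target in zip(kept, targets):
    print(repr(sentence), "->", repr(target))
```

Two details are worth noting: splitting on `". "` drops the separator, so kept sentences carry no trailing period, and pieces shorter than 5 words (here `"It is big"` and `"Fine"`) are discarded.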
Provider:
maas
Created:
2025-08-13



