
Reverse-Text-RL

ModelScope community · Updated 2025-12-05 · Indexed 2025-12-06
Download link:
https://modelscope.cn/datasets/PrimeIntellect/Reverse-Text-RL
Description:
# Reverse-Text-RL

A small, scrappy RL dataset used in [prime-rl](https://github.com/PrimeIntellect-ai/prime-rl)'s CI to debug RL training. The task asks a model to reverse short sentences character by character. The dataset follows the general format of [PrimeIntellect/Reverse-Text-SFT](https://huggingface.co/datasets/PrimeIntellect/Reverse-Text-SFT).

The following script was used to generate the dataset.

```python
from datasets import Dataset, load_dataset

dataset = load_dataset("willcb/R1-reverse-wikipedia-paragraphs-v1-1000", split="train")
prompt = "Reverse the text character-by-character. Put your answer in <reversed_text> tags."
sentences_list = dataset.map(lambda example: {"sentences": [s for s in example["prompt"][1]["content"].split(". ") if 5 <= len(s.split(" ")) <= 20]})["sentences"]
sentences = [sentence for sentences in sentences_list for sentence in sentences]  # Flatten
completions = [s[::-1] for s in sentences]  # Reverse to get ground truth
examples = []
for sentence, completion in zip(sentences, completions):
    examples.append({"prompt": sentence})
small_rl = Dataset.from_list(examples).select(range(1000, 2000))
```
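Because each row stores only the `prompt` sentence, the ground truth for a rollout can be recomputed as the reversed prompt. The sketch below shows one plausible way to score a completion against that target. It is an illustration only: the helper names `extract_answer` and `reversal_reward` are hypothetical, and the use of `difflib.SequenceMatcher` as a similarity measure is an assumption, not necessarily what prime-rl uses.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical helpers for illustration; not taken from prime-rl.
TAG_PATTERN = re.compile(r"<reversed_text>(.*?)</reversed_text>", re.DOTALL)


def extract_answer(completion: str) -> Optional[str]:
    """Return the text inside the last <reversed_text> block, or None if absent."""
    matches = TAG_PATTERN.findall(completion)
    return matches[-1].strip() if matches else None


def reversal_reward(prompt_text: str, completion: str) -> float:
    """Score a completion in [0, 1] by similarity to the character-reversed prompt."""
    answer = extract_answer(completion)
    if answer is None:
        return 0.0  # No tagged answer: assumed to earn no reward.
    target = prompt_text[::-1].strip()  # Ground truth is the reversed prompt.
    return SequenceMatcher(None, answer, target).ratio()
```

Under these assumptions, `reversal_reward("hello world", "<reversed_text>dlrow olleh</reversed_text>")` returns `1.0`, while a completion without the tags returns `0.0`. A graded similarity rather than exact match gives partial credit for near-correct reversals, which may be a gentler signal for small debug runs.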

Provider:
maas
Created:
2025-08-13