ceselder/loracle-ia-RL-v5
Source: Hugging Face. Updated 2026-04-28. Indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/ceselder/loracle-ia-RL-v5
Summary:
RL (GRPO) dataset for the LoRAcle pipeline, paired with `ceselder/loracle-ia-warmstart-v5`. It contains 400 LoRAs: 200 from the warmstart_v5 pool (already seen during warmstart) and 200 from the held-from-warmstart pool (unseen). The split is intended to let RL fine-tune on familiar LoRAs while also generalizing to unseen ones. The schema is the same as `loracle-ia-warmstart-v5`: `lora_id, source, qa_type, question, answer, ground_truth, category`. The listed data sources total 400 rows: 203 rows from the original `ceselder/loracle-ia-RL` parquet, 145 rows from the warmstart parquet (falling back to `behavior_probe` / `trigger_probe` / `self_description`), and 52 rows from miscellaneous warmstart rows. The qa_type breakdown includes self_description (117), content_self_description (88), rl_generic (36), and others.
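A minimal sketch of how the stated schema and counts could be sanity-checked after download. The column names and row counts come from the description above. The helper names, the source labels in `SOURCE_ROWS`, and the `"train"` split name are assumptions for illustration and are not confirmed by the dataset card.

```python
from collections import Counter

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = [
    "lora_id", "source", "qa_type", "question",
    "answer", "ground_truth", "category",
]

# Row counts per data source as stated in the description.
# The dictionary keys are hypothetical labels, not values of the `source` column.
SOURCE_ROWS = {
    "original_rl_parquet": 203,
    "warmstart_parquet_fallback": 145,
    "misc_warmstart_rows": 52,
}


def missing_columns(row):
    """Return the expected columns that are absent from one row (a dict)."""
    return [col for col in EXPECTED_COLUMNS if col not in row]


def qa_type_counts(rows):
    """Count rows per qa_type, for comparison with the published breakdown."""
    return Counter(row["qa_type"] for row in rows)


def load_rows(split="train"):
    """Load the dataset as a list of dicts.

    Assumes the Hugging Face `datasets` library is installed and that the
    dataset exposes a split named "train" (an assumption, not confirmed).
    """
    from datasets import load_dataset

    return list(load_dataset("ceselder/loracle-ia-RL-v5", split=split))
```

With network access, `rows = load_rows()` followed by `qa_type_counts(rows)` could be compared against the breakdown quoted above, for example 117 for `self_description`.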
Provider:
ceselder



