
expx/oct-humor-data

Hugging Face · updated 2026-04-17 · indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/expx/oct-humor-data
Description:
---
license: mit
task_categories:
- text-generation
tags:
- character-training
- dpo
- sft
- persona
- humor
- llama-3.1
- synthetic
language:
- en
size_categories:
- 10K<n<100K
---

# OCT Humor · training data for Llama-3.1-8B

End-to-end training data for the [Open Character Training](https://arxiv.org/abs/2511.01689) pipeline applied to a humor-focused constitution, with `meta-llama/Llama-3.1-8B-Instruct` as the student and `z-ai/glm-4.5-air` as the teacher (via OpenRouter).

Trained model: [`expx/oct-llama-3.1-8b-humor`](https://huggingface.co/expx/oct-llama-3.1-8b-humor).

## Structure

```
constitution.txt                       # humor constitution (prose, used for prompting)
stages/
  01_distillation.jsonl                # teacher + paired student responses (K=5 per prompt, flattened)
  02_dpo.jsonl                         # chosen / rejected pairs for DPO
  03_self_reflection.jsonl             # introspection stage 1 (self-reflection)
  04_self_interaction.jsonl            # introspection stage 2 (self-interaction, default)
  04_self_interaction_leading.jsonl    # introspection stage 2 (leading variant)
  05_sft.jsonl                         # final SFT training targets
evals/
  humor_eval.log                       # qualitative base-vs-persona samples (8 prompts)
```

Every file under `stages/` is JSONL, one record per line.

LIMA (`GAIR/lima`) is used as a prompt-augmentation source but is **not** mirrored here; pull it directly from `GAIR/lima` on the Hub.

## Provenance

| Field | Value |
|---|---|
| Teacher | `z-ai/glm-4.5-air` via OpenRouter |
| Student | `meta-llama/Llama-3.1-8B-Instruct` |
| Prompts | 11 hand-written constitution exemplars + LIMA train prompts |
| K (teacher samples / prompt) | 5 |
| Teacher `max_tokens` | 2048 |
| Teacher temperature | 1.0 |
| Teacher concurrency | 100 |
| Teacher per-request timeout | 90 s |
| Run date | 2026-04-17 |

Teacher generation took about 7 hours of API wall-clock time. Some teacher requests timed out, and the affected rows are dropped during DPO pair formatting: 9 150 teacher rows → 8 065 DPO pairs (1 085 rows removed) after length / completeness filtering.

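The sampling settings in the provenance table (K=5, concurrency 100, 90 s per-request timeout) can be pictured with a small `asyncio` sketch. This is not the pipeline's actual code: `fake_teacher` is a hypothetical stand-in for the real OpenRouter call, `sample_one` and `sample_all` are illustrative names, and the row keys simply follow the distillation file's `prompt` / `response` fields.

```python
import asyncio
import random

K = 5               # teacher samples per prompt (from the provenance table)
CONCURRENCY = 100   # maximum in-flight requests (from the provenance table)
TIMEOUT_S = 90.0    # per-request timeout in seconds (from the provenance table)


async def fake_teacher(prompt: str) -> str:
    """Hypothetical stand-in for a real teacher API call."""
    await asyncio.sleep(random.uniform(0.0, 0.01))
    return f"teacher reply to: {prompt}"


async def sample_one(prompt, teacher, sem, timeout_s):
    """Return one flattened row, or None if the request timed out."""
    async with sem:  # cap the number of concurrent requests
        try:
            response = await asyncio.wait_for(teacher(prompt), timeout=timeout_s)
        except asyncio.TimeoutError:
            return None
    return {"prompt": prompt, "response": response}


async def sample_all(prompts, teacher=fake_teacher, k=K,
                     concurrency=CONCURRENCY, timeout_s=TIMEOUT_S):
    """Draw k samples per prompt and flatten them into a single list of rows."""
    sem = asyncio.Semaphore(concurrency)
    tasks = [sample_one(p, teacher, sem, timeout_s)
             for p in prompts for _ in range(k)]
    rows = await asyncio.gather(*tasks)
    # Timed-out requests come back as None and are dropped here.
    return [row for row in rows if row is not None]


if __name__ == "__main__":
    rows = asyncio.run(sample_all(["Tell me a joke.", "Explain DNS."]))
    print(len(rows))
```

Dropping timed-out requests rather than retrying them is one way to end up with slightly fewer than K rows for some prompts; whether the original run retried is not stated in this card.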
## Schema

### `stages/01_distillation.jsonl` (9 150 rows, 1 829 unique prompts, ≈ K=5)

```jsonc
{
  "prompt": "<user message>",
  "response": "<teacher response, with 'ChatGLM' rewritten to 'Llama'>",
  "llama-3.1-8b-it": "<paired student response>"
}
```

### `stages/02_dpo.jsonl` (8 065 rows)

```jsonc
{
  "chosen": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}],
  "rejected": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]
}
```

Chosen = teacher response; rejected = paired student response. Rows with a missing response or a response longer than 1024 tokens are filtered out.

### `stages/03_self_reflection.jsonl` (10 000 rows)

```jsonc
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
```

### `stages/04_self_interaction*.jsonl` (1 000 rows each)

```jsonc
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}, ...]}
```

The `_leading` variant is the assistant-first-turn augmentation used by OCT.

### `stages/05_sft.jsonl` (12 000 rows)

```jsonc
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
```

## Usage

### Train DPO only from `02_dpo.jsonl`

```python
from datasets import load_dataset

ds = load_dataset("expx/oct-humor-data", data_files="stages/02_dpo.jsonl", split="train")
# feed to trl.DPOTrainer / openrlhf / etc.
```

The reference pipeline log and the exact pip-freeze from the run that produced these files live in the companion model repo under `training/`.

## License

MIT for the pipeline orchestration outputs.
Individual constituents retain their original licenses:

- **LIMA**: CC BY-NC-SA 4.0 (`GAIR/lima`, not mirrored here)
- **Teacher responses**: generated via OpenRouter from `z-ai/glm-4.5-air`; usage subject to ZhipuAI's model terms
- **Student responses**: generated from Llama-3.1-8B-Instruct; subject to the Llama 3.1 Community License

## Citation

```bibtex
@article{oct2024,
  title = {Open Character Training},
  url   = {https://arxiv.org/abs/2511.01689},
  year  = {2025}
}
```
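Returning to the schema: as a rough illustration of how the three fields of `01_distillation.jsonl` map onto the `chosen` / `rejected` layout of `02_dpo.jsonl`, the sketch below builds one pair per row and drops rows with a missing or over-long response. It is a sketch under stated assumptions, not the pipeline's code: `to_dpo_pair`, `convert_lines`, and `count_tokens` are hypothetical names, and the whitespace token count is only a placeholder, since this card does not say which tokenizer the real length filter used.

```python
import json

MAX_TOKENS = 1024  # length cutoff stated for 02_dpo.jsonl


def count_tokens(text: str) -> int:
    """Placeholder token counter (whitespace split); an assumption, not the real tokenizer."""
    return len(text.split())


def to_dpo_pair(row: dict, max_tokens: int = MAX_TOKENS):
    """Map one distillation row to a DPO record, or return None if it is filtered out."""
    prompt = row.get("prompt")
    teacher = row.get("response")
    student = row.get("llama-3.1-8b-it")
    # Completeness filter: every field must be a non-empty string.
    if not all(isinstance(x, str) and x.strip() for x in (prompt, teacher, student)):
        return None
    # Length filter: drop the row if either response exceeds the cutoff.
    if count_tokens(teacher) > max_tokens or count_tokens(student) > max_tokens:
        return None
    return {
        "chosen": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": teacher},
        ],
        "rejected": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": student},
        ],
    }


def convert_lines(lines):
    """Parse JSONL lines and keep only the rows that survive both filters."""
    pairs = (to_dpo_pair(json.loads(line)) for line in lines if line.strip())
    return [pair for pair in pairs if pair is not None]


if __name__ == "__main__":
    demo = [
        json.dumps({"prompt": "Hi", "response": "Hello!", "llama-3.1-8b-it": "Hey."}),
        json.dumps({"prompt": "Hi", "response": None, "llama-3.1-8b-it": "Hey."}),
    ]
    print(len(convert_lines(demo)))
```

Swapping `count_tokens` for a real tokenizer's length function would be the natural change when trying to approximate the published row counts.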

Provider:
expx