expx/oct-humor-data

Name: expx/oct-humor-data
Creator: expx
Published: 2026-04-17 19:50:42
License: MIT

Source: Hugging Face
Updated: 2026-04-17
Indexed: 2026-04-26

Download link:

https://hf-mirror.com/datasets/expx/oct-humor-data

Description:

---
license: mit
task_categories:
- text-generation
tags:
- character-training
- dpo
- sft
- persona
- humor
- llama-3.1
- synthetic
language:
- en
size_categories:
- 10K<n<100K
---

# OCT Humor · training data for Llama-3.1-8B

End-to-end training data for the [Open Character Training](https://arxiv.org/abs/2511.01689) pipeline applied to a humor-focused constitution, with `meta-llama/Llama-3.1-8B-Instruct` as the student and `z-ai/glm-4.5-air` as the teacher (via OpenRouter).

Trained model: [`expx/oct-llama-3.1-8b-humor`](https://huggingface.co/expx/oct-llama-3.1-8b-humor).

## Structure

```
constitution.txt                       # humor constitution (prose, used for prompting)
stages/
  01_distillation.jsonl                # teacher + paired student responses (K=5 per prompt, flattened)
  02_dpo.jsonl                         # chosen / rejected pairs for DPO
  03_self_reflection.jsonl             # introspection stage 1 (self-reflection)
  04_self_interaction.jsonl            # introspection stage 2 (self-interaction, default)
  04_self_interaction_leading.jsonl    # introspection stage 2 (leading variant)
  05_sft.jsonl                         # final SFT training targets
evals/
  humor_eval.log                       # qualitative base-vs-persona samples (8 prompts)
```

Every file under `stages/` is JSONL, one record per line.

LIMA (`GAIR/lima`) is used as a prompt-augmentation source but is **not** mirrored here; pull it directly from `GAIR/lima` on the Hub.

## Provenance

| Field | Value |
|---|---|
| Teacher | `z-ai/glm-4.5-air` via OpenRouter |
| Student | `meta-llama/Llama-3.1-8B-Instruct` |
| Prompts | 11 hand-written constitution exemplars + LIMA train prompts |
| K (teacher samples / prompt) | 5 |
| Teacher `max_tokens` | 2048 |
| Teacher temperature | 1.0 |
| Teacher concurrency | 100 |
| Teacher per-request timeout | 90 s |
| Run date | 2026-04-17 |

Teacher generation took ~7 hours of API wall-clock time; a handful of prompts timed out and were dropped during DPO pair formatting (9 150 teacher rows → 8 065 DPO pairs after length / completeness filtering).
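The pair-formatting step mentioned above can be sketched roughly as follows. This is an illustration, not the pipeline's actual code: the function names are hypothetical, the 1024-token limit is assumed to apply to both the teacher and the student response, and a whitespace word count stands in for whatever tokenizer the real run used. Field names follow the `01_distillation.jsonl` and `02_dpo.jsonl` schemas documented in this card.

```python
from typing import Callable, Iterable

MAX_TOKENS = 1024  # length limit quoted for 02_dpo.jsonl; assumed to apply per response


def whitespace_token_count(text: str) -> int:
    """Hypothetical stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def to_dpo_pairs(
    rows: Iterable[dict],
    count_tokens: Callable[[str], int] = whitespace_token_count,
    max_tokens: int = MAX_TOKENS,
) -> list[dict]:
    """Turn distillation rows into chosen/rejected pairs, dropping unusable rows."""
    pairs = []
    for row in rows:
        prompt = row.get("prompt")
        teacher = row.get("response")          # teacher response -> "chosen"
        student = row.get("llama-3.1-8b-it")   # student response -> "rejected"

        # Completeness filter: skip rows where any field is missing or empty
        # (for example, a teacher request that timed out).
        if not prompt or not teacher or not student:
            continue

        # Length filter: skip rows where either response exceeds the limit.
        if count_tokens(teacher) > max_tokens or count_tokens(student) > max_tokens:
            continue

        pairs.append({
            "chosen": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": teacher},
            ],
            "rejected": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": student},
            ],
        })
    return pairs


# Made-up sample rows, for illustration only.
sample_rows = [
    {"prompt": "Tell me about tea.",
     "response": "Tea: leaf soup with ambition.",
     "llama-3.1-8b-it": "Tea is a beverage."},
    {"prompt": "Timed out", "response": None, "llama-3.1-8b-it": "A reply."},
    {"prompt": "Too long", "response": "word " * 2000, "llama-3.1-8b-it": "Short."},
]

pairs = to_dpo_pairs(sample_rows)
print(len(pairs))  # prints 1: the incomplete row and the over-length row are dropped
```

To reproduce exact row counts, one would presumably pass a real tokenizer's length function as `count_tokens`; which tokenizer the original run used is not stated here.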
## Schema

### `stages/01_distillation.jsonl`: 9 150 rows, 1 829 unique prompts (≈ K=5)

```jsonc
{
  "prompt": "<user message>",
  "response": "<teacher response, with 'ChatGLM' rewritten to 'Llama'>",
  "llama-3.1-8b-it": "<paired student response>"
}
```

### `stages/02_dpo.jsonl`: 8 065 rows

```jsonc
{
  "chosen": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}],
  "rejected": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]
}
```

Chosen = teacher response; rejected = paired student response. Rows with missing or >1024-token responses are filtered out.

### `stages/03_self_reflection.jsonl`: 10 000 rows

```jsonc
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
```

### `stages/04_self_interaction*.jsonl`: 1 000 rows each

```jsonc
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}, ...]}
```

The `_leading` variant is the assistant-first-turn augmentation used by OCT.

### `stages/05_sft.jsonl`: 12 000 rows

```jsonc
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
```

## Usage

### Train DPO only from `02_dpo.jsonl`

```python
from datasets import load_dataset

ds = load_dataset("expx/oct-humor-data", data_files="stages/02_dpo.jsonl", split="train")
# feed to trl.DPOTrainer / openrlhf / etc.
```

The reference pipeline log and exact pip-freeze from the run that produced these files live in the companion model repo under `training/`.

## License

MIT for the pipeline orchestration outputs.
Individual constituents retain their original licenses:

- **LIMA**: CC BY-NC-SA 4.0 (`GAIR/lima`, not mirrored here)
- **Teacher responses**: generated via OpenRouter from `z-ai/glm-4.5-air`; usage subject to ZhipuAI's model terms
- **Student responses**: generated from Llama-3.1-8B-Instruct; subject to the Llama 3.1 Community License

## Citation

```bibtex
@article{oct2024,
  title = {Open Character Training},
  url   = {https://arxiv.org/abs/2511.01689},
  year  = {2025}
}
```
