shabul/feynman-explainer-dataset

Name: shabul/feynman-explainer-dataset
Creator: shabul
Published: 2026-04-23 21:52:23
License: 暂无描述

Hugging Face2026-04-23 更新2026-04-26 收录

下载链接：

https://hf-mirror.com/datasets/shabul/feynman-explainer-dataset

下载链接

链接失效反馈

官方服务：

资源简介：

Feynman Explainer Synthetic Dataset是一个紧凑的合成指令数据集，用于训练模型以Feynman风格解释概念：类比优先，直觉先于术语，流畅的散文而非要点。数据集包含原始合成示例和用于MLX LoRA训练的聊天格式训练/验证分割。覆盖了七个主题领域，包括ML & AI、Statistics、Math、CS、Physics、Biology和Economics，平均响应长度约为310字。数据集是通过固定概念列表、使用Google Gemini生成解释风格答案、存储原始生成并转换为聊天模板格式创建的。适用于风格转换、教育助手指令调整和教学导向响应格式实验，但不适用于事实准确性基准或替代专家评审的教育内容。数据集是合成的，可能存在事实错误或过度简化，且风格可能过度偏好类比。

A compact synthetic instruction dataset for training models to explain concepts in a Feynman-style voice: analogy first, intuition before jargon, and flowing prose instead of bullets. The dataset includes both the raw synthetic examples and the chat-formatted train/validation splits used for MLX LoRA training. It covers seven subject areas: ML & AI, Statistics, Math, CS, Physics, Biology, and Economics, with an average response length of about 310 words. The dataset was created by curating a fixed list of concepts, prompting Google Gemini to generate explanation-style answers, storing the raw generations, and converting them into chat-template format. It is appropriate for style transfer toward analogy-driven explanations, instruction tuning for educational assistants, and experiments in teaching-oriented response formatting, but not suitable as a benchmark for factual accuracy or as a substitute for expert-reviewed educational content. The data is synthetic and may contain factual mistakes or oversimplifications, with a tone that can over-prefer analogy.

提供机构：

shabul

5,000+

优质数据集

54 个

任务类型

进入经典数据集