shabul/feynman-explainer-dataset
收藏Hugging Face2026-04-23 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/shabul/feynman-explainer-dataset
下载链接
链接失效反馈官方服务:
资源简介:
Feynman Explainer Synthetic Dataset是一个紧凑的合成指令数据集,用于训练模型以Feynman风格解释概念:类比优先,直觉先于术语,流畅的散文而非要点。数据集包含原始合成示例和用于MLX LoRA训练的聊天格式训练/验证分割。覆盖了七个主题领域,包括ML & AI、Statistics、Math、CS、Physics、Biology和Economics,平均响应长度约为310字。数据集是通过固定概念列表、使用Google Gemini生成解释风格答案、存储原始生成并转换为聊天模板格式创建的。适用于风格转换、教育助手指令调整和教学导向响应格式实验,但不适用于事实准确性基准或替代专家评审的教育内容。数据集是合成的,可能存在事实错误或过度简化,且风格可能过度偏好类比。
A compact synthetic instruction dataset for training models to explain concepts in a Feynman-style voice: analogy first, intuition before jargon, and flowing prose instead of bullets. The dataset includes both the raw synthetic examples and the chat-formatted train/validation splits used for MLX LoRA training. It covers seven subject areas: ML & AI, Statistics, Math, CS, Physics, Biology, and Economics, with an average response length of about 310 words. The dataset was created by curating a fixed list of concepts, prompting Google Gemini to generate explanation-style answers, storing the raw generations, and converting them into chat-template format. It is appropriate for style transfer toward analogy-driven explanations, instruction tuning for educational assistants, and experiments in teaching-oriented response formatting, but not suitable as a benchmark for factual accuracy or as a substitute for expert-reviewed educational content. The data is synthetic and may contain factual mistakes or oversimplifications, with a tone that can over-prefer analogy.
提供机构:
shabul



