marin-community/stackexchange-markdown
Source: Hugging Face. Updated: 2025-05-18. Indexed: 2025-07-05.
Download link:
https://hf-mirror.com/datasets/marin-community/stackexchange-markdown
Resource summary:
The Marin Markdownified StackExchange dataset converts Stack Exchange question-answer pairs into Markdown and contains approximately 20.4B tokens. It preserves the content of technical discussions and organizes it into threads for language model training. Each entry is a complete question-answer thread: the original question title, the full question body in Markdown, the answers (when available) with their vote counts, the original tags, the creation date, and a URL reference.
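To make the thread layout concrete, the following is a minimal sketch of how one entry with the fields listed above might be assembled into a single Markdown document. The class names, field names, heading layout, and the example URL are all assumptions made for illustration; they are not the dataset's actual schema or the authors' conversion code.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    body: str   # answer text, already in Markdown
    votes: int  # vote count for this answer


@dataclass
class Thread:
    # Hypothetical field names mirroring the fields named in the summary.
    title: str
    body: str
    tags: list[str]
    created: str
    url: str
    answers: list[Answer] = field(default_factory=list)


def render_thread(thread: Thread) -> str:
    """Render one thread as a Markdown document (assumed layout)."""
    parts = [
        f"# {thread.title}",
        f"Tags: {', '.join(thread.tags)}",
        f"Created: {thread.created}",
        f"Source: {thread.url}",
        thread.body,
    ]
    # Answers are optional; a question with none renders without this section.
    for i, answer in enumerate(thread.answers, start=1):
        parts.append(f"## Answer {i} (votes: {answer.votes})")
        parts.append(answer.body)
    return "\n\n".join(parts) + "\n"


example = Thread(
    title="How do I reverse a list?",
    body="I have a list `xs` and want its elements in reverse order.",
    tags=["python", "list"],
    created="2020-01-01",
    url="https://example.com/questions/1",  # placeholder, not a real thread
    answers=[
        Answer(body="Use `reversed(xs)` or `xs[::-1]`.", votes=12),
        Answer(body="Call `xs.reverse()` to reverse in place.", votes=3),
    ],
)

markdown = render_thread(example)
print(markdown)
```

Keeping the question, metadata, and all answers in one document is what lets a language model see the whole discussion in a single training example.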
Provider:
marin-community



