sungyub/megascience-pairs
Source: Hugging Face. Updated: 2025-10-18. Indexed: 2025-10-25.
Download link:
https://hf-mirror.com/datasets/sungyub/megascience-pairs
Description:
MegaScience Pairs (Non-Math) contains the basic question-answer pairs from the MegaScience dataset with all mathematics-related questions removed. This processed version converts the data into a simple input-output format suitable for training and evaluating language models. It covers Medicine, Biology, Physics, Chemistry, Computer Science, and Economics; Medicine is the largest subject at approximately 36%. The original dataset has about 2.35 million samples, while this version has 829,186 samples, roughly 35% of the original, all of them non-math questions.
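As a quick sanity check on the figures in the description, and to illustrate what a "simple input-output format" could look like, here is a minimal Python sketch. The constants come from the description itself; the record keys `question`, `answer`, `input`, and `output` and the helper names are hypothetical and may not match the real column names, so consult the dataset card before relying on them.

```python
# Figures quoted in the description (the original size is approximate).
ORIGINAL_SIZE = 2_350_000   # "about 2.35 million samples"
SUBSET_SIZE = 829_186       # samples in this non-math version
MEDICINE_SHARE = 0.36       # "approximately 36%"


def subset_fraction(subset: int, original: int) -> float:
    """Return the subset size as a fraction of the original size."""
    return subset / original


def to_pair(record: dict) -> dict:
    """Map a question-answer record to a simple input-output pair.

    The keys used here are assumptions for illustration only; the real
    dataset may name its columns differently.
    """
    return {
        "input": record["question"].strip(),
        "output": record["answer"].strip(),
    }


fraction = subset_fraction(SUBSET_SIZE, ORIGINAL_SIZE)
# Rough estimate only, since the 36% share is itself approximate.
medicine_estimate = round(SUBSET_SIZE * MEDICINE_SHARE)

print(f"Subset share of original: {fraction:.1%}")
print(f"Estimated medicine samples: {medicine_estimate:,}")
```

The computed share agrees with the "about 35%" stated in the description, and the medicine estimate is only as precise as the rounded 36% figure it is derived from.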
Provider:
sungyub



