444
收藏魔搭社区2025-11-12 更新2025-08-16 收录
下载链接:
https://modelscope.cn/datasets/gaoyang1122/444
下载链接
链接失效反馈官方服务:
资源简介:
# Qwen3 微调数据集
本仓库包含用于Qwen3模型微调的高质量数据集。这些数据集经过精心选择,适用于增强大语言模型在各种任务上的性能,特别是推理和知识密集型任务。
## 数据集列表
### 1. FineTome-100k
一个高质量的指令遵循数据集,专为大语言模型微调设计。
- **来源**: [mlabonne/FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k)
- **样本数量**: 100,000
- **格式**: 包含对话格式的训练数据,每个样本包含对话内容、来源和质量分数
- **特点**: 数据质量高,覆盖广泛的指令类型和领域
### 2. OpenMathReasoning-mini
专注于数学推理能力的微调数据集,包含各种数学问题及其详细解答过程。
- **来源**: [unsloth/OpenMathReasoning-mini](https://huggingface.co/datasets/unsloth/OpenMathReasoning-mini)
- **样本数量**: 19,252
- **格式**: 包含数学问题、期望答案、问题类型、解答过程等字段
- **特点**: 增强模型的数学推理和思维链(Chain-of-Thought)能力
## 使用方法
这些数据集适用于Qwen3系列模型的监督微调(SFT),可以通过以下方式使用:
```python
from datasets import load_from_disk
# 加载FineTome数据集
finetome_dataset = load_from_disk("FineTome")
# 加载OpenMathReasoning数据集
math_dataset = load_from_disk("OpenMathReasoning")
```
## 许可说明
请注意,使用这些数据集时需遵循原始数据集的许可条款。在商业应用前,请确认相应的使用权限。
# Qwen3 Fine-tuning Datasets
This repository contains high-quality datasets for fine-tuning the Qwen3 series models. These datasets have been carefully selected to enhance the performance of large language models across various tasks, especially reasoning and knowledge-intensive tasks.
## Dataset List
### 1. FineTome-100k
A high-quality instruction-following dataset designed specifically for fine-tuning large language models.
- **Source**: [mlabonne/FineTome-100k](https://huggingface.co/datasets/mlabonne/FineTome-100k)
- **Sample Count**: 100,000
- **Format**: Contains conversational training data, with each sample including conversation content, source, and quality score
- **Features**: High data quality, covering a wide range of instruction types and domains
### 2. OpenMathReasoning-mini
A fine-tuning dataset focused on mathematical reasoning capabilities, containing various math problems and their detailed solution processes.
- **Source**: [unsloth/OpenMathReasoning-mini](https://huggingface.co/datasets/unsloth/OpenMathReasoning-mini)
- **Sample Count**: 19,252
- **Format**: Includes fields such as mathematical problems, expected answers, problem types, and solution processes
- **Features**: Enhances the model's mathematical reasoning and Chain-of-Thought capabilities
## Usage Instructions
These datasets are suitable for supervised fine-tuning (SFT) of the Qwen3 series models, and can be used via the following approach:
python
from datasets import load_from_disk
# Load the FineTome dataset
finetome_dataset = load_from_disk("FineTome")
# Load the OpenMathReasoning dataset
math_dataset = load_from_disk("OpenMathReasoning")
## License Notice
Please note that the use of these datasets must comply with the license terms of the original datasets. Please confirm the corresponding usage rights prior to commercial application.
提供机构:
maas
创建时间:
2025-08-11



