BAAI-2k
Source: arXiv (indexed 2025-09-30)
Download link:
https://huggingface.co/datasets/BAAI/Infinity-Instruct/tree/main/Gen
Description:
BAAI-2k is a 2,000-sample subset of the BAAI-Infinity-Instruct dataset, intended for fine-tuning large language models (LLMs). Samples were selected by screening for high reward to ensure data quality, and were drawn uniformly across categories to preserve category diversity. The dataset has been used to fine-tune multiple LLMs, including models from the Qwen 3 and Llama 3 series, and has been evaluated on several downstream benchmarks.
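The description does not say how the high-reward, category-balanced selection was implemented. As a rough illustration only, the sketch below shows one way such a subset could be drawn: group samples by category, rank each group by reward, then take samples round-robin across categories until the budget is reached. The field names `category` and `reward` and the function `select_balanced_top_reward` are hypothetical and are not taken from the dataset or its authors.

```python
from collections import defaultdict


def select_balanced_top_reward(samples, total):
    """Pick up to `total` samples, spread evenly across categories,
    preferring the highest-reward samples within each category.

    Assumes each sample is a dict with hypothetical keys
    "category" and "reward"; the real schema may differ.
    """
    by_category = defaultdict(list)
    for sample in samples:
        by_category[sample["category"]].append(sample)

    # Sort each category's pool from highest to lowest reward.
    for pool in by_category.values():
        pool.sort(key=lambda s: s["reward"], reverse=True)

    categories = sorted(by_category)
    selected = []
    rank = 0
    # Round-robin: on each pass, take every category's rank-th best sample.
    while len(selected) < total:
        took_any = False
        for category in categories:
            pool = by_category[category]
            if rank < len(pool) and len(selected) < total:
                selected.append(pool[rank])
                took_any = True
        if not took_any:
            break  # every pool is exhausted before reaching the budget
        rank += 1
    return selected


if __name__ == "__main__":
    toy = [
        {"category": "math", "reward": 0.9},
        {"category": "math", "reward": 0.1},
        {"category": "code", "reward": 0.8},
        {"category": "chat", "reward": 0.7},
    ]
    for row in select_balanced_top_reward(toy, total=3):
        print(row["category"], row["reward"])
```

With a budget of 2,000 and enough samples per category, this yields roughly equal counts per category; categories with too few samples simply contribute everything they have.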
Provided by:
BAAI



