five

BAAI-2k

Indexed from arXiv on 2025-09-30
Download link:
https://huggingface.co/datasets/BAAI/Infinity-Instruct/tree/main/Gen
Resource overview:
BAAI-2k is a 2,000-sample subset of the BAAI-Infinity-Instruct dataset. Samples were selected for high reward, with uniform sampling across categories, so that the subset keeps both data quality and category diversity. It has been used to fine-tune several large language models (LLMs), including models from the Qwen 3 and Llama 3 series, and has been evaluated on several downstream benchmarks. The dataset is intended for LLM fine-tuning.
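The selection idea described above (high-reward samples, drawn evenly across categories) can be illustrated with a minimal sketch. This is not the procedure used to build BAAI-2k, which the listing does not specify; the function name `select_balanced_top_reward` and the field names `category` and `reward` are hypothetical, and the round-robin strategy is only one plausible way to combine the two criteria.

```python
from collections import defaultdict


def select_balanced_top_reward(samples, total,
                               category_key="category", reward_key="reward"):
    """Pick up to `total` samples, spread as evenly as possible across
    categories, preferring the highest-reward samples in each category.

    Illustrative only: the field names are assumptions, not the dataset schema.
    """
    # Group the pool by category.
    by_category = defaultdict(list)
    for sample in samples:
        by_category[sample[category_key]].append(sample)

    # Within each category, order samples from highest to lowest reward.
    for items in by_category.values():
        items.sort(key=lambda s: s[reward_key], reverse=True)

    selected = []
    rank = 0
    categories = sorted(by_category)  # fixed order for reproducibility

    # Round-robin: take the rank-th best sample from each category in turn.
    while len(selected) < total:
        took_any = False
        for cat in categories:
            items = by_category[cat]
            if rank < len(items):
                selected.append(items[rank])
                took_any = True
                if len(selected) == total:
                    break
        if not took_any:
            break  # the pool is exhausted before reaching `total`
        rank += 1
    return selected


# Toy pool standing in for a scored instruction dataset.
toy_pool = [
    {"category": "math", "reward": 0.9},
    {"category": "math", "reward": 0.4},
    {"category": "math", "reward": 0.7},
    {"category": "code", "reward": 0.8},
    {"category": "code", "reward": 0.2},
    {"category": "chat", "reward": 0.5},
]
subset = select_balanced_top_reward(toy_pool, total=4)
```

With a real pool, `total=2000` would give a subset of the same size as BAAI-2k. When a category runs out of samples, the remaining slots go to the other categories, so the split is only approximately uniform.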
Provider:
BAAI