Probing Dataset
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/cxcscmu/Montessori-Instruct
Summary:
This dataset contains 10,000 prompt-instruction-response triplets (10,000 records in total) for training language models on synthetic data. It serves two purposes: probing how models learn, and evaluating how synthetic data affects the learning of student models.
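The listing does not document the file layout. The sketch below is only a hypothetical illustration. It assumes each triplet is stored as one JSON object per line, with the invented keys `prompt`, `instruction` and `response`, and shows how such a record could be modelled and parsed. Check the repository for the actual schema before relying on it.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Triplet:
    """One prompt-instruction-response record.

    Field names are hypothetical, not the dataset's documented schema.
    """
    prompt: str
    instruction: str
    response: str


def parse_triplets(lines: Iterable[str]) -> List[Triplet]:
    """Parse JSON Lines text into Triplet records, skipping blank lines.

    Assumes (hypothetically) one JSON object per line with the keys
    "prompt", "instruction" and "response".
    """
    records: List[Triplet] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore empty lines between records
        obj = json.loads(line)
        records.append(Triplet(obj["prompt"], obj["instruction"], obj["response"]))
    return records


if __name__ == "__main__":
    # Toy input in the assumed format; real records would come from the dataset files.
    sample = [
        '{"prompt": "p1", "instruction": "i1", "response": "r1"}',
        "",
        '{"prompt": "p2", "instruction": "i2", "response": "r2"}',
    ]
    triplets = parse_triplets(sample)
    print(len(triplets))  # 2
```

With the full dataset in this assumed format, `len(parse_triplets(...))` would be expected to equal the stated 10,000 records.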



