HuggingFaceTB/smoltalk2
Source: Hugging Face. Updated: 2025-10-31. Indexed: 2025-10-25.
Download link:
https://hf-mirror.com/datasets/HuggingFaceTB/smoltalk2
Description:
SmolTalk2 contains three subsets (Mid, SFT, and Preference), one for each of the three post-training phases of SmolLM3-3B. Each subset combines several source datasets that have been decontaminated and cleaned: Mid has 2 datasets, SFT has 25, and Preference has 2. Detailed information and statistics for each dataset are in the README file.
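As a rough illustration of how one of these subsets might be loaded, here is a minimal sketch using the Hugging Face `datasets` library. It assumes the subset names given in the description (Mid, SFT, Preference) are also the configuration names accepted by `load_dataset`; the README should be checked for the exact names and splits. The helper `load_subset` and the constant `SUBSET_DATASET_COUNTS` are hypothetical names introduced here for illustration.

```python
from typing import Any

REPO_ID = "HuggingFaceTB/smoltalk2"

# Number of source datasets per subset, as stated in the description.
SUBSET_DATASET_COUNTS = {"Mid": 2, "SFT": 25, "Preference": 2}


def load_subset(subset: str, streaming: bool = True) -> Any:
    """Load one SmolTalk2 subset.

    Assumption: the subset names double as `load_dataset` config names.
    Streaming is on by default to avoid downloading the whole subset.
    """
    if subset not in SUBSET_DATASET_COUNTS:
        expected = sorted(SUBSET_DATASET_COUNTS)
        raise ValueError(f"unknown subset {subset!r}; expected one of {expected}")
    # Imported lazily so the name check works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, subset, streaming=streaming)
```

Under these assumptions, `load_subset("SFT")` would return a dictionary-like object keyed by split name, and `list(load_subset("SFT").keys())` would list the splits available in that subset.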
Provider:
HuggingFaceTB



