aeolian83/HuggingFaceTB_smoltalk_filtered_10k_sampled
Source: Hugging Face | Updated: 2025-04-09 | Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/aeolian83/HuggingFaceTB_smoltalk_filtered_10k_sampled
Description:
A dialogue dataset for Merge-Up SLM training. Each record carries the message content, role, source, and token count. The data was filtered to keep only samples written entirely in the English alphabet, then sampled proportionally by token length to yield 100,000 samples.
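The filter-then-sample procedure described above could look roughly like the sketch below. It is an illustration only, not the dataset author's actual pipeline: the column names (`messages`, `content`, `role`), the whitespace-based token count, the bucket width, and the rule "no alphabetic character outside ASCII" are all assumptions made for the example.

```python
import random
from collections import defaultdict


def is_english_only(text):
    # Assumption: "English alphabets exclusively" is approximated as
    # "contains no alphabetic character outside ASCII".
    return not any(ch.isalpha() and not ch.isascii() for ch in text)


def token_length(messages):
    # Assumption: whitespace splitting stands in for a real tokenizer.
    return sum(len(m["content"].split()) for m in messages)


def filter_english(rows):
    # Keep a conversation only if every message passes the English check.
    return [
        row for row in rows
        if all(is_english_only(m["content"]) for m in row["messages"])
    ]


def proportional_sample(rows, target_size, bucket_width=512, seed=0):
    # Group rows into token-length buckets, then draw from each bucket in
    # proportion to its share of the data, so the sample keeps roughly the
    # same length distribution. Rounding means the result can differ from
    # target_size by a few rows.
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for row in rows:
        buckets[token_length(row["messages"]) // bucket_width].append(row)

    total = len(rows)
    sampled = []
    for key in sorted(buckets):
        bucket = buckets[key]
        quota = round(target_size * len(bucket) / total)
        sampled.extend(rng.sample(bucket, min(quota, len(bucket))))
    return sampled
```

In practice a real tokenizer would replace `token_length`, and `target_size` would be set to the desired sample count.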
Provider:
aeolian83



