collabo-research/Moxin-sft-reasoning-dataset-en-32kfiltered-chat-format-sorted
Source: Hugging Face. Updated 2025-07-04; indexed 2025-07-05.
Download link:
https://hf-mirror.com/datasets/collabo-research/Moxin-sft-reasoning-dataset-en-32kfiltered-chat-format-sorted
Description:
Token Length Sorted Dataset is derived from collabo-research/Moxin-sft-reasoning-dataset-en-32kfiltered-chat-format and sorted by token length. It contains 196,807 samples. Each sample's token length falls between 382 and 30,004, with a mean of 7,243.42. The dataset keeps every column from the original dataset and adds a token_length column giving the number of tokens in each prompt.
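The figures above can be checked against the token_length column. The following is a minimal sketch, not the authors' processing script: the helper names `summarize_token_lengths` and `load_token_lengths` are hypothetical, the split name "train" is an assumption, and the sort direction is not stated in the description, so the helper reports whichever order it finds.

```python
from statistics import mean


def summarize_token_lengths(lengths):
    """Return (min, max, mean, order) for a sequence of token lengths.

    `order` is "ascending", "descending", or "unsorted".
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("no token lengths given")
    # Compare each value with its successor to detect the sort direction.
    pairs = list(zip(lengths, lengths[1:]))
    if all(a <= b for a, b in pairs):
        order = "ascending"
    elif all(a >= b for a, b in pairs):
        order = "descending"
    else:
        order = "unsorted"
    return min(lengths), max(lengths), mean(lengths), order


def load_token_lengths(split="train"):
    """Fetch the token_length column from the Hugging Face Hub.

    Assumptions: the `datasets` package is installed, network access is
    available, and the repository exposes a split named "train".
    """
    from datasets import load_dataset

    ds = load_dataset(
        "collabo-research/Moxin-sft-reasoning-dataset-en-32kfiltered-chat-format-sorted",
        split=split,
    )
    return ds["token_length"]
```

Calling `summarize_token_lengths(load_token_lengths())` should, if the description is accurate, give a minimum of 382, a maximum of 30004, and a mean close to 7243.42.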
Provider:
collabo-research



