Dhiraj45/SFT-Tokenized
Source: Hugging Face | Updated: 2025-10-30 | Indexed: 2025-11-15
Download link:
https://hf-mirror.com/datasets/Dhiraj45/SFT-Tokenized
Description:
The dataset has two feature fields, input_ids and attention_mask: input_ids is a list of integers and attention_mask is a list of 8-bit integers. It contains only a training split with 77,685 examples, and the dataset size is 507,452,655 bytes. Preprocessing requires the HuggingFaceTB/SmolLM2-360M-Instruct tokenizer.
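As a rough illustration of the schema described above, the following Python sketch checks that a record has the two listed fields and derives the average size per example from the stated totals. The sample record, the helper names, the assumption that both lists have equal length, and the assumption that the mask holds only 0 and 1 are all illustrative and not confirmed by the listing. The `load_train_split` helper assumes the `datasets` library is installed and that the repository is reachable on the Hugging Face Hub; it is not called here.

```python
from typing import Dict, List

# Figures taken from the dataset description.
NUM_EXAMPLES = 77_685
DATASET_BYTES = 507_452_655


def average_bytes_per_example() -> float:
    """Average size of one example, derived from the stated totals."""
    return DATASET_BYTES / NUM_EXAMPLES


def check_record(record: Dict[str, List[int]]) -> bool:
    """Return True if a record matches the described schema.

    Assumptions (not stated in the listing): input_ids and attention_mask
    have the same length, and mask values are 0 or 1, which fits in int8.
    """
    if "input_ids" not in record or "attention_mask" not in record:
        return False
    ids = record["input_ids"]
    mask = record["attention_mask"]
    if len(ids) != len(mask):
        return False
    return all(isinstance(i, int) for i in ids) and all(m in (0, 1) for m in mask)


def load_train_split():
    """Load the training split (needs network access and `datasets`)."""
    from datasets import load_dataset  # imported lazily; optional dependency

    return load_dataset("Dhiraj45/SFT-Tokenized", split="train")


# Hypothetical record, for illustration only; real token IDs will differ.
sample = {"input_ids": [1, 345, 678, 2, 0, 0], "attention_mask": [1, 1, 1, 1, 0, 0]}
print(check_record(sample))
print(round(average_bytes_per_example(), 2))
```

The average works out to roughly 6.5 kB per example, which gives a quick sanity check on download size before fetching the full split.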
Provided by:
Dhiraj45



