christopher/tokenizers
Hugging Face | Updated 2025-11-09 | Indexed 2025-11-15
Download link:
https://hf-mirror.com/datasets/christopher/tokenizers
Description:
The dataset contains fields including a unique identifier, a creation timestamp, a download count, a like count, and a tokenizer string. It has a single train split of 150,000 examples, with a dataset size of 154,927,808,562 bytes and a download size of 85,406,627,457 bytes. These fields suggest the dataset may be suited to machine learning training, particularly analyses involving popularity and user-engagement metrics.
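The sizes quoted in the description can be put in perspective with a little arithmetic, and a split this large is usually easier to inspect by streaming than by downloading it in full. The sketch below is illustrative only: `size_summary` and `peek` are hypothetical helper names, it assumes the Hugging Face `datasets` package is installed and that the repository is publicly reachable, and it does not assume any column names because this page does not list them.

```python
from itertools import islice

# Figures quoted in the description.
NUM_EXAMPLES = 150_000
DATASET_BYTES = 154_927_808_562   # size of the train split
DOWNLOAD_BYTES = 85_406_627_457   # size of the files to download


def size_summary():
    """Derive average example size and download/dataset ratio from the listed figures."""
    return {
        "avg_bytes_per_example": DATASET_BYTES / NUM_EXAMPLES,
        "download_ratio": DOWNLOAD_BYTES / DATASET_BYTES,
    }


def peek(n=3, repo_id="christopher/tokenizers"):
    """Stream the first n records without downloading the whole split.

    Assumption: the `datasets` package is installed and the repo is reachable.
    Column names are not given on this page, so inspect the returned keys.
    """
    from datasets import load_dataset  # imported lazily; needs network access

    stream = load_dataset(repo_id, split="train", streaming=True)
    return list(islice(stream, n))


if __name__ == "__main__":
    for name, value in size_summary().items():
        print(f"{name}: {value:,.2f}")
```

By these figures each example averages a little over 1 MB, and the download is a bit more than half the size of the prepared split, so streaming a handful of records with `peek()` is a cheap way to check the actual schema before committing to the full download.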
Provider:
christopher



