cxrbon16/tokenized-news-x
Source: Hugging Face | Updated: 2025-02-28 | Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/cxrbon16/tokenized-news-x
Description:
The dataset contains pre-tokenized text features, such as input ID sequences and attention masks, suitable for training machine learning models. It has a single training split of approximately 193.5 GB containing about 179 million examples. The README does not explicitly describe the dataset's content or intended use.
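The listing names input ID sequences and attention masks but does not document the tokenizer or padding scheme. The sketch below assumes the common Hugging Face convention (mask value 1 for real tokens, 0 for padding) and a hypothetical pad id of 0, and shows how the two columns typically relate. It also works out the rough average size per example from the stated figures, treating 1 GB as 10^9 bytes (an assumption; the listing does not say).

```python
from typing import List, Tuple

# Hypothetical pad token id: the real value depends on the tokenizer,
# which the dataset listing does not identify.
PAD_ID = 0


def pad_batch(
    sequences: List[List[int]], pad_id: int = PAD_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad token-id sequences to equal length and build attention masks.

    Assumed mask convention (common in Hugging Face tokenizers):
    1 = real token, 0 = padding.
    """
    max_len = max((len(seq) for seq in sequences), default=0)
    input_ids: List[List[int]] = []
    attention_mask: List[List[int]] = []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


def count_real_tokens(attention_mask: List[List[int]]) -> int:
    """Count non-padding tokens by summing every row of the attention mask."""
    return sum(sum(row) for row in attention_mask)


def avg_bytes_per_example(total_gb: float, num_examples: int) -> float:
    """Average on-disk size per example, assuming 1 GB = 10**9 bytes."""
    return total_gb * 10**9 / num_examples


if __name__ == "__main__":
    # Arbitrary example token ids, not taken from the dataset.
    ids, mask = pad_batch([[101, 2023, 102], [101, 102]])
    print(ids)   # [[101, 2023, 102], [101, 102, 0]]
    print(mask)  # [[1, 1, 1], [1, 1, 0]]
    print(count_real_tokens(mask))  # 5
    print(round(avg_bytes_per_example(193.5, 179_000_000)))  # 1081
```

Since both size figures are approximate, roughly 1.1 KB per example is only an estimate. At that scale, streaming the split (for example with `load_dataset(..., streaming=True)` from the `datasets` library) is likely preferable to downloading all of it up front.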
Provided by:
cxrbon16



