chenrm/h-corpus-tokenized-qwen2.5-ctx2048
Source: Hugging Face. Updated 2025-02-12. Indexed 2025-02-15.
Download link:
https://hf-mirror.com/datasets/chenrm/h-corpus-tokenized-qwen2.5-ctx2048
Description:
This dataset has two feature fields, input_ids and attention_mask, and is intended for training machine learning models. It provides a training split of 21758092440 bytes (about 21.8 GB) containing 2123155 examples. The default configuration specifies the data file path for the training split.
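The figures above can be sanity-checked, and the data inspected, with a short script. The following is a minimal sketch rather than an official loader: it assumes the third-party `datasets` package is installed and that the repository is reachable under the ID shown in the title. The helper names `bytes_per_example` and `peek` are hypothetical, introduced here only for illustration.

```python
# Figures quoted in the description above.
TRAIN_BYTES = 21_758_092_440
TRAIN_EXAMPLES = 2_123_155
REPO_ID = "chenrm/h-corpus-tokenized-qwen2.5-ctx2048"


def bytes_per_example(total_bytes: int, num_examples: int) -> float:
    """Average stored size of one example, in bytes."""
    return total_bytes / num_examples


def peek(n: int = 1):
    """Stream the first n training examples instead of downloading ~21.8 GB.

    Assumption: the `datasets` package is installed and the Hub is reachable.
    """
    from datasets import load_dataset  # lazy import of a third-party dependency

    stream = load_dataset(REPO_ID, split="train", streaming=True)
    return list(stream.take(n))


if __name__ == "__main__":
    print(f"{TRAIN_BYTES / 1e9:.2f} GB total")
    print(f"{bytes_per_example(TRAIN_BYTES, TRAIN_EXAMPLES):.0f} bytes per example")
```

The byte count divides evenly by the example count, giving 10248 bytes per example, which suggests fixed-length examples; the listing itself does not state a sequence length, so treat that as an inference. Calling `peek()` returns a list of dicts with the `input_ids` and `attention_mask` fields. To download through the mirror in the link above, setting the `HF_ENDPOINT` environment variable before importing `datasets` is the usual approach.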
Provider:
chenrm



