bismarck91/enA-frA-xc-tokenized-combined-fr-en
Source: Hugging Face. Updated 2025-10-09. Indexed 2025-10-25.
Download link:
https://hf-mirror.com/datasets/bismarck91/enA-frA-xc-tokenized-combined-fr-en
Description:
The dataset contains three sequence fields per example: input IDs, attention masks, and labels. Input IDs likely represent indices of words or subwords in the text data, attention masks indicate which positions in the sequence the model should attend to, and labels may represent the output for classification or regression tasks. The dataset provides a training split with over 5 million examples, approximately 33 GB in total.
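To make the relationship between input IDs and attention masks concrete, here is a minimal sketch of how such fields are commonly built when sequences are padded to a common length. It is an illustration under assumptions, not the dataset author's preprocessing: the helper `pad_batch`, the padding ID `0`, and the sample token IDs are all hypothetical, and the listing does not state how the labels were produced, so they are left out.

```python
from typing import Dict, List

PAD_ID = 0  # hypothetical padding token ID; the real value depends on the tokenizer


def pad_batch(sequences: List[List[int]], pad_id: int = PAD_ID) -> Dict[str, List[List[int]]]:
    """Right-pad token ID sequences to a common length and build matching masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    # 1 marks a real token, 0 marks padding that the model should ignore.
    attention_mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Made-up token IDs, used only to show the shape of the output.
batch = pad_batch([[101, 7592, 102], [101, 102]])
print(batch["input_ids"])       # [[101, 7592, 102], [101, 102, 0]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 1, 0]]
```

Each mask row has the same length as its input ID row, so a model can skip the padded positions without changing the batch shape.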
Provider:
bismarck91



