bobboyms/subset-Itau-Unibanco-aroeira-4B-tokens
Source: Hugging Face. Updated: 2025-04-24. Indexed: 2025-10-25.
Download link:
https://hf-mirror.com/datasets/bobboyms/subset-Itau-Unibanco-aroeira-4B-tokens
Resource description:
This is a Brazilian Portuguese (PT-BR) subset of the Itau-Unibanco/aroeira corpus, containing 1 billion tokens. The dataset has two features, text and word count, and is suited to text-to-text generation and text generation tasks. It provides a single training split with 11 million examples. The full dataset is 15.3 GB and is licensed under Apache-2.0.
Provider:
bobboyms



