tools4eu/albanian-english-bundled
Hugging Face, updated 2024-09-24, indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/tools4eu/albanian-english-bundled
Description:
This dataset is an Albanian-English parallel corpus created by standardizing and bundling multiple existing datasets, primarily for translation and fill-mask tasks. It includes parallel texts from sources such as news, translation projects, TED talks, the Bible, and open subtitles. Each record carries text in Albanian (sq) and English (en), along with identifiers for the source dataset and subset. The training set contains 3,192,496 samples and occupies 521,626,304 bytes.
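The figures above can be sanity-checked, and the corpus inspected, with a short Python sketch. It assumes the Hugging Face `datasets` package is installed, that the repository ID matches the path in the download link, and that the train split loads without a configuration name. None of this is confirmed by the listing, and the column names are not stated here, so the sketch only fetches a record for inspection instead of guessing field names.

```python
TRAIN_BYTES = 521_626_304  # dataset size stated in the description
TRAIN_ROWS = 3_192_496     # number of training samples stated in the description


def average_bytes_per_sample(total_bytes: int, rows: int) -> float:
    """Return the mean size of one sample in bytes."""
    return total_bytes / rows


def stream_train_split(repo_id: str = "tools4eu/albanian-english-bundled"):
    """Return a lazy iterator over the train split (assumed repo ID).

    Requires the third-party `datasets` package and network access.
    Streaming avoids downloading the whole corpus before reading a record.
    """
    # Imported lazily so the arithmetic helper above works offline.
    from datasets import load_dataset

    return load_dataset(repo_id, split="train", streaming=True)
```

Calling `average_bytes_per_sample(TRAIN_BYTES, TRAIN_ROWS)` gives roughly 163 bytes per sample, which is plausible for short sentence pairs. Calling `next(iter(stream_train_split()))` returns the first record as a dictionary, which is the simplest way to see the actual column names before writing any preprocessing code.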
Provider:
tools4eu



