ReliableAI/Irish-English-Parallel-Collection
Source: Hugging Face. Updated 2024-07-16; indexed 2024-07-22.
Download link:
https://hf-mirror.com/datasets/ReliableAI/Irish-English-Parallel-Collection
Description:
This parallel English-Irish text dataset draws on multiple sources, including paracrawl.eu and ELRC-SHARE. It is used for continual pre-training of English-centric large language models (LLMs), so that a model can more easily draw connections between the two languages before training on monolingual Irish data. The collection covers web content, reference documents, reports, and press releases, totaling over 60,000 translation units. Each record contains a text field, an English text field, and an ID field.
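As a rough illustration of how records with a text field, an English text field, and an ID field might be turned into parallel training strings for continual pre-training, here is a minimal sketch. The column names (`text`, `en_text`, `id`), the assumption that `text` holds the Irish side, the prompt layout, and the sample records are all hypothetical and are not taken from the dataset itself.

```python
from typing import Dict, Iterable, List

# Hypothetical column names: the listing only says each record has a
# text field, an English text field, and an ID field.
IRISH_KEY = "text"       # assumed to hold the Irish side
ENGLISH_KEY = "en_text"  # assumed name for the English side
ID_KEY = "id"


def format_pair(record: Dict[str, str]) -> str:
    """Join one translation unit into a single two-line training string."""
    english = record[ENGLISH_KEY].strip()
    irish = record[IRISH_KEY].strip()
    return f"English: {english}\nIrish: {irish}"


def build_corpus(records: Iterable[Dict[str, str]]) -> List[str]:
    """Format records, skipping duplicate IDs and pairs with an empty side."""
    seen_ids = set()
    corpus = []
    for record in records:
        if record[ID_KEY] in seen_ids:
            continue
        if not record[IRISH_KEY].strip() or not record[ENGLISH_KEY].strip():
            continue
        seen_ids.add(record[ID_KEY])
        corpus.append(format_pair(record))
    return corpus


# Made-up sample records for illustration only.
sample_records = [
    {"id": "a1", "text": "Dia duit", "en_text": "Hello"},
    {"id": "a1", "text": "Dia duit", "en_text": "Hello"},  # duplicate ID
    {"id": "a2", "text": "", "en_text": "Empty Irish side"},  # dropped
    {"id": "a3", "text": "Go raibh maith agat", "en_text": "Thank you"},
]

corpus = build_corpus(sample_records)
for entry in corpus:
    print(entry)
    print("---")
```

Real records could be loaded with the Hugging Face `datasets` library via `load_dataset`; the actual column names should be checked against the dataset before adapting the constants above.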
Provided by:
ReliableAI



