RoNLI
Source: arXiv | Updated: 2024-05-23 | Indexed: 2024-06-21
Download link:
https://github.com/Eduard6421/RONLI
Resource overview:
RoNLI is the first publicly available Romanian natural language inference (NLI) dataset, created by the University of Bucharest. It contains 64,000 sentence pairs: 58,000 for training, 3,000 for validation, and 3,000 for testing. The sentence pairs are drawn from Romanian Wikipedia and labeled automatically using specific linking phrases, while the validation and test sets are manually annotated to ensure label accuracy. RoNLI aims to close the research gap in natural language understanding for low-resource languages such as Romanian and to provide training and evaluation resources for machine learning models.
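As a quick sanity check on the split sizes quoted above, the following minimal sketch confirms that the three splits sum to the stated total and reports each split's share. The names `SPLITS` and `split_fractions` are illustrative only and are not part of the RoNLI repository.

```python
# Split sizes as stated in the dataset description (not read from the actual files).
SPLITS = {"train": 58_000, "validation": 3_000, "test": 3_000}


def split_fractions(splits):
    """Return each split's share of the total number of sentence pairs."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


total_pairs = sum(SPLITS.values())
fractions = split_fractions(SPLITS)

print(f"total: {total_pairs} pairs")
for name, share in fractions.items():
    print(f"{name}: {SPLITS[name]} pairs ({share:.2%})")
```

The training split accounts for roughly nine tenths of the data, with the manually annotated validation and test splits sharing the remainder equally.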
Provided by:
University of Bucharest
Created:
2024-05-20



