ARPA: Armenian Paraphrase Detection Corpus
Source: arXiv. Updated: 2020-09-26. Indexed: 2024-06-21.
Download link:
https://github.com/ivannikov-lab/arpa-paraphrase-corpus
Resource overview:
The ARPA dataset was created by the Ivannikov Laboratory for System Programming at the Russian-Armenian University to support training and evaluating paraphrase detection models for Armenian sentences. It was generated by back-translation: each original Armenian sentence is translated into English and then back into Armenian, and this round trip is performed twice to increase diversity. Experts then manually reviewed and annotated the results. The dataset contains 2,360 paraphrase pairs and is suited to natural language processing (NLP) applications such as plagiarism detection and text summarization.
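The back-translation loop described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the `Translator` signature, the function names, and `toy_translate` are hypothetical, and a real machine translation system would have to be plugged in where the toy stand-in is used. The language codes `hy` (Armenian) and `en` (English) are the ISO 639-1 codes.

```python
from typing import Callable, List, Tuple

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
# Any real machine translation client could be wrapped to match it.
Translator = Callable[[str, str, str], str]


def back_translate(sentence: str, translate: Translator, rounds: int = 2,
                   src: str = "hy", pivot: str = "en") -> str:
    """Round-trip a sentence src -> pivot -> src, repeated `rounds` times."""
    current = sentence
    for _ in range(rounds):
        pivoted = translate(current, src, pivot)   # e.g. Armenian -> English
        current = translate(pivoted, pivot, src)   # e.g. English -> Armenian
    return current


def build_candidates(sentences: List[str], translate: Translator,
                     rounds: int = 2) -> List[Tuple[str, str]]:
    """Pair each original with its back-translation.

    The pairs are only candidates: deciding whether each one is a true
    paraphrase is left to human annotators, as in the description above.
    """
    return [(s, back_translate(s, translate, rounds)) for s in sentences]


def toy_translate(text: str, source: str, target: str) -> str:
    """Stand-in for a real MT system: appends the target language tag."""
    return f"{text}>{target}"


if __name__ == "__main__":
    for original, candidate in build_candidates(["s1", "s2"], toy_translate):
        print(original, "->", candidate)
```

With the toy translator each hop is visible in the output string, which makes it easy to confirm that two rounds produce four translation calls per sentence. Passing the translator in as a parameter keeps the loop independent of any particular translation service.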
Provided by:
Ivannikov Laboratory for System Programming, Russian-Armenian University
Created:
2020-09-26
Summary

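Background and challenges
Background overview
ARPA is a specialized corpus for Armenian sentence paraphrase detection. It was created by a laboratory at the Russian-Armenian University, generated through back-translation, and manually reviewed. It contains 2,360 paraphrase pairs and suits natural language processing tasks such as plagiarism detection and text summarization.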



