ARPA: Armenian Paraphrase Detection Corpus

Name: ARPA: Armenian Paraphrase Detection Corpus
Creator: Ivannikov Laboratory for System Programming, Russian-Armenian University
Published: 2020-09-26 22:56:57
License: Not specified

Source: arXiv (2020-09-26); last updated 2024-06-21

Download link:

https://github.com/ivannikov-lab/arpa-paraphrase-corpus


Description:

The ARPA dataset was created by the Ivannikov Laboratory for System Programming at the Russian-Armenian University to support the training and evaluation of paraphrase detection models for Armenian sentences. It was generated by back-translation: original Armenian sentences were translated into English and then back into Armenian, and this process was run for two iterations to increase diversity. The resulting pairs were then manually reviewed and annotated by experts. The corpus contains 2360 paraphrase pairs and can support NLP applications such as plagiarism detection and text summarization.
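The back-translation step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the ARPA authors' pipeline: the translator is a hypothetical pluggable callable (the source does not name the machine translation system used), "two iterations" is read here as chaining the Armenian (hy) to English (en) to Armenian round trip twice, and candidates that come back unchanged are dropped before they would be handed to human annotators.

```python
from typing import Callable, List, Tuple

# Hypothetical translator signature: translate(text, source_lang, target_lang) -> str.
# Any MT backend could be plugged in; ARPA's actual backend is not specified here.
Translator = Callable[[str, str, str], str]


def back_translate(sentence: str, translate: Translator,
                   source: str = "hy", pivot: str = "en",
                   iterations: int = 2) -> List[str]:
    """Return paraphrase candidates produced by chained round trips.

    Each iteration translates the current text source -> pivot -> source and
    feeds the result into the next iteration (an assumed reading of "two
    iterations"). Candidates identical to the original sentence, or to an
    earlier candidate, are skipped because they add no new wording.
    """
    candidates: List[str] = []
    current = sentence
    for _ in range(iterations):
        pivot_text = translate(current, source, pivot)
        current = translate(pivot_text, pivot, source)
        if current != sentence and current not in candidates:
            candidates.append(current)
    return candidates


def candidate_pairs(sentences: List[str],
                    translate: Translator) -> List[Tuple[str, str]]:
    """Build (original, candidate) pairs, the unit that would go to manual review."""
    return [(s, c) for s in sentences for c in back_translate(s, translate)]


# Toy translator for demonstration only: it appends a tag on every hop,
# so each round trip yields visibly different text. Not a real MT system.
def toy_translate(text: str, src: str, tgt: str) -> str:
    return f"{text} [{src}->{tgt}]"


if __name__ == "__main__":
    for original, candidate in candidate_pairs(["Բարև աշխարհ"], toy_translate):
        print(original, "=>", candidate)
```

Keeping the translator as a parameter separates the corpus-generation logic from any particular MT service, and the deduplication check reflects the idea that only changed outputs are worth an annotator's time; the expert review and labeling step itself is manual and is not modeled here.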

Provided by:

Ivannikov Laboratory for System Programming, Russian-Armenian University

Created:

2020-09-26


Dataset introduction