Acronym Corpus

Name: Acronym Corpus
Creator: HAL repository
License: No description available

Indexed from arXiv: 2025-09-30

Download link:

https://github.com/rtotheich/acronym_corpus/tree/main

Description:

This dataset was created to evaluate how well machine translation systems handle acronyms. It contains 437 long-form/short-form (LF-SF) pairs extracted from a corpus of 13,500 abstracts scraped from the HAL database. It is intended to improve acronym resolution in machine translation systems and is stated to contain no offensive content or personally identifiable information. The core task is acronym disambiguation and translation.
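As a rough illustration of how LF-SF pairs could be used to score translation output, the sketch below checks whether each hypothesis contains the expected long form or short form. The record layout (`AcronymPair`), the function names, and the sample sentences are all hypothetical; the actual file format in the repository is not described here and may differ.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AcronymPair:
    # Hypothetical record layout; not taken from the repository.
    long_form: str   # expected expansion in the target language
    short_form: str  # expected acronym in the target language


def contains_pair(hypothesis: str, pair: AcronymPair) -> bool:
    """Return True if the MT output contains the long form or the short form."""
    # Long form: case-insensitive substring match.
    has_long = pair.long_form.casefold() in hypothesis.casefold()
    # Short form: case-sensitive whole-token match, so "MT" does not match "MTU".
    pattern = rf"(?<!\w){re.escape(pair.short_form)}(?!\w)"
    has_short = re.search(pattern, hypothesis) is not None
    return has_long or has_short


def hit_rate(hypotheses: list[str], pairs: list[AcronymPair]) -> float:
    """Fraction of hypotheses that contain their expected long or short form."""
    if len(hypotheses) != len(pairs):
        raise ValueError("hypotheses and pairs must have the same length")
    if not pairs:
        return 0.0
    hits = sum(contains_pair(h, p) for h, p in zip(hypotheses, pairs))
    return hits / len(pairs)


# Invented sample data, for illustration only.
sample_pairs = [
    AcronymPair("machine translation", "MT"),
    AcronymPair("natural language processing", "NLP"),
]
sample_hypotheses = [
    "We evaluate machine translation (MT) systems.",
    "The MTU size was changed.",
]
sample_score = hit_rate(sample_hypotheses, sample_pairs)
```

String matching of this kind is only a coarse proxy: it does not tell whether an ambiguous acronym was expanded to the correct sense for its context, which is the disambiguation part of the task.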

Provided by:

HAL repository
