PLOD
Source: arXiv | Updated: 2022-04-29 | Indexed: 2024-06-21
Download link:
https://github.com/lzilio/PLOD
Description:
PLOD is a large-scale dataset for abbreviation detection in scientific documents, created by the Centre for Translation Studies at the University of Surrey. It contains over 160,000 automatically annotated abbreviations together with their full forms, drawn mainly from open-access articles published in PLOS journals. The dataset was built by automatic collection followed by manual and automatic validation, steps intended to ensure the quality and accuracy of the annotations. PLOD targets natural language processing tasks such as machine translation and information retrieval, where better handling of abbreviations is expected to improve overall system performance.
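To make the notion of an abbreviation paired with its full form concrete, here is a minimal Python sketch of a heuristic that picks up "full form (ABBR)" patterns in a sentence. It is an illustrative assumption only: the function name `find_abbreviation_pairs` and the initial-letter matching rule are hypothetical and are not taken from PLOD's annotation pipeline or file format.

```python
import re


def find_abbreviation_pairs(sentence):
    """Return (abbreviation, full_form) pairs for 'full form (ABBR)' patterns.

    Hypothetical heuristic for illustration; not the method used to build PLOD.
    """
    pairs = []
    # Look for 2-10 uppercase letters enclosed in parentheses.
    for match in re.finditer(r"\(([A-Z]{2,10})\)", sentence):
        abbr = match.group(1)
        # Take as many preceding words as the abbreviation has letters.
        preceding = sentence[:match.start()].split()
        candidate = preceding[-len(abbr):]
        # Accept the candidate only if its word initials spell the abbreviation.
        if len(candidate) == len(abbr) and all(
            word[0].upper() == letter for word, letter in zip(candidate, abbr)
        ):
            pairs.append((abbr, " ".join(candidate)))
    return pairs


text = "Natural language processing (NLP) supports machine translation (MT)."
print(find_abbreviation_pairs(text))
```

A rule this simple misses abbreviations whose letters do not line up one-to-one with word initials, which is one reason a large annotated resource is useful for training and evaluating detection models.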
Provider:
Centre for Translation Studies, University of Surrey
Created:
2022-04-26



