GLADIS
Source: arXiv | Updated: 2023-03-14 | Indexed: 2024-06-21
Download link:
https://github.com/tigerchen52/GLADIS
Description:
GLADIS is a large-scale acronym disambiguation benchmark created by the joint research institute of Télécom Paris and Institut Polytechnique de Paris. It contains 1.54 million acronyms and 6.38 million long forms, covering the general, scientific, and biomedical domains. The data were extracted with rule-based algorithms from texts spanning a wide range of sources, including web pages, books, scientific and biomedical papers, and legal documents. GLADIS targets acronym disambiguation in natural language processing, that is, choosing the intended long form of an ambiguous acronym from its context. This task matters for downstream applications such as information extraction, machine translation, and search-engine query analysis.
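To make the task concrete, here is a minimal sketch of acronym disambiguation as a dictionary lookup followed by context matching. The dictionary `TOY_DICTIONARY`, its cue words, the function `disambiguate`, and the word-overlap scoring rule are all hypothetical illustrations. They do not reflect the GLADIS data format or any method from the benchmark.

```python
import re
from typing import Optional, Set

# Hypothetical toy dictionary: acronym -> {long form: context cue words}.
# A real resource such as GLADIS is vastly larger; these entries are made up.
TOY_DICTIONARY = {
    "CNN": {
        "convolutional neural network": {"layer", "image", "training", "model"},
        "cable news network": {"news", "reporter", "broadcast", "anchor"},
    },
}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(acronym: str, sentence: str) -> Optional[str]:
    """Return the long form whose cue words overlap most with the sentence.

    Returns None when the acronym is not in the toy dictionary.
    Ties are broken by dictionary order (max keeps the first maximum).
    """
    candidates = TOY_DICTIONARY.get(acronym)
    if not candidates:
        return None
    context = tokenize(sentence)
    # Score each candidate long form by how many of its cue words appear.
    return max(candidates, key=lambda form: len(context & candidates[form]))


if __name__ == "__main__":
    print(disambiguate("CNN", "The CNN model uses a convolutional layer on each image."))
    print(disambiguate("CNN", "A CNN reporter covered the news broadcast."))
```

Word overlap is only a stand-in chosen for brevity. Benchmarks like this one exist precisely because such shallow heuristics break down when a dictionary lists many long forms per acronym.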
Provided by:
Joint research institute of Télécom Paris and Institut Polytechnique de Paris, France
Created:
2023-02-04



