XLEnt

Name: XLEnt
Creator: Johns Hopkins University
Published: 2021-09-11 00:21:22
License: No description available

arXiv; updated 2021-09-11; indexed 2024-06-21

Download link:

http://data.statmt.org/xlent/

Resource description:

The XLEnt dataset was created by Johns Hopkins University and other institutions. It contains 164 million cross-lingual entity pairs that link entities in 120 languages to their English counterparts. The dataset was built by automatically mining web data and aligning entities with the lexical-semantic-phonetic alignment technique LSP-Align, with the aim of improving cross-lingual entity recognition for low-resource languages. Beyond supporting machine translation and cross-lingual information linking, XLEnt serves as an important resource for multilingual natural language processing tasks and is intended to help advance language technology worldwide.

Provider:

Johns Hopkins University

Created:

2021-04-18
