FrancophonIA/JRC-Names
Hugging Face · Updated 2025-03-30 · Indexed 2025-04-12
Download link:
https://hf-mirror.com/datasets/FrancophonIA/JRC-Names
Resource description:
JRC-Names is a highly multilingual named entity resource for person and organisation names (called entities) and their many spelling variants (up to hundreds for a single entity), including variants across scripts (Latin, Greek, Arabic, Cyrillic, Japanese, Chinese, etc.). In total there are 549,189 entities (names and their spelling variants in multiple languages). The named entity resource file is accompanied by demonstrator software, implemented in Java, that (a) generates a list of known spelling variants for any input name, and (b) analyses UTF-8-encoded text files to find known entity mentions, returning the name variant found, the preferred display name for that entity, the unique identifier for that name, the position of the entity name in the text, and its length in characters.
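The two functions of the demonstrator described above can be illustrated with a minimal Python sketch. This is not the Java demonstrator and does not read the actual JRC-Names resource file: the `ENTITIES` table, its identifiers and names, and the helpers `spelling_variants` and `find_mentions` are all hypothetical, chosen only to show the kind of output described (variant found, display name, identifier, position, length).

```python
import re
from dataclasses import dataclass

# Hypothetical toy data: entity id -> (preferred display name, known variants).
# These ids and names are invented for illustration; they are NOT taken from
# JRC-Names, and this is not the format of the real resource file.
ENTITIES = {
    1: ("Jane Example", ["Jane Example", "J. Example", "Джейн Экзампл"]),
    2: ("Example Organisation", ["Example Organisation", "Example Organization"]),
}

# Reverse index from a case-folded variant to its entity id.
VARIANT_INDEX = {
    variant.casefold(): entity_id
    for entity_id, (_, variants) in ENTITIES.items()
    for variant in variants
}


@dataclass
class Mention:
    variant: str       # the name variant as it appears in the text
    display_name: str  # preferred display name of the entity
    entity_id: int     # unique identifier of the entity
    position: int      # character offset of the mention in the text
    length: int        # length of the mention in characters


def spelling_variants(name: str) -> list[str]:
    """Return all known variants for the entity that `name` belongs to."""
    entity_id = VARIANT_INDEX.get(name.casefold())
    return list(ENTITIES[entity_id][1]) if entity_id is not None else []


def find_mentions(text: str) -> list[Mention]:
    """Find known variants in `text` by case-insensitive substring search.

    Simplification: no word-boundary or overlap handling is attempted.
    """
    mentions = []
    for entity_id, (display_name, variants) in ENTITIES.items():
        for variant in variants:
            for match in re.finditer(re.escape(variant), text, flags=re.IGNORECASE):
                found = match.group(0)
                mentions.append(
                    Mention(found, display_name, entity_id, match.start(), len(found))
                )
    return sorted(mentions, key=lambda m: m.position)
```

For example, `find_mentions("Yesterday J. Example visited Example Organization.")` yields two mentions under these toy entries, and `spelling_variants("jane example")` returns the three variants listed for the first entity. A real lookup over hundreds of thousands of entities would likely need a more efficient matching structure than this per-variant scan.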
Provider:
FrancophonIA



