Word Sense Disambiguation: a Unified Evaluation Framework and Empirical Comparison

Name: Word Sense Disambiguation: a Unified Evaluation Framework and Empirical Comparison
Creator: OpenDataLab
Published: 2026-05-24 07:30:10
License: No description available

OpenDataLab, updated 2026-05-24, indexed 2024-05-09

Download link:

https://opendatalab.org.cn/OpenDataLab/Word_Sense_Disambiguation_a_etc

Resource description:

The evaluation framework of Raganato et al. (2017) comprises two training sets, SemCor (Miller et al., 1993) and OMSTI (Taghipour and Ng, 2015), and five test sets from the Senseval/SemEval series (Edmonds and Cotton, 2001; Snyder and Palmer, 2004; Pradhan et al., 2007; Navigli et al., 2013; Moro and Navigli, 2015), all standardized to the same format and sense inventory (WordNet 3.0). There are two main approaches to Word Sense Disambiguation (WSD): supervised methods, which use sense-annotated training data, and knowledge-based methods, which exploit the properties of lexical resources.

Supervised methods: The most widely used training corpus is SemCor, with 226,036 manual sense annotations from 352 documents. All supervised systems in the evaluation table are trained on SemCor. Some supervised methods, particularly neural architectures, use the SemEval 2007 dataset as a development set (marked with *). The most common baseline is the Most Frequent Sense (MFS) heuristic, which assigns each target word the sense it most often carries in the training data.

Knowledge-based methods: Knowledge-based systems typically use WordNet or BabelNet as the semantic network. The first sense listed in the underlying sense inventory (WordNet 3.0) is included as a baseline.

Description from NLP-progress.
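The Most Frequent Sense baseline described above can be illustrated with a minimal sketch. It assumes the training data has already been reduced to (lemma, sense) pairs; the function names `train_mfs` and `predict`, the sense labels, and the optional first-sense fallback table are all hypothetical and are not part of the framework's own tooling or file format.

```python
from collections import Counter, defaultdict


def train_mfs(annotated_pairs):
    """Return a dict mapping each lemma to its most frequent training sense.

    annotated_pairs: iterable of (lemma, sense) tuples (hypothetical format).
    """
    counts = defaultdict(Counter)
    for lemma, sense in annotated_pairs:
        counts[lemma][sense] += 1
    # most_common(1) yields the single highest-count (sense, count) pair.
    return {lemma: c.most_common(1)[0][0] for lemma, c in counts.items()}


def predict(mfs, lemma, fallback=None):
    """Predict a sense for lemma.

    Uses the MFS table first; for lemmas unseen in training, falls back to
    an optional lemma -> first-sense table (an assumption of this sketch,
    standing in for the first sense of the sense inventory).
    """
    if lemma in mfs:
        return mfs[lemma]
    if fallback is not None:
        return fallback.get(lemma)
    return None


# Toy training data with invented sense labels.
training = [
    ("bank", "bank.finance"),
    ("bank", "bank.finance"),
    ("bank", "bank.river"),
    ("plant", "plant.factory"),
]
mfs = train_mfs(training)
print(predict(mfs, "bank"))
print(predict(mfs, "crane", fallback={"crane": "crane.bird"}))
```

In this sketch the fallback table plays the role of the first-sense baseline: it is consulted only when the lemma never occurs in the training pairs.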

Provider:

OpenDataLab

Created:

2022-05-23

Collection summary

Dataset introduction