Word Sense Disambiguation: A Unified Evaluation Framework and Empirical Comparison

OpenDataLab | Updated 2026-05-24 | Indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/Word_Sense_Disambiguation_a_etc
Resource description:
The evaluation framework of Raganato et al. (2017) includes two training sets, SemCor (Miller et al., 1993) and OMSTI (Taghipour and Ng, 2015), and five test sets from the Senseval/SemEval series (Edmonds and Cotton, 2001; Snyder and Palmer, 2004; Pradhan et al., 2007; Navigli et al., 2013; Moro and Navigli, 2015), all standardized to the same format and sense inventory (WordNet 3.0). There are two main approaches to Word Sense Disambiguation (WSD): supervised methods, which use sense-annotated training data, and knowledge-based methods, which exploit the properties of lexical resources.

Supervised: The most widely used training corpus is SemCor, which contains 226,036 manually produced sense annotations from 352 documents. All supervised systems in the evaluation table are trained on SemCor. Some supervised methods, especially neural architectures, use the SemEval 2007 dataset as a development set (marked with *). The most common baseline is the Most Frequent Sense (MFS) heuristic, which assigns each target word the sense it most often takes in the training data.

Knowledge-based: Knowledge-based systems typically use WordNet or BabelNet as a semantic network. The first sense listed in the underlying sense inventory (WordNet 3.0) is included as a baseline.

Description from NLP-progress.
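The two baselines described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the framework's own scorer: the function names `train_mfs` and `first_sense`, the `(word, sense)` pair format, and the toy sense labels are all hypothetical and are not taken from SemCor, WordNet, or the Raganato et al. release.

```python
from collections import Counter, defaultdict


def train_mfs(annotations):
    """Most Frequent Sense: map each target word to the sense it
    takes most often in the training data.

    `annotations` is an iterable of (word, sense) pairs (assumed format).
    """
    counts = defaultdict(Counter)
    for word, sense in annotations:
        counts[word][sense] += 1
    # most_common(1) returns [(sense, count)]; on ties, the sense
    # encountered first in the training data wins.
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def first_sense(word, inventory):
    """First-sense baseline: return the first sense the inventory
    lists for `word`, or None if the word is not covered."""
    senses = inventory.get(word)
    return senses[0] if senses else None


# Toy data invented for illustration only.
toy_training = [
    ("bank", "bank%finance"),
    ("bank", "bank%river"),
    ("bank", "bank%finance"),
    ("plant", "plant%flora"),
]
toy_inventory = {
    "bank": ["bank%finance", "bank%river"],
    "plant": ["plant%factory", "plant%flora"],
}

mfs = train_mfs(toy_training)
print(mfs["bank"])
print(first_sense("plant", toy_inventory))
```

The two baselines can disagree: in the toy data, MFS picks `plant%flora` because that is what the training pairs contain, while the first-sense baseline picks `plant%factory` because that is what the hypothetical inventory lists first.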
Provider:
OpenDataLab
Created:
2022-05-23
Collected summary
Dataset introduction
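Background and challenges
Background overview
This dataset is based on the unified evaluation framework for word sense disambiguation proposed by Raganato et al. (2017). It contains two training sets and five test sets, all standardized to the WordNet 3.0 format. It supports both supervised and knowledge-based word sense disambiguation methods: supervised methods are commonly trained on SemCor, while knowledge-based systems draw on semantic networks such as WordNet.
The above content was collected and summarized by 遇见数据集.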