MASC Word Sense Sentence Corpus, tab-separated format

Source: DataCite Commons. Updated: 2024-09-19. Indexed: 2025-04-09.
Download link:
https://academiccommons.columbia.edu/doi/10.7916/D80V89XH
Resource description:
Synopsis: The MASC Word Sense Sentence corpus is distributed as three .tsv (tab-separated) files that together contain the sentences, annotation labels, and senses of the corpus: (1) the annotation labels (masc_annotations.tsv), (2) the WordNet word senses (masc_senses.tsv), and (3) the word token-sentence pairs, or instances (masc_sentences.tsv). A total of 116 distinct lemmas were selected. For each lemma, approximately 1000 example sentences were taken from the MASC corpus, and for each word in its sentence context a trained annotator assigned a WordNet sense (WordNet version 3.1) as the annotation label. The following README describes the data in detail.
Provider:
Columbia University
Created:
2014-06-24
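
As a rough illustration of how the three files named in the synopsis might be read, here is a minimal Python sketch. Only the file names come from the synopsis. The directory location, the UTF-8 encoding, and the helper names read_tsv and load_corpus are assumptions made for this example. The column layout is not stated here, so rows are returned as plain lists of strings with no header handling; the README that ships with the data should be consulted for the actual columns.

```python
import csv
from pathlib import Path

# File names as given in the synopsis; where they live on disk is an assumption.
MASC_FILES = {
    "annotations": "masc_annotations.tsv",
    "senses": "masc_senses.tsv",
    "sentences": "masc_sentences.tsv",
}


def read_tsv(path):
    """Return every row of a tab-separated file as a list of strings."""
    # QUOTE_NONE keeps quote characters inside sentences as literal text.
    # UTF-8 is assumed; the synopsis does not state the encoding.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def load_corpus(data_dir="."):
    """Load whichever of the three files exist under data_dir (hypothetical location)."""
    tables = {}
    for name, filename in MASC_FILES.items():
        path = Path(data_dir) / filename
        if path.exists():
            tables[name] = read_tsv(path)
    return tables
```

Calling load_corpus("path/to/download") would return a dictionary keyed by "annotations", "senses", and "sentences", containing only the files actually found. Whether the first row of each file is a header is not specified in the synopsis, so the sketch leaves that decision to the caller.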