lopentu/Chinese-Wordnet-SemCor
Source: Hugging Face | Updated: 2025-05-05 | Indexed: 2025-10-18
Download link:
https://hf-mirror.com/datasets/lopentu/Chinese-Wordnet-SemCor
Resource description:
This dataset targets word sense disambiguation (WSD) for Chinese, focusing on "difficult" words that have more than 10 senses in Chinese Wordnet (CWN) 2.0. It provides a training set and a test set. Each instance contains the target word, its part of speech, the correct and candidate sense IDs with their definitions and example sentences, and a label indicating whether the candidate sense is correct. The data is drawn from Chinese Wordnet and the Academia Sinica Balanced Corpus, and was manually annotated by six native Mandarin speakers with backgrounds in linguistics.
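The instance layout described above can be pictured as one binary-labeled record per candidate sense. The following is a minimal sketch under stated assumptions, not the dataset's actual schema: every field name, the helper `make_instances`, and the placeholder sense IDs, definitions, and examples are hypothetical and chosen only for illustration. Consult the dataset card for the real column names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WSDInstance:
    # All field names here are hypothetical, not the dataset's real columns.
    target_word: str
    pos: str
    correct_sense_id: str
    candidate_sense_id: str
    candidate_definition: str
    candidate_example: str
    label: int  # 1 if the candidate sense is the correct one, else 0


def make_instances(
    target_word: str,
    pos: str,
    correct_sense_id: str,
    candidates: List[Tuple[str, str, str]],
) -> List[WSDInstance]:
    """Expand one annotated occurrence into one labeled instance per candidate.

    Each candidate is a (sense_id, definition, example_sentence) tuple.
    """
    return [
        WSDInstance(
            target_word=target_word,
            pos=pos,
            correct_sense_id=correct_sense_id,
            candidate_sense_id=sense_id,
            candidate_definition=definition,
            candidate_example=example,
            label=int(sense_id == correct_sense_id),
        )
        for sense_id, definition, example in candidates
    ]


# Placeholder data for illustration only; not taken from CWN.
instances = make_instances(
    target_word="打",
    pos="V",
    correct_sense_id="sense_02",
    candidates=[
        ("sense_01", "placeholder definition 1", "placeholder example 1"),
        ("sense_02", "placeholder definition 2", "placeholder example 2"),
        ("sense_03", "placeholder definition 3", "placeholder example 3"),
    ],
)
labels = [inst.label for inst in instances]
```

Framing each (occurrence, candidate sense) pair as a yes/no decision is one common way to read a "label indicates whether the candidate is correct" format; whether the dataset was built exactly this way is an assumption here.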
Provider:
lopentu



