
cjvt/slownet

Source: Hugging Face. Updated: 2022-10-21. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/cjvt/slownet
Resource description:
---
annotations_creators:
- machine-generated
- expert-generated
language:
- sl
language_creators:
- machine-generated
- found
license:
- cc-by-sa-4.0
multilinguality:
- monolingual
pretty_name: Semantic lexicon of Slovene sloWNet
size_categories:
- 100K<n<1M
source_datasets: []
tags:
- slownet
- wordnet
- pwn
task_categories:
- other
task_ids: []
---

# Dataset Card for SloWNet

### Dataset Summary

sloWNet is the Slovene WordNet developed in the expand approach: it contains the complete Princeton WordNet 3.0 and over 70 000 Slovene literals. These literals have been added automatically using different types of existing resources, such as bilingual dictionaries, parallel corpora and Wikipedia. 33 000 literals have been subsequently hand-validated.

For a detailed description of the data, please see the paper Fišer et al. (2012).

### Supported Tasks and Leaderboards

Other (the data is a knowledge base).

### Languages

Slovenian.

## Dataset Structure

### Data Instances

Each synset is stored in its own instance. The following instance represents a synset containing the English synonyms `{'able'}` and Slovene synonyms `{'sposoben', 'zmožen'}`:

```
{
    'id': 'eng-30-00001740-a',
    'pos': 'a',
    'bcs': 3,
    'en_synonyms': {
        'words': ['able'],
        'senses': [1],
        'pwnids': ['able%3:00:00::']
    },
    'sl_synonyms': {
        'words': ['sposoben', 'zmožen'],
        'is_validated': [False, False]
    },
    'en_def': "(usually followed by `to') having the necessary means or skill or know-how or authority to do something",
    'sl_def': 'N/A',
    'en_usages': [
        'able to swim',
        'she was able to program her computer',
        'we were at last able to buy a car',
        'able to get a grant for the project'
    ],
    'sl_usages': [],
    'ilrs': {
        'types': ['near_antonym', 'be_in_state', 'be_in_state', 'eng_derivative', 'eng_derivative'],
        'id_synsets': ['eng-30-00002098-a', 'eng-30-05200169-n', 'eng-30-05616246-n', 'eng-30-05200169-n', 'eng-30-05616246-n']
    },
    'semeval07_cluster': 'able',
    'domains': ['quality']
}
```

### Data Fields

- `id`: a string ID of the synset;
- `pos`: part of speech tag of the synset;
- `bcs`: Base Concept Set index (`-1` if not present);
- `en_synonyms`: the English synonyms in the synset - synonym `i` is described with its form (`words[i]`), sense (`senses[i]`), and Princeton WordNet ID (`pwnids[i]`);
- `sl_synonyms`: the Slovene synonyms in the synset - synonym `i` is described with its form (`words[i]`) and a flag marking if its correctness has been manually validated (`is_validated[i]`);
- `en_def`: the English definition (`"N/A"` if not present);
- `sl_def`: the Slovene definition (`"N/A"` if not present);
- `en_usages`: the English examples of usage;
- `sl_usages`: the Slovene examples of usage;
- `ilrs`: internal language relations - relation `i` is described by its type (`types[i]`) and the target synset (`id_synsets[i]`);
- `semeval07_cluster`: string cluster (`"N/A"` if not present);
- `domains`: domains of the synset.

## Additional Information

### Dataset Curators

Darja Fišer.

### Licensing Information

CC BY-SA 4.0

### Citation Information

```
@inproceedings{fiser2012slownet,
    title={sloWNet 3.0: development, extension and cleaning},
    author={Fi{\v{s}}er, Darja and Novak, Jernej and Erjavec, Toma{\v{z}}},
    booktitle={Proceedings of 6th International Global Wordnet Conference (GWC 2012)},
    pages={113--117},
    year={2012}
}
```

### Contributions

Thanks to [@matejklemen](https://github.com/matejklemen) for adding this dataset.
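The `sl_synonyms` field described in the card stores each literal and its validation flag in two parallel lists. A minimal sketch of how those lists might be paired up and filtered follows; the `synset` dictionary is a trimmed copy of the card's example instance, and the helper names `slovene_literals` and `validated_literals` are hypothetical, not part of the dataset or any library.

```python
from typing import Any, Dict, List, Tuple

# Trimmed copy of the example instance shown in the dataset card.
synset: Dict[str, Any] = {
    "id": "eng-30-00001740-a",
    "sl_synonyms": {
        "words": ["sposoben", "zmožen"],
        "is_validated": [False, False],
    },
}


def slovene_literals(instance: Dict[str, Any]) -> List[Tuple[str, bool]]:
    """Pair each Slovene literal with its manual-validation flag.

    Hypothetical helper: relies on words[i] and is_validated[i]
    describing the same synonym, as stated in the card.
    """
    sl = instance["sl_synonyms"]
    return list(zip(sl["words"], sl["is_validated"]))


def validated_literals(instance: Dict[str, Any]) -> List[str]:
    """Keep only the literals whose validation flag is True."""
    return [word for word, ok in slovene_literals(instance) if ok]


pairs = slovene_literals(synset)
validated = validated_literals(synset)
print(pairs)
print(validated)
```

In the example instance neither literal has been hand-validated, so the filtered list is empty; a consumer that needs only hand-checked data would skip such synsets.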
Provided by:
cjvt
Summary of original information

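Dataset Card for sloWNet

Dataset Summary

sloWNet is the Slovene WordNet, developed using the expand approach: it contains the complete Princeton WordNet 3.0 and over 70,000 Slovene literals. These literals were added automatically using different types of existing resources, such as bilingual dictionaries, parallel corpora, and Wikipedia. 33,000 literals were subsequently hand-validated.

For a detailed description of the data, see the paper by Fišer et al. (2012).

Supported Tasks and Leaderboards

Other (the data is a knowledge base).

Languages

Slovenian.

Dataset Structure

Data Instances

Each synset is stored in its own instance. The following instance represents a synset containing the English synonyms {able} and the Slovene synonyms {sposoben, zmožen}: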

    {
        'id': 'eng-30-00001740-a',
        'pos': 'a',
        'bcs': 3,
        'en_synonyms': {
            'words': ['able'],
            'senses': [1],
            'pwnids': ['able%3:00:00::']
        },
        'sl_synonyms': {
            'words': ['sposoben', 'zmožen'],
            'is_validated': [False, False]
        },
        'en_def': "(usually followed by `to') having the necessary means or skill or know-how or authority to do something",
        'sl_def': 'N/A',
        'en_usages': [
            'able to swim',
            'she was able to program her computer',
            'we were at last able to buy a car',
            'able to get a grant for the project'
        ],
        'sl_usages': [],
        'ilrs': {
            'types': ['near_antonym', 'be_in_state', 'be_in_state', 'eng_derivative', 'eng_derivative'],
            'id_synsets': ['eng-30-00002098-a', 'eng-30-05200169-n', 'eng-30-05616246-n', 'eng-30-05200169-n', 'eng-30-05616246-n']
        },
        'semeval07_cluster': 'able',
        'domains': ['quality']
    }
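The `ilrs` entry in the instance above lists relation types and target synsets as two index-aligned lists. One way to make that easier to query is to group the targets by relation type, sketched below. The `ilrs` value is copied from the example instance; the function name `group_relations` is a hypothetical helper introduced only for illustration.

```python
from collections import defaultdict
from typing import Dict, List

# The `ilrs` value copied from the example instance.
ilrs: Dict[str, List[str]] = {
    "types": [
        "near_antonym", "be_in_state", "be_in_state",
        "eng_derivative", "eng_derivative",
    ],
    "id_synsets": [
        "eng-30-00002098-a", "eng-30-05200169-n", "eng-30-05616246-n",
        "eng-30-05200169-n", "eng-30-05616246-n",
    ],
}


def group_relations(relations: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each relation type to the list of target synset IDs.

    Hypothetical helper: assumes types[i] and id_synsets[i] describe
    the same relation, as the field description states.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for rel_type, target in zip(relations["types"], relations["id_synsets"]):
        grouped[rel_type].append(target)
    return dict(grouped)


by_type = group_relations(ilrs)
for rel_type, targets in by_type.items():
    print(rel_type, targets)
```

Grouping keeps the original order of targets within each type, which matters if a consumer wants to preserve the order in which relations appear in the source.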

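Data Fields

  • id: a string ID of the synset;
  • pos: part-of-speech tag of the synset;
  • bcs: Base Concept Set index (-1 if not present);
  • en_synonyms: the English synonyms in the synset - synonym i is described by its form (words[i]), sense (senses[i]), and Princeton WordNet ID (pwnids[i]);
  • sl_synonyms: the Slovene synonyms in the synset - synonym i is described by its form (words[i]) and a flag marking whether its correctness has been manually validated (is_validated[i]);
  • en_def: the English definition ("N/A" if not present);
  • sl_def: the Slovene definition ("N/A" if not present);
  • en_usages: the English examples of usage;
  • sl_usages: the Slovene examples of usage;
  • ilrs: internal language relations - relation i is described by its type (types[i]) and the target synset (id_synsets[i]);
  • semeval07_cluster: string cluster ("N/A" if not present);
  • domains: domains of the synset.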
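Several fields in the list above use sentinel values for missing data: -1 for bcs and the string "N/A" for en_def, sl_def, and semeval07_cluster. A minimal sketch of converting those sentinels to `None` before further processing is shown below; the function name `normalize_missing` and the constant `NA_STRING_FIELDS` are hypothetical, and the `example` dictionary is a trimmed copy of the example instance.

```python
from typing import Any, Dict

# Fields documented as using the string "N/A" when no value is present.
NA_STRING_FIELDS = ("en_def", "sl_def", "semeval07_cluster")


def normalize_missing(instance: Dict[str, Any]) -> Dict[str, Any]:
    """Return a shallow copy with the documented sentinels replaced by None.

    Hypothetical helper: "N/A" in the string fields and -1 in `bcs`
    are treated as missing, following the field descriptions.
    """
    cleaned = dict(instance)  # shallow copy, the input is left unchanged
    for field in NA_STRING_FIELDS:
        if cleaned.get(field) == "N/A":
            cleaned[field] = None
    if cleaned.get("bcs") == -1:
        cleaned["bcs"] = None
    return cleaned


# Trimmed copy of the example instance.
example = {
    "id": "eng-30-00001740-a",
    "bcs": 3,
    "en_def": "(usually followed by `to') having the necessary means or "
              "skill or know-how or authority to do something",
    "sl_def": "N/A",
    "semeval07_cluster": "able",
}

cleaned = normalize_missing(example)
print(cleaned["sl_def"], cleaned["bcs"])
```

Using `None` instead of in-band sentinels avoids accidentally treating "N/A" as a real definition or -1 as a real index in downstream code.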