liyucheng/UFSAC
Source: Hugging Face. Updated 2023-01-26. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/liyucheng/UFSAC
Resource description:
---
license: cc-by-2.0
task_categories:
- token-classification
language:
- en
size_categories:
- 1M<n<10M
---
# Dataset Card for UFSAC
UFSAC: Unification of Sense Annotated Corpora and Tools
## Dataset Description
- **Homepage:** https://github.com/getalp/UFSAC
- **Repository:** https://github.com/getalp/UFSAC
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**
### Dataset Summary
### Supported Tasks and Leaderboards
WSD: Word Sense Disambiguation
### Languages
English
## Dataset Structure
### Data Instances
```
{'lemmas': ['_',
'be',
'quite',
'_',
'hefty',
'spade',
'_',
'_',
'bicycle',
'_',
'type',
'handlebar',
'_',
'_',
'spring',
'lever',
'_',
'_',
'rear',
'_',
'_',
'_',
'step',
'on',
'_',
'activate',
'_',
'_'],
'pos_tags': ['PRP',
'VBZ',
'RB',
'DT',
'JJ',
'NN',
',',
'IN',
'NN',
':',
'NN',
'NNS',
'CC',
'DT',
'VBN',
'NN',
'IN',
'DT',
'NN',
',',
'WDT',
'PRP',
'VBP',
'RP',
'TO',
'VB',
'PRP',
'.'],
'sense_keys': ['activate%2:36:00::'],
'target_idx': 25,
'tokens': ['It',
'is',
'quite',
'a',
'hefty',
'spade',
',',
'with',
'bicycle',
'-',
'type',
'handlebars',
'and',
'a',
'sprung',
'lever',
'at',
'the',
'rear',
',',
'which',
'you',
'step',
'on',
'to',
'activate',
'it',
'.']}
```
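To make the instance above easier to read, here is a minimal sketch of how the annotated word can be recovered: `target_idx` indexes into `tokens`, `lemmas`, and `pos_tags`, and each entry of `sense_keys` is a WordNet sense key of the form `lemma%ss_type:lex_filenum:lex_id:head_word:head_id`. The helper names `parse_sense_key` and `target_annotation` are hypothetical and not part of the dataset or of UFSAC, and the sample instance is trimmed from the one shown above, so its `target_idx` differs.

```python
from typing import Dict, List

# WordNet ss_type codes that appear right after the '%' in a sense key.
SS_TYPE_NAMES = {
    1: "noun",
    2: "verb",
    3: "adjective",
    4: "adverb",
    5: "adjective satellite",
}


def parse_sense_key(sense_key: str) -> Dict[str, object]:
    """Split a WordNet sense key into its documented components."""
    lemma, _, lex_sense = sense_key.partition("%")
    ss_type, lex_filenum, lex_id, head_word, head_id = lex_sense.split(":")
    return {
        "lemma": lemma,
        "pos": SS_TYPE_NAMES[int(ss_type)],
        "lex_filenum": lex_filenum,
        "lex_id": lex_id,
        "head_word": head_word,  # empty unless the sense is an adjective satellite
        "head_id": head_id,
    }


def target_annotation(instance: Dict[str, object]) -> Dict[str, object]:
    """Collect everything the instance says about its target word."""
    i = instance["target_idx"]
    senses: List[Dict[str, object]] = [
        parse_sense_key(key) for key in instance["sense_keys"]
    ]
    return {
        "token": instance["tokens"][i],
        "lemma": instance["lemmas"][i],
        "pos_tag": instance["pos_tags"][i],
        "senses": senses,
    }


# Trimmed copy of the instance above (last four tokens only), so the
# target index is 1 here instead of 25.
trimmed_instance = {
    "tokens": ["to", "activate", "it", "."],
    "lemmas": ["_", "activate", "_", "_"],
    "pos_tags": ["TO", "VB", "PRP", "."],
    "target_idx": 1,
    "sense_keys": ["activate%2:36:00::"],
}

annotation = target_annotation(trimmed_instance)
```

In the full instance, `tokens[25]` is `'activate'`, which matches the lemma encoded in the sense key.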
### Data Fields
```
{'tokens': Sequence(feature=Value(dtype='string', id=None), length=-1, id=None),
'lemmas': Sequence(feature=Value(dtype='string', id=None), length=-1, id=None),
'pos_tags': Sequence(feature=Value(dtype='string', id=None), length=-1, id=None),
'target_idx': Value(dtype='int32', id=None),
'sense_keys': Sequence(feature=Value(dtype='string', id=None), length=-1, id=None)}
```
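The field listing above could be mirrored in plain Python as follows. This is only a sketch: the `UFSACInstance` type and the `check_instance` helper are hypothetical names, and the consistency rules (token-level sequences of equal length, `target_idx` within range) are assumptions inferred from the single example in this card, not guarantees stated by the dataset authors.

```python
from typing import List, TypedDict


class UFSACInstance(TypedDict):
    """Plain-Python view of one row, following the declared features."""
    tokens: List[str]
    lemmas: List[str]
    pos_tags: List[str]
    target_idx: int
    sense_keys: List[str]


def check_instance(instance: UFSACInstance) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems: List[str] = []
    n_tokens = len(instance["tokens"])
    # Assumption: lemmas and pos_tags are aligned one-to-one with tokens.
    for name in ("lemmas", "pos_tags"):
        if len(instance[name]) != n_tokens:
            problems.append(f"{name} has {len(instance[name])} items, expected {n_tokens}")
    # Assumption: target_idx is a zero-based position within tokens.
    if not 0 <= instance["target_idx"] < n_tokens:
        problems.append(f"target_idx {instance['target_idx']} is out of range")
    return problems


good: UFSACInstance = {
    "tokens": ["to", "activate", "it", "."],
    "lemmas": ["_", "activate", "_", "_"],
    "pos_tags": ["TO", "VB", "PRP", "."],
    "target_idx": 1,
    "sense_keys": ["activate%2:36:00::"],
}
bad: UFSACInstance = {**good, "pos_tags": ["TO", "VB"], "target_idx": 7}
```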
### Data Splits
The dataset is not divided into splits. All examples are in the `train` split, so load that split directly.
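Since only a `train` split exists, loading might look like the sketch below. It assumes the Hugging Face `datasets` library is installed and that the repository id `liyucheng/UFSAC` from this page resolves on the Hub. The function names and the 10% held-out fraction are illustrative choices, not a recommendation from the dataset authors.

```python
REPO_ID = "liyucheng/UFSAC"


def load_ufsac_train():
    """Download (or reuse the cached copy of) the single `train` split."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split="train")


def make_splits(dataset, test_size=0.1, seed=0):
    """Carve a held-out set from `train`, since the card ships no other split.

    Returns a DatasetDict with `train` and `test` keys.
    """
    return dataset.train_test_split(test_size=test_size, seed=seed)
```

Calling `make_splits(load_ufsac_train())` would then give a reproducible train/test partition for local experiments.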
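Provider:
liyucheng
Summary of original information
Dataset overview
Basic dataset information
- Name: UFSAC: Unification of Sense Annotated Corpora and Tools
- Language: English
- Size: 1M<n<10M
- License: cc-by-2.0
Dataset description
Dataset summary
- Task category: token-classification
- Supported tasks and leaderboards: Word Sense Disambiguation (WSD)
Dataset structure
Data instances
- Fields: tokens, lemmas, pos_tags, target_idx, sense_keys
- The example instance shows the tokens of a sentence, their lemmas and part-of-speech tags, the index of the target word, and its sense keys.
Data fields
- tokens: sequence of strings
- lemmas: sequence of strings
- pos_tags: sequence of strings
- target_idx: integer
- sense_keys: sequence of strings
Data splits
- Not split; use the `train` split directly.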



