Babelscape/cner
Source: Hugging Face. Updated: 2024-06-17. Indexed: 2024-06-22.
Download link:
https://hf-mirror.com/datasets/Babelscape/cner
Resource overview:
Concept and Named Entity Recognition (CNER) is a novel task that jointly addresses the recognition and classification of concepts and named entities. The dataset includes fields such as `tokens`, `pos`, `c_vs_ne`, `cner_tags`, and `cner_tags_ids`, and provides the complete label set with the corresponding indices. Use of the dataset is restricted to non-commercial research purposes under the CC BY-NC-SA 4.0 license.
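Since the five fields are described as per-token lists, a quick sanity check is that they all have the same length within a record. The sketch below assumes the Hugging Face `datasets` library and uses the repository id `Babelscape/cner` from the download link. The split name `"train"`, the helper names `load_cner` and `check_record`, and the sample record are illustrative assumptions and are not taken from the dataset itself.

```python
from typing import Any, Dict, List

FIELDS = ["tokens", "pos", "c_vs_ne", "cner_tags", "cner_tags_ids"]


def load_cner(split: str = "train") -> Any:
    """Load one split from the Hugging Face Hub (needs `datasets` and network access).

    The split name is an assumption; check the dataset page for the real ones.
    """
    from datasets import load_dataset  # imported lazily so the rest runs offline

    return load_dataset("Babelscape/cner", split=split)


def check_record(record: Dict[str, List[Any]]) -> bool:
    """Return True if every expected field is present and all lists have equal length."""
    if any(name not in record for name in FIELDS):
        return False
    lengths = {len(record[name]) for name in FIELDS}
    return len(lengths) == 1


# Hand-written, hypothetical record. The `pos` and `c_vs_ne` values are
# placeholders because their value sets are not listed in this document.
sample = {
    "tokens": ["A", "dog", "barked", "in", "Rome"],
    "pos": ["<pos>"] * 5,
    "c_vs_ne": ["<c_vs_ne>"] * 5,
    "cner_tags": ["O", "B-ANIMAL", "O", "O", "B-LOC"],
    "cner_tags_ids": [0, 1, 0, 0, 25],
}

print(check_record(sample))  # True
```

Calling `load_cner()` and passing a few of its records through `check_record` would be one way to confirm the layout before training, provided the assumed split name exists.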
Provided by:
Babelscape
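Summary of original information
Dataset overview
Basic information
- Dataset name: cner-dataset
- Task category: token-classification
- Task ID: named-entity-recognition
- Language: en
- License: cc-by-nc-sa-4.0
- Source data: original
- Tags: structure-prediction
- Dataset size: 100K<n<1M
- Annotation creators: machine-generated, human-generated
Dataset description
- Summary: Concept and Named Entity Recognition (CNER) is a novel task that jointly addresses the recognition and classification of concepts and named entities.
Dataset structure
- Data fields: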
  - `tokens`: a list of string features.
  - `pos`: a list of string features (part-of-speech tags).
  - `c_vs_ne`: a list of string features indicating whether a token is a concept or a named entity.
  - `cner_tags`: a list of CNER classification labels (str).
  - `cner_tags_ids`: a list of CNER classification label IDs (int).
The complete label set and its indices are as follows:
    {
      "O": 0,
      "B-ANIMAL": 1, "I-ANIMAL": 2,
      "B-DISEASE": 3, "I-DISEASE": 4,
      "B-DISCIPLINE": 5, "I-DISCIPLINE": 6,
      "B-LANGUAGE": 7, "I-LANGUAGE": 8,
      "B-EVENT": 9, "I-EVENT": 10,
      "B-FOOD": 11, "I-FOOD": 12,
      "B-ARTIFACT": 13, "I-ARTIFACT": 14,
      "B-MEDIA": 15, "I-MEDIA": 16,
      "B-GROUP": 17, "I-GROUP": 18,
      "B-ORG": 19, "I-ORG": 20,
      "B-PER": 21, "I-PER": 22,
      "B-STRUCT": 23, "I-STRUCT": 24,
      "B-LOC": 25, "I-LOC": 26,
      "B-PLANT": 27, "I-PLANT": 28,
      "B-MONEY": 29, "I-MONEY": 30,
      "B-BIOLOGY": 31, "I-BIOLOGY": 32,
      "B-MEASURE": 33, "I-MEASURE": 34,
      "B-SUPER": 35, "I-SUPER": 36,
      "B-CELESTIAL": 37, "I-CELESTIAL": 38,
      "B-LAW": 39, "I-LAW": 40,
      "B-SUBSTANCE": 41, "I-SUBSTANCE": 42,
      "B-PART": 43, "I-PART": 44,
      "B-CULTURE": 45, "I-CULTURE": 46,
      "B-PROPERTY": 47, "I-PROPERTY": 48,
      "B-FEELING": 49, "I-FEELING": 50,
      "B-PSYCH": 51, "I-PSYCH": 52,
      "B-RELATION": 53, "I-RELATION": 54,
      "B-DATETIME": 55, "I-DATETIME": 56,
      "B-ASSET": 57, "I-ASSET": 58
    }
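The label set follows a regular pattern: `O` is 0, and each of the 29 categories takes two consecutive ids, `B-` first and `I-` second. The sketch below rebuilds the mapping from the category order given in the label set, inverts it to decode `cner_tags_ids`, and groups tags into spans. It assumes the usual BIO reading of the `B-`/`I-` prefixes; the helper names `decode` and `bio_spans` and the example ids are illustrative, not part of the dataset.

```python
from typing import Dict, List, Optional, Tuple

# Category order copied from the label set; "O" takes id 0 and each category
# takes two consecutive ids (B- then I-).
CATEGORIES = [
    "ANIMAL", "DISEASE", "DISCIPLINE", "LANGUAGE", "EVENT", "FOOD", "ARTIFACT",
    "MEDIA", "GROUP", "ORG", "PER", "STRUCT", "LOC", "PLANT", "MONEY", "BIOLOGY",
    "MEASURE", "SUPER", "CELESTIAL", "LAW", "SUBSTANCE", "PART", "CULTURE",
    "PROPERTY", "FEELING", "PSYCH", "RELATION", "DATETIME", "ASSET",
]

label2id: Dict[str, int] = {"O": 0}
for i, cat in enumerate(CATEGORIES):
    label2id[f"B-{cat}"] = 2 * i + 1
    label2id[f"I-{cat}"] = 2 * i + 2
id2label: Dict[int, str] = {idx: label for label, idx in label2id.items()}


def decode(tag_ids: List[int]) -> List[str]:
    """Map label ids back to their string labels."""
    return [id2label[i] for i in tag_ids]


def bio_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Group BIO tags into (start, end_exclusive, category) spans."""
    spans: List[Tuple[int, int, str]] = []
    start: Optional[int] = None
    current: Optional[str] = None
    for i, tag in enumerate(tags):
        prefix, _, cat = tag.partition("-")
        continues = prefix == "I" and cat == current
        if not continues:
            if current is not None:
                spans.append((start, i, current))
            # B- opens a span; a stray I- is treated as opening one too.
            if prefix in ("B", "I"):
                start, current = i, cat
            else:
                start, current = None, None
    if current is not None:
        spans.append((start, len(tags), current))
    return spans


# Hypothetical ids for a five-token sentence such as "Marie Curie lived in Paris".
tags = decode([21, 22, 0, 0, 25])
print(tags)             # ['B-PER', 'I-PER', 'O', 'O', 'B-LOC']
print(bio_spans(tags))  # [(0, 2, 'PER'), (4, 5, 'LOC')]
```

Treating a stray `I-` tag as the start of a new span is a lenient design choice; a stricter variant could raise an error instead.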
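Additional information
- License information: The contents of this dataset are restricted to non-commercial research use under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). Copyright of the dataset contents belongs to Babelscape.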
- Citation information:
    @inproceedings{martinelli-etal-2024-cner,
        title = "{CNER}: Concept and Named Entity Recognition",
        author = "Martinelli, Giuliano and Molfese, Francesco and Tedeschi, Simone and Fern{\'a}ndez-Castro, Alberte and Navigli, Roberto",
        editor = "Duh, Kevin and Gomez, Helena and Bethard, Steven",
        booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
        month = jun,
        year = "2024",
        address = "Mexico City, Mexico",
        publisher = "Association for Computational Linguistics",
        url = "https://aclanthology.org/2024.naacl-long.461",
        pages = "8329--8344"
    }



