
tner/fin

Source: Hugging Face | Updated: 2022-08-15 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/tner/fin
Resource description:
---
language:
- en
license:
- mit
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
task_categories:
- token-classification
task_ids:
- named-entity-recognition
pretty_name: FIN
---

# Dataset Card for "tner/fin"

## Dataset Description

- **Repository:** [T-NER](https://github.com/asahi417/tner)
- **Paper:** [https://aclanthology.org/U15-1010.pdf](https://aclanthology.org/U15-1010.pdf)
- **Dataset:** FIN
- **Domain:** Financial News
- **Number of Entity Types:** 4

### Dataset Summary

This is the FIN NER dataset, formatted as part of the [TNER](https://github.com/asahi417/tner) project. The FIN dataset contains only a training set (FIN5) and a test set (FIN3), so we randomly sampled instances from the training set, half the size of the test set, to create a validation set.

- Entity Types: `ORG`, `LOC`, `PER`, `MISC`

## Dataset Structure

### Data Instances

An example from `train` looks as follows.

```
{
    "tags": [0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "tokens": ["1", ".", "1", ".", "4", "Borrower", "engages", "in", "criminal", "conduct", "or", "is", "involved", "in", "criminal", "activities", ";"]
}
```

### Label ID

The label2id dictionary can be found [here](https://huggingface.co/datasets/tner/fin/raw/main/dataset/label.json).

```python
{
    "O": 0,
    "B-PER": 1,
    "B-LOC": 2,
    "B-ORG": 3,
    "B-MISC": 4,
    "I-PER": 5,
    "I-LOC": 6,
    "I-ORG": 7,
    "I-MISC": 8
}
```

`O` marks a token outside any entity, `B-*` marks the first token of an entity of the given type, and `I-*` marks a subsequent token inside that entity.

### Data Splits

| name | train | validation | test |
|------|------:|-----------:|-----:|
| fin  |  1014 |        303 |  150 |

### Citation Information

```
@inproceedings{salinas-alvarado-etal-2015-domain,
    title = "Domain Adaption of Named Entity Recognition to Support Credit Risk Assessment",
    author = "Salinas Alvarado, Julio Cesar and Verspoor, Karin and Baldwin, Timothy",
    booktitle = "Proceedings of the Australasian Language Technology Association Workshop 2015",
    month = dec,
    year = "2015",
    address = "Parramatta, Australia",
    url = "https://aclanthology.org/U15-1010",
    pages = "84--90",
}
```
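As a minimal sketch of how the integer `tags` relate to the label2id dictionary in the card, the Python below inverts that dictionary and decodes the `train` example. The names `LABEL2ID`, `ID2LABEL`, and `decode_tags` are illustrative only; they are not part of the T-NER API.

```python
# label2id dictionary as published in the dataset card.
LABEL2ID = {
    "O": 0, "B-PER": 1, "B-LOC": 2, "B-ORG": 3, "B-MISC": 4,
    "I-PER": 5, "I-LOC": 6, "I-ORG": 7, "I-MISC": 8,
}
# Invert the mapping so integer tag ids can be turned back into label strings.
ID2LABEL = {idx: label for label, idx in LABEL2ID.items()}

# The `train` example shown in the card.
example = {
    "tags": [0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "tokens": ["1", ".", "1", ".", "4", "Borrower", "engages", "in", "criminal",
               "conduct", "or", "is", "involved", "in", "criminal", "activities", ";"],
}


def decode_tags(tags):
    """Map a sequence of integer tag ids to their label strings."""
    return [ID2LABEL[t] for t in tags]


labels = decode_tags(example["tags"])
# Print only the tokens that belong to an entity.
for token, label in zip(example["tokens"], labels):
    if label != "O":
        print(token, label)  # prints: Borrower B-ORG
```

In this example only `Borrower` carries a non-`O` label, and it is tagged `B-ORG` (id 3).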

Provider:
tner
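Original Information Summary

Dataset Overview

Basic Information

  • Name: FIN
  • Language: English
  • License: MIT
  • Multilinguality: Monolingual
  • Size: 1K<n<10K
  • Task category: Token classification
  • Task ID: Named entity recognition
  • Domain: Financial news
  • Number of entity types: 4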

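Detailed Description

Dataset Summary

The FIN NER dataset is part of the TNER project. The original data contains only a training set (FIN5) and a test set (FIN3), so a validation set was created by randomly sampling instances from the training set, half the size of the test set.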
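The card does not publish its sampling procedure, so the following is only a sketch of carving a random validation split out of a training set with a fixed seed. The function name `make_validation_split`, the `seed` parameter, and the choice to remove held-out instances from the training set are assumptions for illustration, not the TNER project's actual code.

```python
import random


def make_validation_split(train, n_val, seed=0):
    """Hold out `n_val` randomly chosen instances of `train` as a validation set.

    Hypothetical helper: assumes the held-out instances are removed from the
    training split so the two splits do not overlap.
    """
    if not 0 <= n_val <= len(train):
        raise ValueError("n_val must be between 0 and len(train)")
    rng = random.Random(seed)  # local RNG so the split is reproducible
    val_idx = set(rng.sample(range(len(train)), n_val))
    # Both outputs keep the original instance order.
    new_train = [x for i, x in enumerate(train) if i not in val_idx]
    val = [x for i, x in enumerate(train) if i in val_idx]
    return new_train, val


# Toy usage: ten placeholder instances, three held out.
train = [f"sentence-{i}" for i in range(10)]
new_train, val = make_validation_split(train, n_val=3, seed=42)
print(len(new_train), len(val))  # prints: 7 3
```

Using a dedicated `random.Random(seed)` instance rather than the module-level functions keeps the split reproducible without affecting any other code that uses `random`.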

  • Entity types: ORG, LOC, PER, MISC

Dataset Structure

Data Instances

An example from the training set:

{ "tags": [0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], "tokens": ["1", ".", "1", ".", "4", "Borrower", "engages", "in", "criminal", "conduct", "or", "is", "involved", "in", "criminal", "activities", ";"] }
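Each instance pairs a `tokens` list with a parallel `tags` list of integer label ids. A minimal sanity check for that structure might look like the following; `check_instance` is a hypothetical helper, and its default `num_labels=9` assumes the nine-entry label mapping listed under Label ID.

```python
def check_instance(instance, num_labels=9):
    """Return a list of structural problems in one instance (empty if none found).

    Hypothetical helper; num_labels=9 assumes the nine-entry label mapping of this dataset.
    """
    problems = []
    tokens, tags = instance["tokens"], instance["tags"]
    # Every token needs exactly one tag.
    if len(tokens) != len(tags):
        problems.append(f"length mismatch: {len(tokens)} tokens vs {len(tags)} tags")
    # Every tag must be a valid label id.
    out_of_range = [t for t in tags if not (isinstance(t, int) and 0 <= t < num_labels)]
    if out_of_range:
        problems.append(f"tag ids outside 0..{num_labels - 1}: {out_of_range}")
    return problems


example = {
    "tags": [0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "tokens": ["1", ".", "1", ".", "4", "Borrower", "engages", "in", "criminal",
               "conduct", "or", "is", "involved", "in", "criminal", "activities", ";"],
}
print(check_instance(example))  # prints: []
```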

Label ID

The label-to-ID mapping is available at https://huggingface.co/datasets/tner/fin/raw/main/dataset/label.json:

{ "O": 0, "B-PER": 1, "B-LOC": 2, "B-ORG": 3, "B-MISC": 4, "I-PER": 5, "I-LOC": 6, "I-ORG": 7, "I-MISC": 8 }
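Under the B-/I- scheme in this mapping, an entity is a `B-*` token followed by zero or more `I-*` tokens of the same type. The sketch below groups tagged tokens into entity spans. `extract_spans` is a hypothetical helper, and its lenient handling of an `I-*` tag that does not continue a matching entity (treating it as the start of a new span) is a design assumption, not something the dataset card specifies.

```python
LABEL2ID = {
    "O": 0, "B-PER": 1, "B-LOC": 2, "B-ORG": 3, "B-MISC": 4,
    "I-PER": 5, "I-LOC": 6, "I-ORG": 7, "I-MISC": 8,
}
ID2LABEL = {idx: label for label, idx in LABEL2ID.items()}


def extract_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, start, end, text) spans.

    `end` is exclusive. An I-* tag that does not continue an open entity of the
    same type is treated leniently as the start of a new span.
    """
    spans = []
    current_type, start = None, 0
    # Append a sentinel "O" tag so an entity that ends the sentence is flushed.
    for i, tag_id in enumerate(list(tags) + [LABEL2ID["O"]]):
        prefix, _, etype = ID2LABEL[tag_id].partition("-")  # "B-ORG" -> ("B", "-", "ORG")
        continues = prefix == "I" and etype == current_type
        if current_type is not None and not continues:
            spans.append((current_type, start, i, " ".join(tokens[start:i])))
            current_type = None
        if prefix in ("B", "I") and not continues:
            current_type, start = etype, i
    return spans


tokens = ["1", ".", "1", ".", "4", "Borrower", "engages", "in", "criminal",
          "conduct", "or", "is", "involved", "in", "criminal", "activities", ";"]
tags = [0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(extract_spans(tokens, tags))  # prints: [('ORG', 5, 6, 'Borrower')]
```

A stricter variant could instead discard or flag an `I-*` tag that does not follow a `B-*` or `I-*` of the same type; which behavior is preferable depends on how the spans are used.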

Data Splits

Name   Train   Validation   Test
fin    1014    303          150
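For a quick view of how the instances are distributed across the splits, the share of each split can be computed directly from the table above:

```python
# Split sizes taken from the Data Splits table.
splits = {"train": 1014, "validation": 303, "test": 150}
total = sum(splits.values())
shares = {name: n / total for name, n in splits.items()}

print(f"total: {total}")  # prints: total: 1467
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
# prints:
# train: 69.1%
# validation: 20.7%
# test: 10.2%
```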

Citation Information

@inproceedings{salinas-alvarado-etal-2015-domain,
    title = "Domain Adaption of Named Entity Recognition to Support Credit Risk Assessment",
    author = "Salinas Alvarado, Julio Cesar and Verspoor, Karin and Baldwin, Timothy",
    booktitle = "Proceedings of the Australasian Language Technology Association Workshop 2015",
    month = dec,
    year = "2015",
    address = "Parramatta, Australia",
    url = "https://aclanthology.org/U15-1010",
    pages = "84--90",
}

Collected Summary
Dataset Introduction
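Background and Challenges
Background Overview
This is a named entity recognition dataset from the financial domain with four entity types (ORG, LOC, PER, MISC). Its moderate size (1K<n<10K) makes it suitable for training and evaluating NER models.
The above content was collected and summarized by 遇见数据集.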