tner/fin

Name: tner/fin
Creator: tner
Published: 2022-08-15 17:50:31
License: No description available

Source: Hugging Face. Updated: 2022-08-15. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/tner/fin

Resource description:

---
language:
- en
license:
- mit
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
task_categories:
- token-classification
task_ids:
- named-entity-recognition
pretty_name: FIN
---

# Dataset Card for "tner/fin"

## Dataset Description

- **Repository:** [T-NER](https://github.com/asahi417/tner)
- **Paper:** [https://aclanthology.org/U15-1010.pdf](https://aclanthology.org/U15-1010.pdf)
- **Dataset:** FIN
- **Domain:** Financial News
- **Number of Entity Types:** 4

### Dataset Summary

The FIN NER dataset, formatted as part of the [TNER](https://github.com/asahi417/tner) project. The FIN dataset contains only training (FIN5) and test (FIN3) sets, so we randomly sample half the size of the test instances from the training set to create the validation set.

- Entity Types: `ORG`, `LOC`, `PER`, `MISC`

## Dataset Structure

### Data Instances

An example from `train` looks as follows.

```
{
    "tags": [0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    "tokens": ["1", ".", "1", ".", "4", "Borrower", "engages", "in", "criminal", "conduct", "or", "is", "involved", "in", "criminal", "activities", ";"]
}
```

### Label ID

The label2id dictionary can be found [here](https://huggingface.co/datasets/tner/fin/raw/main/dataset/label.json).

```python
{
    "O": 0,
    "B-PER": 1,
    "B-LOC": 2,
    "B-ORG": 3,
    "B-MISC": 4,
    "I-PER": 5,
    "I-LOC": 6,
    "I-ORG": 7,
    "I-MISC": 8
}
```

### Data Splits

| name | train | validation | test |
|------|------:|-----------:|-----:|
| fin  |  1014 |        303 |  150 |

### Citation Information

```
@inproceedings{salinas-alvarado-etal-2015-domain,
    title = "Domain Adaption of Named Entity Recognition to Support Credit Risk Assessment",
    author = "Salinas Alvarado, Julio Cesar and Verspoor, Karin and Baldwin, Timothy",
    booktitle = "Proceedings of the Australasian Language Technology Association Workshop 2015",
    month = dec,
    year = "2015",
    address = "Parramatta, Australia",
    url = "https://aclanthology.org/U15-1010",
    pages = "84--90",
}
```

In the label mapping above, `O` marks a token that is not part of any entity, `B-*` marks the first token of an entity of the given type, and `I-*` marks a token inside an entity of that type.

Provider:

tner

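Summary of original information

Dataset overview

Basic dataset information

Name: FIN
Language: English
License: MIT
Multilinguality: monolingual
Size: 1K<n<10K
Task category: token classification
Task ID: named entity recognition
Domain: financial news
Number of entity types: 4

Detailed dataset description

Dataset summary

The FIN NER dataset is part of the TNER project and contains a training set (FIN5) and a test set (FIN3). To create the validation set, half the size of the test instances was randomly sampled from the training set.

Entity types: ORG, LOC, PER, MISC

Dataset structure

Data instances

An example from the training set: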

{ "tags": [0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], "tokens": ["1", ".", "1", ".", "4", "Borrower", "engages", "in", "criminal", "conduct", "or", "is", "involved", "in", "criminal", "activities", ";"] }

Label ID

The label-to-ID mapping can be found at https://huggingface.co/datasets/tner/fin/raw/main/dataset/label.json:
{ "O": 0, "B-PER": 1, "B-LOC": 2, "B-ORG": 3, "B-MISC": 4, "I-PER": 5, "I-LOC": 6, "I-ORG": 7, "I-MISC": 8 }
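
The label mapping and the training example above can be combined to recover entity spans from the integer tags. The following is a minimal sketch, not part of the TNER project: the function name `extract_entities` and the decision to drop an `I-*` tag that does not continue an open entity of the same type are assumptions made here for illustration.

```python
from typing import List, Tuple

# Label mapping copied from the dataset card.
LABEL2ID = {
    "O": 0, "B-PER": 1, "B-LOC": 2, "B-ORG": 3, "B-MISC": 4,
    "I-PER": 5, "I-LOC": 6, "I-ORG": 7, "I-MISC": 8,
}
ID2LABEL = {idx: label for label, idx in LABEL2ID.items()}

# Training example copied from the dataset card.
EXAMPLE_TAGS = [0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
EXAMPLE_TOKENS = ["1", ".", "1", ".", "4", "Borrower", "engages", "in",
                  "criminal", "conduct", "or", "is", "involved", "in",
                  "criminal", "activities", ";"]


def extract_entities(tokens: List[str], tags: List[int]) -> List[Tuple[str, str]]:
    """Return (entity_type, text) pairs decoded from BIO tag IDs.

    Hypothetical helper: an I-* tag that does not continue an open
    entity of the same type is dropped (an assumption, not a rule
    stated by the dataset card).
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None
    for token, tag_id in zip(tokens, tags):
        label = ID2LABEL[tag_id]
        if label.startswith("B-"):
            # A B-* tag always starts a new entity; close any open one first.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type = label[2:]
            current_tokens = [token]
        elif label.startswith("I-") and current_type == label[2:]:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O", or an I-* tag that does not match the open entity.
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


print(extract_entities(EXAMPLE_TOKENS, EXAMPLE_TAGS))  # [('ORG', 'Borrower')]
```

In the example, the only non-zero tag is `3` (`B-ORG`) at the position of `Borrower`, so a single one-token organization entity is returned.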

Data splits

Name	Train	Validation	Test
fin	1014	303	150

Citation information

@inproceedings{salinas-alvarado-etal-2015-domain,
    title = "Domain Adaption of Named Entity Recognition to Support Credit Risk Assessment",
    author = "Salinas Alvarado, Julio Cesar and Verspoor, Karin and Baldwin, Timothy",
    booktitle = "Proceedings of the Australasian Language Technology Association Workshop 2015",
    month = dec,
    year = "2015",
    address = "Parramatta, Australia",
    url = "https://aclanthology.org/U15-1010",
    pages = "84--90",
}

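Collected summary

Dataset introduction

Background and challenges

Background overview

This is a named entity recognition dataset for the financial domain. It covers four entity types (ORG, LOC, PER, MISC), is moderate in size (between 1K and 10K instances), and is suitable for training and evaluating NER models.

The above content was collected and summarized by 遇见数据集.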
