
Financial-NER-NLP

Source: ModelScope community (魔搭社区). Updated: 2025-12-05. Indexed: 2025-10-11.
Download link:
https://modelscope.cn/datasets/Josephgflowers/Financial-NER-NLP
Resource description:
Dataset Card for Financial-NER-NLP

## Dataset Summary

The Financial-NER-NLP Dataset is a derivative of the FiNER-139 dataset, which consists of 1.1 million sentences annotated with 139 XBRL tags. This new dataset transforms the original structured data into natural language prompts suitable for training language models. The dataset is designed to enhance models’ abilities in tasks such as named entity recognition (NER), summarization, and information extraction in the financial domain.

The Financial-NER-NLP Dataset retains the financial domain’s specificity, focusing on numeric tokens and context-based tagging, while providing a more accessible and intuitive format for training natural language processing models.

## Key Features

- Natural Language Formatting: The dataset converts the original structured annotations into conversational prompts, making it suitable for training in a question-answering or dialogue-based format.
- Retains Financial Domain Focus: Preserves the emphasis on context-based tagging of numeric tokens, critical for financial document processing.

## Supported Tasks

This dataset supports:

- Named Entity Recognition (NER): Identifying and classifying entities within financial text.
- Data Augmentation: Providing a rich source of natural language data for augmenting existing financial NLP datasets.

## Languages

This dataset is compiled from English-language financial reports and maintains the language characteristics of the original dataset.

## Acknowledgments

This dataset is based on the FiNER-139 dataset, created and released by nlpaueb. We would like to express our gratitude to the original creators for their valuable work, which has enabled the development of this derivative dataset.

## Citation

If you use this dataset, please cite the original FiNER-139 dataset as follows:

    @inproceedings{nlpaueb_finer139_2023,
      title={FiNER-139: Financial Named Entity Recognition with Extensive Business Reporting Language Tags},
      author={NLPAUEB},
      year={2023},
      publisher={Hugging Face},
      url={https://huggingface.co/datasets/nlpaueb/finer-139}
    }
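
The card says that structured token-level annotations are converted into conversational prompts but does not show the conversion. The following is a minimal sketch of what such a step could look like, assuming IOB2-style string tags per token (as in common NER corpora). The prompt wording, the `user`/`assistant` field names, the helper functions, and the example sentence are all hypothetical and are not taken from the dataset itself.

```python
from typing import Dict, List, Tuple


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (text, label) spans from IOB2-style tags.

    An "I-" tag that does not continue an open span of the same label
    is ignored in this sketch.
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span starts; flush any span that was still open.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_tokens.append(token)
        else:
            if current_tokens:
                entities.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_label))
    return entities


def to_prompt_pair(tokens: List[str], tags: List[str]) -> Dict[str, str]:
    """Turn one annotated sentence into a hypothetical question/answer pair."""
    sentence = " ".join(tokens)
    entities = extract_entities(tokens, tags)
    if entities:
        answer = "\n".join(f"{text}: {label}" for text, label in entities)
    else:
        answer = "No tagged entities found."
    return {
        "user": "Identify the XBRL-tagged numeric entities in this sentence:\n" + sentence,
        "assistant": answer,
    }


# Illustrative input only; the sentence and its tag are made up for this sketch.
example_tokens = ["The", "notes", "bear", "interest", "at", "5.5", "%", "per", "year", "."]
example_tags = ["O", "O", "O", "O", "O",
                "B-DebtInstrumentInterestRateStatedPercentage",
                "O", "O", "O", "O"]

pair = to_prompt_pair(example_tokens, example_tags)
print(pair["user"])
print(pair["assistant"])
```

Keeping span extraction separate from prompt rendering means the same annotated sentence could be rendered with different prompt templates, which is one way a derivative dataset might support both question-answering and dialogue-style training.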

Provider:
maas
Created:
2025-08-31