
stefan-it/co-funer

Source: Hugging Face. Updated: 2024-03-25. Indexed: 2024-06-11.
Download link:
https://hf-mirror.com/datasets/stefan-it/co-funer
Resource description:

---
license: mit
task_categories:
- token-classification
language:
- de
---

# CO-Fun: A German Dataset on Company Outsourcing in Fund Prospectuses for Named Entity Recognition and Relation Extraction

This unofficial dataset repository provides a CoNLL-like version of the CO-Fun **NER** dataset that was proposed in the CO-Fun paper (https://arxiv.org/abs/2403.15322):

> The process of cyber mapping gives insights in relationships among financial entities and service providers. Centered around the outsourcing practices of companies within fund prospectuses in Germany, we introduce a dataset specifically designed for named entity recognition and relation extraction tasks. The labeling process on 948 sentences was carried out by three experts which yields to 5,969 annotations for four entity types (Outsourcing, Company, Location and Software) and 4,102 relation annotations (Outsourcing-Company, Company-Location). State-of-the-art deep learning models were trained to recognize entities and extract relations showing first promising results.

## Preprocessing

The notebook [Export-To-CoNLL.ipynb](Export-To-CoNLL.ipynb) performs the necessary steps to create a CoNLL-like version of the CO-Fun dataset that can easily be used for fine-tuning NER models.

Additionally, the [FlairDatasetTest.ipynb](FlairDatasetTest.ipynb) notebook loads the dataset with the Flair dataset loader and checks whether the number of parsed sentences is correct and identical to the number of sentences reported in the official CO-Fun paper.

## Named Entities

The CO-Fun dataset provides annotations for the following Named Entities:

* `Auslagerung` (engl. outsourcing)
* `Unternehmen` (engl. company)
* `Ort` (engl. location)
* `Software`

# Example: Load Dataset with Flair library

The notebook [FlairDatasetExample.ipynb](FlairDatasetExample.ipynb) shows how to load the dataset with the awesome [Flair library](https://github.com/flairNLP/flair).

# Changelog

* 25.03.2024: Initial version of the preprocessed CO-Fun NER dataset is released.

# Licence

The original CO-Fun dataset is released under the MIT license. Thus, this preprocessed version is also licensed under MIT.
Provided by:
stefan-it
Summary of original information

CO-Fun: A German Dataset on Company Outsourcing in Fund Prospectuses for Named Entity Recognition and Relation Extraction

Overview

  • License: MIT
  • Task Categories: Token-classification
  • Language: German (de)

Dataset Description

  • Purpose: Designed for named entity recognition (NER) and relation extraction tasks.
  • Content: 948 annotated sentences with 5,969 annotations for four entity types (Outsourcing, Company, Location, and Software) and 4,102 relation annotations (Outsourcing-Company, Company-Location).
  • Expertise: Labeling process conducted by three experts.
  • Results: State-of-the-art deep learning models were trained to recognize entities and extract relations, showing first promising results.

Named Entities

  • Auslagerung (Outsourcing)
  • Unternehmen (Company)
  • Ort (Location)
  • Software
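
To illustrate how these four labels might appear in a token-level NER file, here is a minimal sketch that groups tagged tokens into entity spans. It assumes a BIO tagging scheme (`B-`/`I-` prefixes, `O` for non-entities), which the dataset card does not confirm. The example sentence and the helper name `bio_spans` are invented for illustration and are not taken from the dataset.

```python
from collections import Counter

# German label names from the dataset card, mapped to the English glosses it gives.
LABELS = {
    "Auslagerung": "Outsourcing",
    "Unternehmen": "Company",
    "Ort": "Location",
    "Software": "Software",
}


def bio_spans(tokens, tags):
    """Group BIO-tagged tokens into (label, text) entity spans.

    Assumption: tags look like "B-Ort", "I-Ort" or "O".
    """
    spans = []
    current_label, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_label
        )
        if starts_new:
            # Close any open span, then open a new one.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continue the open span.
            current_tokens.append(token)
        else:
            # "O" tag: close any open span.
            if current_label is not None:
                spans.append((current_label, " ".join(current_tokens)))
            current_label, current_tokens = None, []
    if current_label is not None:
        spans.append((current_label, " ".join(current_tokens)))
    return spans


# Invented example sentence (not from the dataset).
tokens = ["Die", "Portfolioverwaltung", "wurde", "an", "die", "Muster",
          "GmbH", "in", "Frankfurt", "ausgelagert", "."]
tags = ["O", "B-Auslagerung", "O", "O", "O", "B-Unternehmen",
        "I-Unternehmen", "O", "B-Ort", "O", "O"]

spans = bio_spans(tokens, tags)
counts = Counter(label for label, _ in spans)
for label, text in spans:
    print(f"{LABELS[label]:<12} {text}")
```

Counting spans per label in this way is one option for comparing a local copy against the annotation totals reported in the paper.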

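Preprocessing

  • Export-To-CoNLL.ipynb creates a CoNLL-like version of the CO-Fun dataset for fine-tuning NER models.
  • FlairDatasetTest.ipynb loads the dataset with the Flair dataset loader and checks that the number of parsed sentences matches the number reported in the CO-Fun paper.

Usage Example

  • FlairDatasetExample.ipynb shows how to load the dataset with the Flair library (https://github.com/flairNLP/flair).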
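
The sentence-count check described for the CoNLL-like export can be sketched without Flair, using only the Python standard library. This is not the code from the repository's notebooks. It assumes one token per line with the NER tag in the last whitespace-separated column and a blank line between sentences. The exact column layout of the export is an assumption, and the sample text and the helper `read_conll` are invented for illustration.

```python
import io


def read_conll(lines):
    """Parse CoNLL-like lines into a list of (tokens, tags) sentences.

    Assumptions: first column is the token, last column is the tag,
    and sentences are separated by blank lines.
    """
    sentences, tokens, tags = [], [], []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        if line.startswith("-DOCSTART-"):
            # Skip document markers, if the file has any.
            continue
        parts = line.split()
        tokens.append(parts[0])
        tags.append(parts[-1])
    if tokens:
        # Flush the last sentence if the file lacks a trailing blank line.
        sentences.append((tokens, tags))
    return sentences


# Invented two-sentence sample (not from the dataset).
SAMPLE = """\
Die O
Muster B-Unternehmen
GmbH I-Unternehmen
sitzt O
in O
Frankfurt B-Ort
. O

Die O
Buchhaltung B-Auslagerung
wurde O
ausgelagert O
. O
"""

sentences = read_conll(io.StringIO(SAMPLE))
print(len(sentences), "sentences parsed")
```

With a real export, the same function could be called on an open file handle, and the total sentence count could then be compared against the 948 sentences reported in the paper.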
