community-datasets/euronews

Name: community-datasets/euronews
Creator: community-datasets
Published: 2024-06-24 11:36:46
License: no description provided

Source: Hugging Face. Updated 2024-06-24. Indexed 2024-06-15.

Download link:

https://hf-mirror.com/datasets/community-datasets/euronews

Description:

The Europeana Newspapers dataset is a multilingual named entity recognition (NER) dataset with text in German, French, and Dutch. The annotations are expert-generated, and the language data is crowdsourced. Each record has three fields: id, tokens, and ner_tags, where ner_tags holds the named entity label for each token. The dataset is organized into multiple configurations, each corresponding to a language and data source, and each configuration provides a single train split.
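As a minimal sketch of how one configuration might be accessed, the snippet below assumes the Hugging Face `datasets` library is installed and that the repository can be loaded under the ID shown in the page title. The names `CONFIGS`, `config_language`, and `load_config` are hypothetical helpers introduced only for illustration; they are not part of the dataset or the library.

```python
# Configuration names as listed on this page; each has a single "train" split.
CONFIGS = ("de-lft", "de-onb", "de-sbb", "fr-bnf", "nl-kb")


def config_language(config: str) -> str:
    """Return the language code prefix of a configuration name."""
    if config not in CONFIGS:
        raise ValueError(f"unknown configuration: {config!r}")
    return config.split("-", 1)[0]


def load_config(config: str):
    """Load the train split of one configuration (needs network access)."""
    config_language(config)  # validates the name before any download
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset

    return load_dataset("community-datasets/euronews", config, split="train")
```

Calling `load_config("fr-bnf")` would then be expected to return the French split with the id, tokens, and ner_tags fields, provided the repository is reachable.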

Provided by:

community-datasets

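Summary of original information

Dataset overview

Basic information

Dataset name: Europeana Newspapers
Languages: German (de), French (fr), Dutch (nl)
License: CC0-1.0
Multilinguality: multilingual
Size category: n<1K
Source data: original
Task category: token classification
Task ID: named entity recognition

Dataset configurations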

de-lft

Features:
- id: string
- tokens: sequence of strings
- ner_tags: sequence of class labels:
  - 0: O
  - 1: B-PER
  - 2: I-PER
  - 3: B-ORG
  - 4: I-ORG
  - 5: B-LOC
  - 6: I-LOC
Splits:
- train: 1 example, 1263426 bytes
Download size: 394615 bytes
Dataset size: 1263426 bytes
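To illustrate how the integer ner_tags relate to the label names listed for this configuration, the following sketch groups BIO-tagged tokens into entity spans. The function name `extract_entities` and the sample sentence are hypothetical and not taken from the dataset; treating an I- tag that does not continue an open span as the start of a new entity is a design assumption, not something the source specifies.

```python
# Label ids as listed for every configuration on this page.
LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]


def extract_entities(tokens, ner_tags):
    """Group BIO-tagged tokens into (entity_type, text) pairs."""
    entities = []
    current_type, current_tokens = None, []

    def flush():
        # Close the currently open span, if any, and reset the state.
        nonlocal current_type, current_tokens
        if current_tokens:
            entities.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag_id in zip(tokens, ner_tags):
        label = LABELS[tag_id]
        if label == "O":
            flush()
            continue
        prefix, entity_type = label.split("-", 1)
        # A B- tag, or an I- tag that does not continue the open span,
        # starts a new entity (assumption for malformed sequences).
        if prefix == "B" or entity_type != current_type:
            flush()
            current_type = entity_type
        current_tokens.append(token)
    flush()
    return entities


# Hypothetical example, not taken from the dataset.
tokens = ["Otto", "Bismarck", "reiste", "nach", "Wien", "."]
ner_tags = [1, 2, 0, 0, 5, 0]
print(extract_entities(tokens, ner_tags))
# → [('PER', 'Otto Bismarck'), ('LOC', 'Wien')]
```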

de-onb

Features:
- id: string
- tokens: sequence of strings
- ner_tags: sequence of class labels:
  - 0: O
  - 1: B-PER
  - 2: I-PER
  - 3: B-ORG
  - 4: I-ORG
  - 5: B-LOC
  - 6: I-LOC
Splits:
- train: 1 example, 502353 bytes
Download size: 165235 bytes
Dataset size: 502353 bytes

de-sbb

Features:
- id: string
- tokens: sequence of strings
- ner_tags: sequence of class labels:
  - 0: O
  - 1: B-PER
  - 2: I-PER
  - 3: B-ORG
  - 4: I-ORG
  - 5: B-LOC
  - 6: I-LOC
Splits:
- train: 1 example, 817279 bytes
Download size: 200613 bytes
Dataset size: 817279 bytes

fr-bnf

Features:
- id: string
- tokens: sequence of strings
- ner_tags: sequence of class labels:
  - 0: O
  - 1: B-PER
  - 2: I-PER
  - 3: B-ORG
  - 4: I-ORG
  - 5: B-LOC
  - 6: I-LOC
Splits:
- train: 1 example, 3340283 bytes
Download size: 687579 bytes
Dataset size: 3340283 bytes

nl-kb

Features:
- id: string
- tokens: sequence of strings
- ner_tags: sequence of class labels:
  - 0: O
  - 1: B-PER
  - 2: I-PER
  - 3: B-ORG
  - 4: I-ORG
  - 5: B-LOC
  - 6: I-LOC
Splits:
- train: 1 example, 3104197 bytes
Download size: 695197 bytes
Dataset size: 3104197 bytes
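
The per-configuration sizes listed above can be totalled with a few lines of arithmetic. The sketch below copies the byte counts from this page and computes overall totals plus the ratio of dataset size to download size for each configuration; the names `SIZES` and `summarize` are hypothetical and used only for illustration.

```python
# (download_bytes, dataset_bytes) per configuration, copied from this page.
SIZES = {
    "de-lft": (394615, 1263426),
    "de-onb": (165235, 502353),
    "de-sbb": (200613, 817279),
    "fr-bnf": (687579, 3340283),
    "nl-kb": (695197, 3104197),
}


def summarize(sizes):
    """Return total download bytes, total dataset bytes, and per-config ratios."""
    total_download = sum(download for download, _ in sizes.values())
    total_dataset = sum(dataset for _, dataset in sizes.values())
    ratios = {name: dataset / download for name, (download, dataset) in sizes.items()}
    return total_download, total_dataset, ratios


total_download, total_dataset, ratios = summarize(SIZES)
print(f"total download: {total_download} bytes")  # 2143239
print(f"total dataset:  {total_dataset} bytes")   # 9027538
for name, ratio in sorted(ratios.items()):
    print(f"{name}: dataset is {ratio:.2f}x the download size")
```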
