terhdavid/ner-company-dataset
Source: Hugging Face. Updated 2023-09-01; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/terhdavid/ner-company-dataset
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
  - split: validation
    path: data/validation-*
dataset_info:
  features:
  - name: tokens
    sequence: string
  - name: ner
    sequence:
      class_label:
        names:
          '0': O
          '1': B-ORG
          '2': I-ORG
  splits:
  - name: train
    num_bytes: 111394.63994565218
    num_examples: 515
  - name: test
    num_bytes: 47802.360054347824
    num_examples: 221
  - name: validation
    num_bytes: 47802.360054347824
    num_examples: 221
  download_size: 41876
  dataset_size: 206999.36005434784
---
# Dataset Card for "ner-company-dataset"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
This dataset targets Named Entity Recognition (NER) and focuses on identifying company entities. It provides three splits: train (515 examples), test (221 examples), and validation (221 examples). Each example has two features: tokens, a sequence of strings, and ner, a sequence of class labels drawn from three categories: O (non-entity), B-ORG (beginning of an organization entity), and I-ORG (inside an organization entity). The download size is 41876 bytes, and the reported dataset size is 206999.36005434784 bytes.
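To make the label scheme concrete, here is a minimal sketch of how the integer ner labels could be decoded into organization spans. The helper name `extract_org_spans` and the example row are hypothetical and are not taken from the dataset. The sketch also assumes one label per token, and it treats a stray I-ORG as the start of a new span, which is one lenient convention among several.

```python
from typing import List, Tuple

# Label names in the order given by the card's class_label mapping.
LABEL_NAMES = ["O", "B-ORG", "I-ORG"]


def extract_org_spans(tokens: List[str], ner_ids: List[int]) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, text) for each ORG span in a BIO-tagged sentence."""
    spans = []
    start = None  # index where the current span began, or None when outside a span
    for i, label_id in enumerate(ner_ids):
        tag = LABEL_NAMES[label_id]
        if tag == "B-ORG":
            # A new entity begins; close any span that is still open.
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
            start = i
        elif tag == "I-ORG":
            # Assumption: an I-ORG without a preceding B-ORG opens a new span.
            if start is None:
                start = i
        else:  # "O" closes any open span
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
                start = None
    # Close a span that runs to the end of the sentence.
    if start is not None:
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans


# Hypothetical example row, written for illustration only.
example_tokens = ["Acme", "Widgets", "Ltd", "acquired", "Globex", "."]
example_ner = [1, 2, 2, 0, 1, 0]
print(extract_org_spans(example_tokens, example_ner))
# prints [(0, 3, 'Acme Widgets Ltd'), (4, 5, 'Globex')]
```

Spans are returned as token offsets rather than character offsets, since the card lists only tokenized text.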
Provider:
terhdavid
Summary of original information
Dataset overview
Configuration
- Default configuration:
  - Training set: path data/train-*
  - Test set: path data/test-*
  - Validation set: path data/validation-*
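The split paths above are glob patterns rather than literal file names. As a rough illustration of how such patterns select files, the sketch below uses Python's `fnmatch` module as a stand-in; the Hub's own resolution logic may differ. The file names in `candidate_files` and the helper `files_for_split` are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns copied from the default configuration.
SPLIT_PATTERNS = {
    "train": "data/train-*",
    "test": "data/test-*",
    "validation": "data/validation-*",
}

# Hypothetical repository listing; the real shard names are not given in the card.
candidate_files = [
    "data/train-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "data/validation-00000-of-00001.parquet",
    "README.md",
]


def files_for_split(split, files):
    """Return the files whose path matches the glob pattern of the given split."""
    pattern = SPLIT_PATTERNS[split]
    # fnmatchcase is case-sensitive on every platform, unlike fnmatch.
    return [path for path in files if fnmatchcase(path, pattern)]


for split in SPLIT_PATTERNS:
    print(split, files_for_split(split, candidate_files))
```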
Data features
- tokens: sequence of strings
- ner: sequence of named entity recognition labels, with the following class labels:
  - 0: O
  - 1: B-ORG
  - 2: I-ORG
Dataset splits
- Training set:
  - Bytes: 111394.63994565218
  - Examples: 515
- Test set:
  - Bytes: 47802.360054347824
  - Examples: 221
- Validation set:
  - Bytes: 47802.360054347824
  - Examples: 221
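The example counts above imply the relative size of each split. A short sketch of that arithmetic, using only the counts from the card (the variable names are illustrative):

```python
# Example counts per split, as listed in the card.
SPLIT_EXAMPLES = {"train": 515, "test": 221, "validation": 221}

total_examples = sum(SPLIT_EXAMPLES.values())

# Fraction of all examples that falls into each split.
split_shares = {name: count / total_examples for name, count in SPLIT_EXAMPLES.items()}

print(f"total: {total_examples} examples")
for name, share in split_shares.items():
    print(f"{name}: {SPLIT_EXAMPLES[name]} examples ({share:.1%})")
```

This works out to 957 examples in total, split roughly 54% / 23% / 23% across train, test, and validation.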
Dataset size
- Download size: 41876 bytes
- Dataset size: 206999.36005434784 bytes
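The size figures can be cross-checked against each other. The sketch below, using only numbers from the card, confirms that the per-split byte counts sum to the reported dataset size and computes the bytes per example in each split. The per-example figure comes out the same for all three splits, which may explain the fractional byte counts: they look like proportional estimates rather than measured sizes, though the card does not say so.

```python
import math

# Figures copied from the card's metadata.
NUM_BYTES = {
    "train": 111394.63994565218,
    "test": 47802.360054347824,
    "validation": 47802.360054347824,
}
NUM_EXAMPLES = {"train": 515, "test": 221, "validation": 221}
DATASET_SIZE = 206999.36005434784
DOWNLOAD_SIZE = 41876

# The per-split byte counts should add up to the reported dataset size.
total_bytes = sum(NUM_BYTES.values())
sizes_agree = math.isclose(total_bytes, DATASET_SIZE, rel_tol=1e-9)

# Average bytes per example in each split.
bytes_per_example = {name: NUM_BYTES[name] / NUM_EXAMPLES[name] for name in NUM_BYTES}

# Ratio of the downloaded files to the reported dataset size.
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

print("split sizes sum to dataset size:", sizes_agree)
for name, value in bytes_per_example.items():
    print(f"{name}: {value:.4f} bytes per example")
print(f"download size / dataset size: {download_ratio:.3f}")
```

The download is about a fifth of the reported dataset size, which is plausible if the hosted files are stored in a compressed format.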



