terhdavid/ner-company-dataset

Name: terhdavid/ner-company-dataset
Creator: terhdavid
Published: 2023-09-01 08:01:01
License: No description available

Source: Hugging Face | Updated: 2023-09-01 | Indexed: 2024-03-04

Download link:

https://hf-mirror.com/datasets/terhdavid/ner-company-dataset

Resource summary:

---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
  - split: validation
    path: data/validation-*
dataset_info:
  features:
  - name: tokens
    sequence: string
  - name: ner
    sequence:
      class_label:
        names:
          '0': O
          '1': B-ORG
          '2': I-ORG
  splits:
  - name: train
    num_bytes: 111394.63994565218
    num_examples: 515
  - name: test
    num_bytes: 47802.360054347824
    num_examples: 221
  - name: validation
    num_bytes: 47802.360054347824
    num_examples: 221
  download_size: 41876
  dataset_size: 206999.36005434784
---
# Dataset Card for "ner-company-dataset"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

This dataset is designed for Named Entity Recognition (NER), specifically the identification of company names. It has train, test, and validation splits with 515, 221, and 221 examples respectively (957 in total). Each example has two features: tokens, a sequence of strings, and ner, a sequence of class labels aligned with the tokens and drawn from three categories: O (outside any entity), B-ORG (beginning of an organization entity), and I-ORG (inside an organization entity). The download size is 41876 bytes and the dataset size is 206999.36005434784 bytes (about 207 KB); the byte counts are fractional because each split's size appears to be apportioned in proportion to its number of examples.
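As a minimal sketch of the record structure described above, the following stdlib-only Python checks a single example. It assumes labels are stored as the integer ids 0, 1, 2 and that `tokens` and `ner` are aligned one-to-one; the function name `validate_record` and the sample sentence are hypothetical, not taken from the dataset.

```python
from typing import List

# Label ids as declared on the card: 0 = O, 1 = B-ORG, 2 = I-ORG.
O, B_ORG, I_ORG = 0, 1, 2
VALID_IDS = {O, B_ORG, I_ORG}


def validate_record(tokens: List[str], ner: List[int]) -> List[str]:
    """Return human-readable problems found in one record (empty list if none).

    Checks that tokens and labels have the same length, that every label id
    is one of the three declared classes, and that the BIO scheme holds:
    an I-ORG tag must continue a B-ORG or I-ORG tag.
    """
    problems: List[str] = []
    if len(tokens) != len(ner):
        problems.append(f"length mismatch: {len(tokens)} tokens vs {len(ner)} labels")
    previous = O
    for i, label in enumerate(ner):
        if label not in VALID_IDS:
            problems.append(f"position {i}: unknown label id {label}")
            previous = O  # treat an unknown id as outside any entity
            continue
        if label == I_ORG and previous == O:
            problems.append(f"position {i}: I-ORG without a preceding B-ORG")
        previous = label
    return problems


# Hypothetical example record (not taken from the dataset).
tokens = ["Shares", "of", "Acme", "Corp", "rose"]
ner = [O, O, B_ORG, I_ORG, O]
print(validate_record(tokens, ner))  # -> []
```

Running a check like this over every split before training can catch misaligned or malformed BIO sequences early; the card does not say whether any exist in this dataset.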

Provider:

terhdavid

Summary of Original Information

Dataset Overview

Configuration

Default configuration:
- Training set: path data/train-*
- Test set: path data/test-*
- Validation set: path data/validation-*
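The `data/<split>-*` entries are file patterns that map repository files to splits. With the Hugging Face `datasets` library, `load_dataset("terhdavid/ner-company-dataset")` is expected to resolve them automatically; the sketch below only illustrates the matching idea, using hypothetical file names (the actual shard names are not listed on this page). Note that `fnmatchcase`'s `*` also matches `/`, unlike a path-aware glob, which makes no difference for these flat patterns.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Split -> file pattern, copied from the card's default config.
DATA_FILES = {
    "train": "data/train-*",
    "test": "data/test-*",
    "validation": "data/validation-*",
}


def resolve_splits(repo_files: List[str]) -> Dict[str, List[str]]:
    """Group repository file paths by the split pattern they match."""
    return {
        split: sorted(path for path in repo_files if fnmatchcase(path, pattern))
        for split, pattern in DATA_FILES.items()
    }


# Hypothetical file listing; the real shard names may differ.
files = [
    "README.md",
    "data/train-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "data/validation-00000-of-00001.parquet",
]
for split, paths in resolve_splits(files).items():
    print(split, paths)
```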

Data Features

tokens: sequence of strings
ner: sequence of named entity recognition labels, with the following classes:
- 0: O
- 1: B-ORG
- 2: I-ORG
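The three ids form a BIO (begin/inside/outside) scheme for a single entity type. A minimal sketch, assuming integer label ids as listed above, that builds `id2label`/`label2id` mappings (the kind commonly passed to token-classification models) and groups tagged tokens into company names. `extract_companies` is a hypothetical helper, and joining tokens with a single space is a simplification that ignores the original spacing.

```python
from typing import Dict, List

# Class-label names in id order, as listed on the card.
LABEL_NAMES = ["O", "B-ORG", "I-ORG"]
ID2LABEL: Dict[int, str] = dict(enumerate(LABEL_NAMES))
LABEL2ID: Dict[str, int] = {name: i for i, name in ID2LABEL.items()}


def extract_companies(tokens: List[str], ner: List[int]) -> List[str]:
    """Join consecutive B-ORG/I-ORG tokens into company-name strings."""
    companies: List[str] = []
    current: List[str] = []
    for token, label_id in zip(tokens, ner):
        tag = ID2LABEL[label_id]
        if tag == "B-ORG":
            # A new entity starts; flush any entity that was still open.
            if current:
                companies.append(" ".join(current))
            current = [token]
        elif tag == "I-ORG" and current:
            current.append(token)
        else:
            # "O", or an I-ORG with no open entity (treated as O here).
            if current:
                companies.append(" ".join(current))
            current = []
    if current:
        companies.append(" ".join(current))
    return companies


# Hypothetical sentence, not taken from the dataset.
print(extract_companies(["Acme", "Corp", "and", "Globex", "signed"], [1, 2, 0, 1, 0]))
# -> ['Acme Corp', 'Globex']
```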

Dataset Splits

Training set:
- Bytes: 111394.63994565218
- Examples: 515
Test set:
- Bytes: 47802.360054347824
- Examples: 221
Validation set:
- Bytes: 47802.360054347824
- Examples: 221
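The fractional byte counts suggest that each split's size was apportioned from a total in proportion to its example count. The arithmetic below, using only the numbers from the card, checks that reading: the three splits should sum to the dataset size, and each split's bytes should equal its example count times the average bytes per example.

```python
import math

# Split metadata copied from the card.
SPLITS = {
    "train": {"num_bytes": 111394.63994565218, "num_examples": 515},
    "test": {"num_bytes": 47802.360054347824, "num_examples": 221},
    "validation": {"num_bytes": 47802.360054347824, "num_examples": 221},
}
DATASET_SIZE = 206999.36005434784

total_examples = sum(s["num_examples"] for s in SPLITS.values())
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())
bytes_per_example = DATASET_SIZE / total_examples

# The split sizes should add up to the dataset size (up to float rounding).
sizes_add_up = math.isclose(total_bytes, DATASET_SIZE, rel_tol=1e-9)

# Each split's bytes should equal its example count times the average size.
proportional = all(
    math.isclose(s["num_bytes"], s["num_examples"] * bytes_per_example, rel_tol=1e-9)
    for s in SPLITS.values()
)

for name, s in SPLITS.items():
    share = s["num_examples"] / total_examples
    print(f"{name:<10} {s['num_examples']:>4} examples ({share:.1%})")
print(f"total: {total_examples} examples, {bytes_per_example:.2f} bytes/example on average")
print("sizes add up:", sizes_add_up, "| proportional:", proportional)
```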

Dataset Size

Download size: 41876 bytes
Dataset size: 206999.36005434784 bytes
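For a rough sense of scale, the two totals can be compared. This assumes the usual Hugging Face meaning of the fields: `download_size` counts the files fetched from the Hub (the file format is not stated on this page) and `dataset_size` counts the prepared data. Under that reading, the prepared data is roughly five times larger than the download.

```python
# Totals copied from the card (bytes).
DOWNLOAD_SIZE = 41876
DATASET_SIZE = 206999.36005434784

ratio = DATASET_SIZE / DOWNLOAD_SIZE

# 1 KiB = 1024 bytes.
print(f"download size: {DOWNLOAD_SIZE / 1024:.1f} KiB")
print(f"dataset size:  {DATASET_SIZE / 1024:.1f} KiB")
print(f"dataset/download ratio: {ratio:.2f}")
```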