
terhdavid/ner-company-dataset

Source: Hugging Face. Updated 2023-09-01. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/terhdavid/ner-company-dataset
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
  - split: validation
    path: data/validation-*
dataset_info:
  features:
  - name: tokens
    sequence: string
  - name: ner
    sequence:
      class_label:
        names:
          '0': O
          '1': B-ORG
          '2': I-ORG
  splits:
  - name: train
    num_bytes: 111394.63994565218
    num_examples: 515
  - name: test
    num_bytes: 47802.360054347824
    num_examples: 221
  - name: validation
    num_bytes: 47802.360054347824
    num_examples: 221
  download_size: 41876
  dataset_size: 206999.36005434784
---

# Dataset Card for "ner-company-dataset"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

This dataset is designed for Named Entity Recognition (NER), focusing on the identification of company entities. It provides train, test, and validation splits with 515, 221, and 221 examples respectively. Each example has two features: tokens, a sequence of strings, and ner, a sequence of per-token class labels with three categories: O (non-entity), B-ORG (beginning of an organization entity), and I-ORG (inside an organization entity). The download size is 41876 bytes, and the total dataset size is 206999.36005434784 bytes.
Provider:
terhdavid
Summary of original information

Dataset overview

Configuration

  • Default configuration
    • Train split: path data/train-*
    • Test split: path data/test-*
    • Validation split: path data/validation-*
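
The default configuration maps three split names to shard patterns under data/. A minimal sketch of loading them, assuming the Hugging Face `datasets` library is installed and the repository is still reachable under the same ID. The helper name `load_splits` is made up for illustration and is not part of the dataset or the library.

```python
# Repository ID and split names taken from the listing above.
REPO_ID = "terhdavid/ner-company-dataset"
SPLITS = ("train", "test", "validation")


def load_splits(repo_id: str = REPO_ID) -> dict:
    """Load each configured split (hypothetical helper; needs network access)."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return {split: load_dataset(repo_id, split=split) for split in SPLITS}


if __name__ == "__main__":
    for name, split in load_splits().items():
        print(name, split.num_rows)
```

Loading each split by name keeps the code independent of the shard file names, which the listing only gives as glob patterns.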

Data features

  • tokens: sequence of strings
  • ner: sequence of named entity recognition labels, with the following class labels:
    • 0: O
    • 1: B-ORG
    • 2: I-ORG
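
Since ner stores integer class IDs aligned with tokens, turning a row into company mentions takes a small decoding step. The following is a sketch of one way to do it, assuming the standard BIO reading of the three labels. The function name `extract_orgs` and the example row are invented for illustration and do not come from the dataset.

```python
from typing import List

# Label names as listed in the dataset card: index -> tag.
ID2LABEL = {0: "O", 1: "B-ORG", 2: "I-ORG"}


def extract_orgs(tokens: List[str], ner: List[int]) -> List[str]:
    """Group B-ORG/I-ORG runs into organization mentions."""
    spans, current = [], []
    for token, label_id in zip(tokens, ner):
        tag = ID2LABEL[label_id]
        if tag == "B-ORG":
            # A new entity starts; close any entity still open.
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I-ORG" and current:
            current.append(token)
        else:
            # "O", or a stray I-ORG with no preceding B-ORG: close any open span.
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


# Made-up example row, not taken from the dataset.
tokens = ["Acme", "Corp", "hired", "staff", "from", "Globex", "."]
ner = [1, 2, 0, 0, 0, 1, 0]
print(extract_orgs(tokens, ner))  # ['Acme Corp', 'Globex']
```

Dropping an I-ORG that has no preceding B-ORG is a design choice in this sketch; a more lenient decoder could treat it as the start of a new entity instead.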

Dataset splits

  • Train split
    • Bytes: 111394.63994565218
    • Examples: 515
  • Test split
    • Bytes: 47802.360054347824
    • Examples: 221
  • Validation split
    • Bytes: 47802.360054347824
    • Examples: 221
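
The byte counts are fractional, which is unusual for file sizes. A quick check of the arithmetic, using only the figures listed above, shows that all three splits work out to the same bytes-per-example value. That is consistent with sizes apportioned by example count rather than measured per split, though the source does not say how they were computed.

```python
import math

# (num_bytes, num_examples) per split, copied from the dataset card.
SPLIT_STATS = {
    "train": (111394.63994565218, 515),
    "test": (47802.360054347824, 221),
    "validation": (47802.360054347824, 221),
}

total_examples = sum(n for _, n in SPLIT_STATS.values())
shares = {name: n / total_examples for name, (_, n) in SPLIT_STATS.items()}
bytes_per_example = {name: b / n for name, (b, n) in SPLIT_STATS.items()}

for name in SPLIT_STATS:
    print(f"{name}: {shares[name]:.1%} of examples, "
          f"{bytes_per_example[name]:.2f} bytes/example")

# True when train and test agree on bytes per example up to rounding error.
print(math.isclose(bytes_per_example["train"], bytes_per_example["test"],
                   rel_tol=1e-9))
```

The 957 examples in total split roughly 54% / 23% / 23% across train, test, and validation.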

Dataset size

  • Download size: 41876 bytes
  • Dataset size: 206999.36005434784 bytes
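
The two size figures differ by a factor of roughly five. The sketch below computes that ratio, assuming the usual Hugging Face convention that download_size counts the files fetched and dataset_size the prepared data. The listing itself does not define either term.

```python
DOWNLOAD_SIZE = 41876               # bytes, from the dataset card
DATASET_SIZE = 206999.36005434784   # bytes, from the dataset card

# How many times larger the prepared dataset is than the download.
ratio = DATASET_SIZE / DOWNLOAD_SIZE
print(f"dataset_size / download_size = {ratio:.2f}")
```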