
qmeeus/MSNER

Source: Hugging Face | Updated: 2024-03-28 | Indexed: 2024-06-15
Download link:
https://hf-mirror.com/datasets/qmeeus/MSNER
Resource description:
---
dataset_info:
  features:
  - name: audio_id
    dtype: string
  - name: language
    dtype:
      class_label:
        names:
          '0': en
          '1': de
          '2': fr
          '3': es
          '4': pl
          '5': it
          '6': ro
          '7': hu
          '8': cs
          '9': nl
          '10': fi
          '11': hr
          '12': sk
          '13': sl
          '14': et
          '15': lt
          '16': en_accented
  - name: audio
    dtype:
      audio:
        sampling_rate: 16000
  - name: sentence
    dtype: string
  - name: unified_entities
    sequence:
      class_label:
        names:
          '0': B-cell_line
          '1': B-character_name
          '2': B-language
          '3': B-disease
          '4': B-event
          '5': B-organization
          '6': B-character
          '7': B-origin
          '8': B-other
          '9': B-dish
          '10': B-relationship
          '11': B-artifact
          '12': B-work_of_art
          '13': B-facility
          '14': B-product
          '15': B-amenity
          '16': B-rating
          '17': B-actor
          '18': B-date
          '19': B-ratings_average
          '20': B-quantity
          '21': B-dna
          '22': B-quote
          '23': B-title
          '24': B-song
          '25': B-genre
          '26': B-cuisine
          '27': B-soundtrack
          '28': B-ordinal_number
          '29': B-protein
          '30': B-collection
          '31': B-money
          '32': B-person
          '33': B-project
          '34': B-group
          '35': B-review
          '36': B-percent
          '37': B-law
          '38': B-director
          '39': B-award
          '40': B-chemical
          '41': B-geopolitical_area
          '42': B-rna
          '43': B-restaurant
          '44': B-location
          '45': B-opinion
          '46': B-cell_type
          '47': B-trailer
          '48': B-cardinal_number
          '49': B-plot
          '50': B-corporation
          '51': B-time
          '52': I-cell_line
          '53': I-character_name
          '54': I-language
          '55': I-disease
          '56': I-event
          '57': I-organization
          '58': I-character
          '59': I-origin
          '60': I-other
          '61': I-dish
          '62': I-relationship
          '63': I-artifact
          '64': I-work_of_art
          '65': I-facility
          '66': I-product
          '67': I-amenity
          '68': I-rating
          '69': I-actor
          '70': I-date
          '71': I-ratings_average
          '72': I-quantity
          '73': I-dna
          '74': I-quote
          '75': I-title
          '76': I-song
          '77': I-genre
          '78': I-cuisine
          '79': I-soundtrack
          '80': I-ordinal_number
          '81': I-protein
          '82': I-collection
          '83': I-money
          '84': I-person
          '85': I-project
          '86': I-group
          '87': I-review
          '88': I-percent
          '89': I-law
          '90': I-director
          '91': I-award
          '92': I-chemical
          '93': I-geopolitical_area
          '94': I-rna
          '95': I-restaurant
          '96': I-location
          '97': I-opinion
          '98': I-cell_type
          '99': I-trailer
          '100': I-cardinal_number
          '101': I-plot
          '102': I-corporation
          '103': I-time
          '104': O
  - name: raw_entities
    sequence:
      class_label:
        names:
          '0': O
          '1': B-cardinal number
          '2': B-date
          '3': I-date
          '4': B-person
          '5': I-person
          '6': B-group
          '7': B-geopolitical area
          '8': I-geopolitical area
          '9': B-law
          '10': I-law
          '11': B-organization
          '12': I-organization
          '13': B-percent
          '14': I-percent
          '15': B-ordinal number
          '16': B-money
          '17': I-money
          '18': B-work of art
          '19': I-work of art
          '20': B-facility
          '21': B-time
          '22': I-cardinal number
          '23': B-location
          '24': B-quantity
          '25': I-quantity
          '26': I-group
          '27': I-location
          '28': B-product
          '29': I-time
          '30': B-event
          '31': I-event
          '32': I-facility
          '33': B-language
          '34': I-product
          '35': I-ordinal number
          '36': I-language
  splits:
  - name: de
    num_bytes: 1131409390.498
    num_examples: 1966
  - name: es
    num_bytes: 1141307288.576
    num_examples: 1512
  - name: fr
    num_bytes: 1058324947.032
    num_examples: 1656
  - name: nl
    num_bytes: 602932929.76
    num_examples: 1120
  download_size: 3332139599
  dataset_size: 3933974555.866
configs:
- config_name: default
  data_files:
  - split: de
    path: data/de-*
  - split: es
    path: data/es-*
  - split: fr
    path: data/fr-*
  - split: nl
    path: data/nl-*
language:
- de
- es
- fr
- nl
tags:
- spoken-ner
- multilingual
- MSNER
- spoken-language-understanding
pretty_name: MSNER
---
# Dataset Card for "spoken-ner"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
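The two label inventories in the card follow a regular layout that can be checked in code. The sketch below rebuilds the 105 `unified_entities` labels from the 52 entity types (B- tags at ids 0 to 51, I- tags at ids 52 to 103, `O` at id 104) and maps a `raw_entities` label name onto a unified id. The names `UNIFIED_TYPES`, `UNIFIED_LABELS`, `UNIFIED_ID` and `raw_to_unified_id` are hypothetical, and the space-to-underscore mapping is an assumption drawn from comparing the two lists, not a conversion documented by the dataset.

```python
# Entity types in the order they appear in the card's unified_entities labels.
UNIFIED_TYPES = [
    "cell_line", "character_name", "language", "disease", "event",
    "organization", "character", "origin", "other", "dish",
    "relationship", "artifact", "work_of_art", "facility", "product",
    "amenity", "rating", "actor", "date", "ratings_average",
    "quantity", "dna", "quote", "title", "song",
    "genre", "cuisine", "soundtrack", "ordinal_number", "protein",
    "collection", "money", "person", "project", "group",
    "review", "percent", "law", "director", "award",
    "chemical", "geopolitical_area", "rna", "restaurant", "location",
    "opinion", "cell_type", "trailer", "cardinal_number", "plot",
    "corporation", "time",
]

# ids 0-51 are B- tags, 52-103 are I- tags, 104 is O (as listed in the card).
UNIFIED_LABELS = (
    [f"B-{t}" for t in UNIFIED_TYPES]
    + [f"I-{t}" for t in UNIFIED_TYPES]
    + ["O"]
)
UNIFIED_ID = {label: i for i, label in enumerate(UNIFIED_LABELS)}


def raw_to_unified_id(raw_label: str) -> int:
    """Map a raw_entities label name to a unified_entities id.

    Assumption: raw names differ from unified names only by using spaces
    instead of underscores (e.g. "B-work of art" -> "B-work_of_art").
    """
    return UNIFIED_ID[raw_label.replace(" ", "_")]


print(len(UNIFIED_LABELS))                       # 105
print(raw_to_unified_id("B-geopolitical area"))  # 41
```

All 18 entity types in `raw_entities` have an underscore-named counterpart among the unified types, so the lookup succeeds for every raw label in the card.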
Provided by:
qmeeus
Summary of original information

Dataset overview

Dataset information

  • Features:
    • audio_id: string
    • language: class label, with the following languages:
      • en, de, fr, es, pl, it, ro, hu, cs, nl, fi, hr, sk, sl, et, lt, en_accented
    • audio: audio, sampling rate 16000 Hz
    • sentence: string
    • unified_entities: sequence of class labels, with the following entity tags:
      • B-cell_line, B-character_name, B-language, B-disease, B-event, B-organization, B-character, B-origin, B-other, B-dish, B-relationship, B-artifact, B-work_of_art, B-facility, B-product, B-amenity, B-rating, B-actor, B-date, B-ratings_average, B-quantity, B-dna, B-quote, B-title, B-song, B-genre, B-cuisine, B-soundtrack, B-ordinal_number, B-protein, B-collection, B-money, B-person, B-project, B-group, B-review, B-percent, B-law, B-director, B-award, B-chemical, B-geopolitical_area, B-rna, B-restaurant, B-location, B-opinion, B-cell_type, B-trailer, B-cardinal_number, B-plot, B-corporation, B-time, I-cell_line, I-character_name, I-language, I-disease, I-event, I-organization, I-character, I-origin, I-other, I-dish, I-relationship, I-artifact, I-work_of_art, I-facility, I-product, I-amenity, I-rating, I-actor, I-date, I-ratings_average, I-quantity, I-dna, I-quote, I-title, I-song, I-genre, I-cuisine, I-soundtrack, I-ordinal_number, I-protein, I-collection, I-money, I-person, I-project, I-group, I-review, I-percent, I-law, I-director, I-award, I-chemical, I-geopolitical_area, I-rna, I-restaurant, I-location, I-opinion, I-cell_type, I-trailer, I-cardinal_number, I-plot, I-corporation, I-time, O
    • raw_entities: sequence of class labels, with the following entity tags:
      • O, B-cardinal number, B-date, I-date, B-person, I-person, B-group, B-geopolitical area, I-geopolitical area, B-law, I-law, B-organization, I-organization, B-percent, I-percent, B-ordinal number, B-money, I-money, B-work of art, I-work of art, B-facility, B-time, I-cardinal number, B-location, B-quantity, I-quantity, I-group, I-location, B-product, I-time, B-event, I-event, I-facility, B-language, I-product, I-ordinal number, I-language
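Both entity features use B-/I-/O tags, so a tag sequence can be collapsed into entity spans. The following is a minimal sketch, assuming the tags have already been converted from class ids to label strings and that they align one-to-one with some tokenization of `sentence`; the card does not state how tokens are defined. The function name `bio_to_spans` and the example sentence are hypothetical.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Return (entity_type, start, end) spans from BIO tags; end is exclusive."""
    spans = []
    current_type = None
    start = 0
    for i, tag in enumerate(tags):
        prefix, _, etype = tag.partition("-")
        if prefix == "I" and etype == current_type:
            continue  # still inside the current entity
        if current_type is not None:
            # Any other tag closes the entity that was open.
            spans.append((current_type, start, i))
            current_type = None
        if prefix in ("B", "I"):
            # A stray I- tag without a matching B- opens a new span (lenient choice).
            current_type, start = etype, i
    if current_type is not None:
        spans.append((current_type, start, len(tags)))
    return spans


# Made-up example, not taken from the dataset.
tokens = ["Angela", "Merkel", "besuchte", "Paris"]
tags = ["B-person", "I-person", "O", "B-geopolitical_area"]
for etype, start, end in bio_to_spans(tags):
    print(etype, tokens[start:end])
```

Treating an unmatched I- tag as the start of a new span is one lenient convention; a stricter decoder could discard such tags instead.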

Dataset splits

  • de:
    • Bytes: 1131409390.498
    • Examples: 1966
  • es:
    • Bytes: 1141307288.576
    • Examples: 1512
  • fr:
    • Bytes: 1058324947.032
    • Examples: 1656
  • nl:
    • Bytes: 602932929.76
    • Examples: 1120

Dataset size

  • Download size (bytes): 3332139599
  • Dataset size (bytes): 3933974555.866
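The split figures can be cross-checked with a little arithmetic: the four `num_bytes` values sum to the reported dataset size, and the example counts total 6254. The sketch below recomputes both and prints an average size per example for each split; the variable names are hypothetical and the numbers are copied from the listing above.

```python
# Per-split figures copied from the dataset metadata.
SPLITS = {
    "de": {"num_bytes": 1131409390.498, "num_examples": 1966},
    "es": {"num_bytes": 1141307288.576, "num_examples": 1512},
    "fr": {"num_bytes": 1058324947.032, "num_examples": 1656},
    "nl": {"num_bytes": 602932929.76, "num_examples": 1120},
}
DATASET_SIZE = 3933974555.866  # reported dataset_size, in bytes

total_examples = sum(s["num_examples"] for s in SPLITS.values())
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())

for name, s in SPLITS.items():
    # Average size of one example in MiB (audio plus annotations).
    mib_per_example = s["num_bytes"] / s["num_examples"] / 2**20
    print(f"{name}: {s['num_examples']} examples, {mib_per_example:.2f} MiB per example on average")

print(total_examples)                            # 6254
print(abs(total_bytes - DATASET_SIZE) < 1e-3)    # True
```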

Configuration

  • Config name: default
    • Data files:
      • de: data/de-*
      • es: data/es-*
      • fr: data/fr-*
      • nl: data/nl-*
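Because the default configuration exposes one split per language rather than train/test splits, a loader has to name the language split explicitly. A minimal sketch, assuming the `datasets` package is installed and the Hub (or a mirror) is reachable; `check_split` and `load_msner` are hypothetical helper names, while `load_dataset` is the standard entry point of the `datasets` library.

```python
SPLITS = ("de", "es", "fr", "nl")  # split names from the default config


def check_split(split: str) -> str:
    """Reject split names that the default config does not define."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    return split


def load_msner(split: str):
    """Load one language split; needs the `datasets` package and network access."""
    # Imported lazily so check_split stays usable without the dependency.
    from datasets import load_dataset

    return load_dataset("qmeeus/MSNER", split=check_split(split))
```

For example, `ds = load_msner("nl")` would return the Dutch split, after which `ds[0]["sentence"]` gives the sentence text of the first example. Note that the `language` feature lists 17 class names, but only these four languages have splits in this configuration.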

Languages

  • de, es, fr, nl

Tags

  • spoken-ner, multilingual, MSNER, spoken-language-understanding

Dataset name

  • MSNER