
fathyshalab/massive_news-de-DE

Source: Hugging Face | Updated: 2023-03-30 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/fathyshalab/massive_news-de-DE
Resource description:
---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: locale
    dtype: string
  - name: partition
    dtype: string
  - name: scenario
    dtype:
      class_label:
        names:
          '0': social
          '1': transport
          '2': calendar
          '3': play
          '4': news
          '5': datetime
          '6': recommendation
          '7': email
          '8': iot
          '9': general
          '10': audio
          '11': lists
          '12': qa
          '13': cooking
          '14': takeaway
          '15': music
          '16': alarm
          '17': weather
  - name: intent
    dtype:
      class_label:
        names:
          '0': datetime_query
          '1': iot_hue_lightchange
          '2': transport_ticket
          '3': takeaway_query
          '4': qa_stock
          '5': general_greet
          '6': recommendation_events
          '7': music_dislikeness
          '8': iot_wemo_off
          '9': cooking_recipe
          '10': qa_currency
          '11': transport_traffic
          '12': general_quirky
          '13': weather_query
          '14': audio_volume_up
          '15': email_addcontact
          '16': takeaway_order
          '17': email_querycontact
          '18': iot_hue_lightup
          '19': recommendation_locations
          '20': play_audiobook
          '21': lists_createoradd
          '22': news_query
          '23': alarm_query
          '24': iot_wemo_on
          '25': general_joke
          '26': qa_definition
          '27': social_query
          '28': music_settings
          '29': audio_volume_other
          '30': calendar_remove
          '31': iot_hue_lightdim
          '32': calendar_query
          '33': email_sendemail
          '34': iot_cleaning
          '35': audio_volume_down
          '36': play_radio
          '37': cooking_query
          '38': datetime_convert
          '39': qa_maths
          '40': iot_hue_lightoff
          '41': iot_hue_lighton
          '42': transport_query
          '43': music_likeness
          '44': email_query
          '45': play_music
          '46': audio_volume_mute
          '47': social_post
          '48': alarm_set
          '49': qa_factoid
          '50': calendar_set
          '51': play_game
          '52': alarm_remove
          '53': lists_remove
          '54': transport_taxi
          '55': recommendation_movies
          '56': iot_coffee
          '57': music_query
          '58': play_podcasts
          '59': lists_query
  - name: text
    dtype: string
  - name: annot_utt
    dtype: string
  - name: worker_id
    dtype: string
  - name: slot_method
    sequence:
    - name: slot
      dtype: string
    - name: method
      dtype: string
  - name: judgments
    sequence:
    - name: worker_id
      dtype: string
    - name: intent_score
      dtype: int8
    - name: slots_score
      dtype: int8
    - name: grammar_score
      dtype: int8
    - name: spelling_score
      dtype: int8
    - name: language_identification
      dtype: string
  - name: label_name
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 147499
    num_examples: 503
  - name: validation
    num_bytes: 25026
    num_examples: 82
  - name: test
    num_bytes: 36859
    num_examples: 124
  download_size: 69773
  dataset_size: 209384
---
# Dataset Card for "massive_news-de-DE"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

Provider:
fathyshalab
Summary of original information

Dataset overview

Dataset features

  • id: string
  • locale: string
  • partition: string
  • scenario: class label, one of: social, transport, calendar, play, news, datetime, recommendation, email, iot, general, audio, lists, qa, cooking, takeaway, music, alarm, weather
  • intent: class label, one of: datetime_query, iot_hue_lightchange, transport_ticket, takeaway_query, qa_stock, general_greet, recommendation_events, music_dislikeness, iot_wemo_off, cooking_recipe, qa_currency, transport_traffic, general_quirky, weather_query, audio_volume_up, email_addcontact, takeaway_order, email_querycontact, iot_hue_lightup, recommendation_locations, play_audiobook, lists_createoradd, news_query, alarm_query, iot_wemo_on, general_joke, qa_definition, social_query, music_settings, audio_volume_other, calendar_remove, iot_hue_lightdim, calendar_query, email_sendemail, iot_cleaning, audio_volume_down, play_radio, cooking_query, datetime_convert, qa_maths, iot_hue_lightoff, iot_hue_lighton, transport_query, music_likeness, email_query, play_music, audio_volume_mute, social_post, alarm_set, qa_factoid, calendar_set, play_game, alarm_remove, lists_remove, transport_taxi, recommendation_movies, iot_coffee, music_query, play_podcasts, lists_query
  • text: string
  • annot_utt: string
  • worker_id: string
  • slot_method: sequence containing:
    • slot: string
    • method: string
  • judgments: sequence containing:
    • worker_id: string
    • intent_score: 8-bit integer (int8)
    • slots_score: 8-bit integer (int8)
    • grammar_score: 8-bit integer (int8)
    • spelling_score: 8-bit integer (int8)
    • language_identification: string
  • label_name: string
  • label: 64-bit integer (int64)
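
The `scenario` field is stored as an integer class label, so the numeric value has to be mapped back to a name before it is readable. The following is a minimal sketch of that lookup in plain Python, assuming the index order is exactly the one listed in the card ('0': social through '17': weather). The helpers `scenario_name` and `scenario_id` are hypothetical names introduced for illustration and are not part of any library.

```python
# Scenario names in the index order given by the dataset card
# ('0': social ... '17': weather). The order is copied from the card.
SCENARIO_NAMES = [
    "social", "transport", "calendar", "play", "news", "datetime",
    "recommendation", "email", "iot", "general", "audio", "lists",
    "qa", "cooking", "takeaway", "music", "alarm", "weather",
]


def scenario_name(index: int) -> str:
    """Return the scenario name for an integer class label (hypothetical helper)."""
    if not 0 <= index < len(SCENARIO_NAMES):
        raise ValueError(f"scenario index out of range: {index}")
    return SCENARIO_NAMES[index]


def scenario_id(name: str) -> int:
    """Return the integer class label for a scenario name (hypothetical helper)."""
    try:
        return SCENARIO_NAMES.index(name)
    except ValueError:
        raise ValueError(f"unknown scenario: {name!r}") from None


print(scenario_name(4))        # news
print(scenario_id("weather"))  # 17
```

The same pattern would apply to the `intent` field with its 60 names. When the dataset is loaded through the Hugging Face `datasets` library, the feature object for a class-label column normally provides an equivalent mapping, so a hand-written table like this is mainly useful for quick inspection.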

Dataset splits

  • train: 503 examples, 147499 bytes
  • validation: 82 examples, 25026 bytes
  • test: 124 examples, 36859 bytes

Dataset size

  • Download size: 69773 bytes
  • Total dataset size: 209384 bytes
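
As a quick consistency check, the per-split byte counts should add up to the reported total dataset size. The sketch below hard-codes the figures from the card and computes each split's share of the examples. The names `SPLITS` and `DATASET_SIZE` are illustrative only.

```python
# Split statistics copied from the dataset card.
SPLITS = {
    "train": {"num_examples": 503, "num_bytes": 147499},
    "validation": {"num_examples": 82, "num_bytes": 25026},
    "test": {"num_examples": 124, "num_bytes": 36859},
}
DATASET_SIZE = 209384  # dataset_size reported by the card, in bytes

total_examples = sum(s["num_examples"] for s in SPLITS.values())
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())

# The three splits account for the full reported dataset size.
assert total_bytes == DATASET_SIZE

for name, stats in SPLITS.items():
    share = stats["num_examples"] / total_examples
    print(f"{name}: {stats['num_examples']} examples ({share:.1%})")
print(f"total: {total_examples} examples, {total_bytes} bytes")
```

The splits total 709 examples, divided roughly 71% / 12% / 17% between train, validation, and test. The download size (69773 bytes) is smaller than the dataset size, presumably because the downloaded files are compressed.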