fathyshalab/massive_news-de-DE
Source: Hugging Face. Updated: 2023-03-30. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/fathyshalab/massive_news-de-DE
Resource description:
---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: locale
    dtype: string
  - name: partition
    dtype: string
  - name: scenario
    dtype:
      class_label:
        names:
          '0': social
          '1': transport
          '2': calendar
          '3': play
          '4': news
          '5': datetime
          '6': recommendation
          '7': email
          '8': iot
          '9': general
          '10': audio
          '11': lists
          '12': qa
          '13': cooking
          '14': takeaway
          '15': music
          '16': alarm
          '17': weather
  - name: intent
    dtype:
      class_label:
        names:
          '0': datetime_query
          '1': iot_hue_lightchange
          '2': transport_ticket
          '3': takeaway_query
          '4': qa_stock
          '5': general_greet
          '6': recommendation_events
          '7': music_dislikeness
          '8': iot_wemo_off
          '9': cooking_recipe
          '10': qa_currency
          '11': transport_traffic
          '12': general_quirky
          '13': weather_query
          '14': audio_volume_up
          '15': email_addcontact
          '16': takeaway_order
          '17': email_querycontact
          '18': iot_hue_lightup
          '19': recommendation_locations
          '20': play_audiobook
          '21': lists_createoradd
          '22': news_query
          '23': alarm_query
          '24': iot_wemo_on
          '25': general_joke
          '26': qa_definition
          '27': social_query
          '28': music_settings
          '29': audio_volume_other
          '30': calendar_remove
          '31': iot_hue_lightdim
          '32': calendar_query
          '33': email_sendemail
          '34': iot_cleaning
          '35': audio_volume_down
          '36': play_radio
          '37': cooking_query
          '38': datetime_convert
          '39': qa_maths
          '40': iot_hue_lightoff
          '41': iot_hue_lighton
          '42': transport_query
          '43': music_likeness
          '44': email_query
          '45': play_music
          '46': audio_volume_mute
          '47': social_post
          '48': alarm_set
          '49': qa_factoid
          '50': calendar_set
          '51': play_game
          '52': alarm_remove
          '53': lists_remove
          '54': transport_taxi
          '55': recommendation_movies
          '56': iot_coffee
          '57': music_query
          '58': play_podcasts
          '59': lists_query
  - name: text
    dtype: string
  - name: annot_utt
    dtype: string
  - name: worker_id
    dtype: string
  - name: slot_method
    sequence:
    - name: slot
      dtype: string
    - name: method
      dtype: string
  - name: judgments
    sequence:
    - name: worker_id
      dtype: string
    - name: intent_score
      dtype: int8
    - name: slots_score
      dtype: int8
    - name: grammar_score
      dtype: int8
    - name: spelling_score
      dtype: int8
    - name: language_identification
      dtype: string
  - name: label_name
    dtype: string
  - name: label
    dtype: int64
  splits:
  - name: train
    num_bytes: 147499
    num_examples: 503
  - name: validation
    num_bytes: 25026
    num_examples: 82
  - name: test
    num_bytes: 36859
    num_examples: 124
  download_size: 69773
  dataset_size: 209384
---
# Dataset Card for "massive_news-de-DE"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Dataset information (dataset_info):
Features (features):
- Field: id, type: string
- Field: locale, type: string
- Field: partition, type: string
- Field: scenario, type: class label (class_label), with the following id-to-name mapping:
  '0': Social (social)
  '1': Transport (transport)
  '2': Calendar (calendar)
  '3': Entertainment/playback (play)
  '4': News (news)
  '5': Date and time (datetime)
  '6': Recommendation (recommendation)
  '7': Email (email)
  '8': Internet of Things (iot)
  '9': General (general)
  '10': Audio (audio)
  '11': Lists (lists)
  '12': Question answering (qa)
  '13': Cooking (cooking)
  '14': Takeaway (takeaway)
  '15': Music (music)
  '16': Alarm (alarm)
  '17': Weather (weather)
- Field: intent, type: class label (class_label), with the following id-to-name mapping:
  '0': Date/time query (datetime_query)
  '1': Change Philips Hue light setting (iot_hue_lightchange)
  '2': Transport ticket (transport_ticket)
  '3': Takeaway query (takeaway_query)
  '4': Stock-price question (qa_stock)
  '5': General greeting (general_greet)
  '6': Event recommendation (recommendation_events)
  '7': Dislike music (music_dislikeness)
  '8': Turn WeMo smart device off (iot_wemo_off)
  '9': Cooking recipe (cooking_recipe)
  '10': Currency question (qa_currency)
  '11': Traffic conditions (transport_traffic)
  '12': Quirky small talk (general_quirky)
  '13': Weather query (weather_query)
  '14': Volume up (audio_volume_up)
  '15': Add email contact (email_addcontact)
  '16': Takeaway order (takeaway_order)
  '17': Query email contact (email_querycontact)
  '18': Brighten Philips Hue lights (iot_hue_lightup)
  '19': Location recommendation (recommendation_locations)
  '20': Play audiobook (play_audiobook)
  '21': Create list or add to list (lists_createoradd)
  '22': News query (news_query)
  '23': Alarm query (alarm_query)
  '24': Turn WeMo smart device on (iot_wemo_on)
  '25': Tell a joke (general_joke)
  '26': Definition question (qa_definition)
  '27': Social query (social_query)
  '28': Music settings (music_settings)
  '29': Other volume adjustment (audio_volume_other)
  '30': Remove calendar entry (calendar_remove)
  '31': Dim Philips Hue lights (iot_hue_lightdim)
  '32': Calendar query (calendar_query)
  '33': Send email (email_sendemail)
  '34': IoT cleaning device control (iot_cleaning)
  '35': Volume down (audio_volume_down)
  '36': Play radio (play_radio)
  '37': Cooking query (cooking_query)
  '38': Time conversion (datetime_convert)
  '39': Math question (qa_maths)
  '40': Turn Philips Hue lights off (iot_hue_lightoff)
  '41': Turn Philips Hue lights on (iot_hue_lighton)
  '42': Transport query (transport_query)
  '43': Like music (music_likeness)
  '44': Email query (email_query)
  '45': Play music (play_music)
  '46': Mute audio (audio_volume_mute)
  '47': Social post (social_post)
  '48': Set alarm (alarm_set)
  '49': Factoid question (qa_factoid)
  '50': Set calendar entry (calendar_set)
  '51': Play game (play_game)
  '52': Remove alarm (alarm_remove)
  '53': Remove from list (lists_remove)
  '54': Taxi service (transport_taxi)
  '55': Movie recommendation (recommendation_movies)
  '56': IoT coffee machine control (iot_coffee)
  '57': Music query (music_query)
  '58': Play podcasts (play_podcasts)
  '59': List query (lists_query)
- Field: text, type: string
- Field: annot_utt, type: string
- Field: worker_id, type: string
- Field: slot_method, type: sequence, with two subfields:
  - Slot (slot): string
  - Method (method): string
- Field: judgments, type: sequence, with the following subfields:
  - Annotator ID (worker_id): string
  - Intent score (intent_score): 8-bit integer (int8)
  - Slots score (slots_score): 8-bit integer (int8)
  - Grammar score (grammar_score): 8-bit integer (int8)
  - Spelling score (spelling_score): 8-bit integer (int8)
  - Language identification (language_identification): string
- Field: label name (label_name), type: string
- Field: label index (label), type: 64-bit integer (int64)
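The `annot_utt` field above is typed only as a string. In the original MASSIVE release it holds the utterance with slots marked inline as `[slot_name : value]`; assuming this subset keeps that convention (the card does not state it), a minimal sketch for splitting an annotated utterance into plain text and `(slot, value)` pairs could look like this. The helper name `parse_annot_utt` and the German sample sentence are hypothetical.

```python
import re
from typing import List, Tuple

# Assumed annotation format (from the original MASSIVE release, not stated on
# this card): slots appear inline as "[slot_name : value]".
SLOT_PATTERN = re.compile(r"\[\s*([^\[\]:]+?)\s*:\s*([^\[\]]+?)\s*\]")


def parse_annot_utt(annot_utt: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Hypothetical helper: return (plain_text, [(slot, value), ...])."""
    # Collect every bracketed annotation in order of appearance.
    # The slot name stops at the first colon, so values such as "18:00" survive.
    slots = [(m.group(1), m.group(2)) for m in SLOT_PATTERN.finditer(annot_utt)]
    # Replace each annotation with its bare value to recover the plain utterance.
    plain = SLOT_PATTERN.sub(lambda m: m.group(2), annot_utt)
    return plain, slots


if __name__ == "__main__":
    # Made-up German example; not taken from the dataset.
    text, slots = parse_annot_utt("nachrichten über [news_topic : fußball] von [date : heute]")
    print(text)   # nachrichten über fußball von heute
    print(slots)  # [('news_topic', 'fußball'), ('date', 'heute')]
```

A regular expression is enough here because the assumed bracket syntax is flat (no nested annotations); if real rows turn out to use a different markup, only `SLOT_PATTERN` needs to change.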
Dataset splits (splits):
- Training set (train): 147499 bytes, 503 examples
- Validation set (validation): 25026 bytes, 82 examples
- Test set (test): 36859 bytes, 124 examples
Download size (download_size): 69773 bytes; total dataset size (dataset_size): 209384 bytes
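The split figures above can be cross-checked with a little arithmetic: the three `num_bytes` values sum exactly to `dataset_size`, and the example counts give each split's share. The numbers below are copied from the metadata; the `summarize` function name is illustrative only.

```python
from typing import Dict, Tuple

# Split statistics copied from the dataset_info metadata on this card.
SPLITS: Dict[str, Dict[str, int]] = {
    "train": {"num_bytes": 147499, "num_examples": 503},
    "validation": {"num_bytes": 25026, "num_examples": 82},
    "test": {"num_bytes": 36859, "num_examples": 124},
}
DATASET_SIZE = 209384
DOWNLOAD_SIZE = 69773


def summarize(splits: Dict[str, Dict[str, int]]) -> Tuple[int, int, Dict[str, float]]:
    """Return total examples, total bytes, and each split's share of the examples."""
    total_examples = sum(s["num_examples"] for s in splits.values())
    total_bytes = sum(s["num_bytes"] for s in splits.values())
    shares = {name: s["num_examples"] / total_examples for name, s in splits.items()}
    return total_examples, total_bytes, shares


if __name__ == "__main__":
    total_examples, total_bytes, shares = summarize(SPLITS)
    print(total_examples)               # 709
    print(total_bytes == DATASET_SIZE)  # True
    for name, share in shares.items():
        print(f"{name}: {share:.1%}")   # train: 70.9%, validation: 11.6%, test: 17.5%
    print(f"{DATASET_SIZE / DOWNLOAD_SIZE:.2f}")  # 3.00
```

So the 709 examples are split roughly 71/12/17, and the downloaded files are about a third of the combined split size.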
Provided by:
fathyshalab
Summary of original information
Dataset overview
Dataset features
- id: string
- locale: string
- partition: string
- scenario: class label with values: social, transport, calendar, play, news, datetime, recommendation, email, iot, general, audio, lists, qa, cooking, takeaway, music, alarm, weather
- intent: class label with values: datetime_query, iot_hue_lightchange, transport_ticket, takeaway_query, qa_stock, general_greet, recommendation_events, music_dislikeness, iot_wemo_off, cooking_recipe, qa_currency, transport_traffic, general_quirky, weather_query, audio_volume_up, email_addcontact, takeaway_order, email_querycontact, iot_hue_lightup, recommendation_locations, play_audiobook, lists_createoradd, news_query, alarm_query, iot_wemo_on, general_joke, qa_definition, social_query, music_settings, audio_volume_other, calendar_remove, iot_hue_lightdim, calendar_query, email_sendemail, iot_cleaning, audio_volume_down, play_radio, cooking_query, datetime_convert, qa_maths, iot_hue_lightoff, iot_hue_lighton, transport_query, music_likeness, email_query, play_music, audio_volume_mute, social_post, alarm_set, qa_factoid, calendar_set, play_game, alarm_remove, lists_remove, transport_taxi, recommendation_movies, iot_coffee, music_query, play_podcasts, lists_query
- text: string
- annot_utt: string
- worker_id: string
- slot_method: sequence, containing:
  - slot: string
  - method: string
- judgments: sequence, containing:
  - worker_id: string
  - intent_score: 8-bit integer
  - slots_score: 8-bit integer
  - grammar_score: 8-bit integer
  - spelling_score: 8-bit integer
  - language_identification: string
- label_name: string
- label: 64-bit integer
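The `scenario` and `intent` values are stored as integer ids, and the order of the names listed above defines the mapping. A minimal pure-Python sketch of that lookup follows. It also uses an observation that holds for every name in the lists: each intent name begins with its scenario name followed by an underscore, so a row's scenario can be checked against its intent. The helper names are hypothetical; with the `datasets` library installed, `dataset.features["intent"].int2str(...)` on the `ClassLabel` feature should give the same id-to-name lookup.

```python
from typing import List, Tuple

# Label names copied from the class_label metadata; list index == stored id.
SCENARIOS: List[str] = [
    "social", "transport", "calendar", "play", "news", "datetime",
    "recommendation", "email", "iot", "general", "audio", "lists",
    "qa", "cooking", "takeaway", "music", "alarm", "weather",
]

INTENTS: List[str] = [
    # ids 0-4
    "datetime_query", "iot_hue_lightchange", "transport_ticket", "takeaway_query", "qa_stock",
    # ids 5-9
    "general_greet", "recommendation_events", "music_dislikeness", "iot_wemo_off", "cooking_recipe",
    # ids 10-14
    "qa_currency", "transport_traffic", "general_quirky", "weather_query", "audio_volume_up",
    # ids 15-19
    "email_addcontact", "takeaway_order", "email_querycontact", "iot_hue_lightup", "recommendation_locations",
    # ids 20-24
    "play_audiobook", "lists_createoradd", "news_query", "alarm_query", "iot_wemo_on",
    # ids 25-29
    "general_joke", "qa_definition", "social_query", "music_settings", "audio_volume_other",
    # ids 30-34
    "calendar_remove", "iot_hue_lightdim", "calendar_query", "email_sendemail", "iot_cleaning",
    # ids 35-39
    "audio_volume_down", "play_radio", "cooking_query", "datetime_convert", "qa_maths",
    # ids 40-44
    "iot_hue_lightoff", "iot_hue_lighton", "transport_query", "music_likeness", "email_query",
    # ids 45-49
    "play_music", "audio_volume_mute", "social_post", "alarm_set", "qa_factoid",
    # ids 50-54
    "calendar_set", "play_game", "alarm_remove", "lists_remove", "transport_taxi",
    # ids 55-59
    "recommendation_movies", "iot_coffee", "music_query", "play_podcasts", "lists_query",
]


def decode(scenario_id: int, intent_id: int) -> Tuple[str, str]:
    """Hypothetical helper: map stored integer ids back to label names."""
    return SCENARIOS[scenario_id], INTENTS[intent_id]


def scenario_of(intent_name: str) -> str:
    """Scenario prefix of an intent name, e.g. 'news_query' -> 'news'."""
    return intent_name.split("_", 1)[0]


def is_consistent(scenario_id: int, intent_id: int) -> bool:
    """True if the intent's name prefix matches the stored scenario."""
    scenario, intent = decode(scenario_id, intent_id)
    return scenario_of(intent) == scenario


if __name__ == "__main__":
    print(decode(4, 22))          # ('news', 'news_query')
    print(is_consistent(4, 22))   # True
    print(is_consistent(17, 22))  # False
```

Running `is_consistent` over every row is a cheap sanity check that the two label columns were not shuffled independently.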
Dataset splits
- train: 503 examples, 147499 bytes
- validation: 82 examples, 25026 bytes
- test: 124 examples, 36859 bytes
Dataset size
- Download size: 69773 bytes
- Total dataset size: 209384 bytes



