benayas/massive_llm_v0
Hugging Face | Updated 2023-11-29 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/benayas/massive_llm_v0
Resource description:
---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: locale
    dtype: string
  - name: partition
    dtype: string
  - name: scenario
    dtype:
      class_label:
        names:
          '0': social
          '1': transport
          '2': calendar
          '3': play
          '4': news
          '5': datetime
          '6': recommendation
          '7': email
          '8': iot
          '9': general
          '10': audio
          '11': lists
          '12': qa
          '13': cooking
          '14': takeaway
          '15': music
          '16': alarm
          '17': weather
  - name: intent
    dtype:
      class_label:
        names:
          '0': datetime_query
          '1': iot_hue_lightchange
          '2': transport_ticket
          '3': takeaway_query
          '4': qa_stock
          '5': general_greet
          '6': recommendation_events
          '7': music_dislikeness
          '8': iot_wemo_off
          '9': cooking_recipe
          '10': qa_currency
          '11': transport_traffic
          '12': general_quirky
          '13': weather_query
          '14': audio_volume_up
          '15': email_addcontact
          '16': takeaway_order
          '17': email_querycontact
          '18': iot_hue_lightup
          '19': recommendation_locations
          '20': play_audiobook
          '21': lists_createoradd
          '22': news_query
          '23': alarm_query
          '24': iot_wemo_on
          '25': general_joke
          '26': qa_definition
          '27': social_query
          '28': music_settings
          '29': audio_volume_other
          '30': calendar_remove
          '31': iot_hue_lightdim
          '32': calendar_query
          '33': email_sendemail
          '34': iot_cleaning
          '35': audio_volume_down
          '36': play_radio
          '37': cooking_query
          '38': datetime_convert
          '39': qa_maths
          '40': iot_hue_lightoff
          '41': iot_hue_lighton
          '42': transport_query
          '43': music_likeness
          '44': email_query
          '45': play_music
          '46': audio_volume_mute
          '47': social_post
          '48': alarm_set
          '49': qa_factoid
          '50': calendar_set
          '51': play_game
          '52': alarm_remove
          '53': lists_remove
          '54': transport_taxi
          '55': recommendation_movies
          '56': iot_coffee
          '57': music_query
          '58': play_podcasts
          '59': lists_query
  - name: utt
    dtype: string
  - name: annot_utt
    dtype: string
  - name: worker_id
    dtype: string
  - name: slot_method
    sequence:
    - name: slot
      dtype: string
    - name: method
      dtype: string
  - name: judgments
    sequence:
    - name: worker_id
      dtype: string
    - name: intent_score
      dtype: int8
    - name: slots_score
      dtype: int8
    - name: grammar_score
      dtype: int8
    - name: spelling_score
      dtype: int8
    - name: language_identification
      dtype: string
  - name: category
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 6371399
    num_examples: 11514
  - name: validation
    num_bytes: 1119231
    num_examples: 2033
  - name: test
    num_bytes: 1636424
    num_examples: 2974
  download_size: 1813395
  dataset_size: 9127054
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---
Provider:
benayas
Summary of original information
Dataset information
Features
- id: string
- locale: string
- partition: string
- scenario: class label with the following 18 classes:
  - social
  - transport
  - calendar
  - play
  - news
  - datetime
  - recommendation
  - email
  - iot
  - general
  - audio
  - lists
  - qa
  - cooking
  - takeaway
  - music
  - alarm
  - weather
- intent: class label with the following 60 classes:
  - datetime_query
  - iot_hue_lightchange
  - transport_ticket
  - takeaway_query
  - qa_stock
  - general_greet
  - recommendation_events
  - music_dislikeness
  - iot_wemo_off
  - cooking_recipe
  - qa_currency
  - transport_traffic
  - general_quirky
  - weather_query
  - audio_volume_up
  - email_addcontact
  - takeaway_order
  - email_querycontact
  - iot_hue_lightup
  - recommendation_locations
  - play_audiobook
  - lists_createoradd
  - news_query
  - alarm_query
  - iot_wemo_on
  - general_joke
  - qa_definition
  - social_query
  - music_settings
  - audio_volume_other
  - calendar_remove
  - iot_hue_lightdim
  - calendar_query
  - email_sendemail
  - iot_cleaning
  - audio_volume_down
  - play_radio
  - cooking_query
  - datetime_convert
  - qa_maths
  - iot_hue_lightoff
  - iot_hue_lighton
  - transport_query
  - music_likeness
  - email_query
  - play_music
  - audio_volume_mute
  - social_post
  - alarm_set
  - qa_factoid
  - calendar_set
  - play_game
  - alarm_remove
  - lists_remove
  - transport_taxi
  - recommendation_movies
  - iot_coffee
  - music_query
  - play_podcasts
  - lists_query
- utt: string
- annot_utt: string
- worker_id: string
- slot_method: sequence with the following sub-features:
  - slot: string
  - method: string
- judgments: sequence with the following sub-features:
  - worker_id: string
  - intent_score: 8-bit integer (int8)
  - slots_score: 8-bit integer (int8)
  - grammar_score: 8-bit integer (int8)
  - spelling_score: 8-bit integer (int8)
  - language_identification: string
- category: string
- text: string
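Because `scenario` and `intent` are stored as integer class IDs, it can help to see how an ID maps back to its name. The sketch below copies the label names from the card in class-ID order and relies on one observation about the lists above: every intent name begins with its scenario name followed by an underscore. The helpers `scenario_of` and `decode` are hypothetical names introduced here for illustration; they are not part of the dataset or of any library.

```python
# Label names copied from the card, in class-ID order (index == class ID).
SCENARIOS = [
    "social", "transport", "calendar", "play", "news", "datetime",
    "recommendation", "email", "iot", "general", "audio", "lists",
    "qa", "cooking", "takeaway", "music", "alarm", "weather",
]

INTENTS = [
    "datetime_query", "iot_hue_lightchange", "transport_ticket",
    "takeaway_query", "qa_stock", "general_greet",
    "recommendation_events", "music_dislikeness", "iot_wemo_off",
    "cooking_recipe", "qa_currency", "transport_traffic",
    "general_quirky", "weather_query", "audio_volume_up",
    "email_addcontact", "takeaway_order", "email_querycontact",
    "iot_hue_lightup", "recommendation_locations", "play_audiobook",
    "lists_createoradd", "news_query", "alarm_query", "iot_wemo_on",
    "general_joke", "qa_definition", "social_query", "music_settings",
    "audio_volume_other", "calendar_remove", "iot_hue_lightdim",
    "calendar_query", "email_sendemail", "iot_cleaning",
    "audio_volume_down", "play_radio", "cooking_query",
    "datetime_convert", "qa_maths", "iot_hue_lightoff",
    "iot_hue_lighton", "transport_query", "music_likeness",
    "email_query", "play_music", "audio_volume_mute", "social_post",
    "alarm_set", "qa_factoid", "calendar_set", "play_game",
    "alarm_remove", "lists_remove", "transport_taxi",
    "recommendation_movies", "iot_coffee", "music_query",
    "play_podcasts", "lists_query",
]


def scenario_of(intent: str) -> str:
    """Hypothetical helper: take the part of the intent name before the first '_'."""
    prefix = intent.split("_", 1)[0]
    if prefix not in SCENARIOS:
        raise ValueError(f"intent {intent!r} has no matching scenario")
    return prefix


def decode(scenario_id: int, intent_id: int) -> tuple[str, str]:
    """Hypothetical helper: turn a pair of class IDs into their label names."""
    return SCENARIOS[scenario_id], INTENTS[intent_id]


# Intent class ID -> scenario class ID, derived from the naming convention.
INTENT_TO_SCENARIO_ID = {
    intent_id: SCENARIOS.index(scenario_of(name))
    for intent_id, name in enumerate(INTENTS)
}

if __name__ == "__main__":
    print(decode(7, 33))
    print(INTENT_TO_SCENARIO_ID[1])
```

The prefix rule is inferred from the label names only; whether every row's `scenario` value agrees with the prefix of its `intent` value would need to be checked against the data itself.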
Dataset splits
- train: 11514 examples, 6371399 bytes
- validation: 2033 examples, 1119231 bytes
- test: 2974 examples, 1636424 bytes
Dataset size
- Download size: 1813395 bytes
- Dataset size: 9127054 bytes
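The split figures can be cross-checked with a little arithmetic: the three `num_bytes` values should add up to `dataset_size`. The short sketch below does that check using only the numbers from the card and also prints each split's share of the examples. Reading the gap between dataset size and download size as compression of the downloaded files is an assumption here, not something the card states.

```python
# (num_examples, num_bytes) per split, copied from the card.
SPLITS = {
    "train": (11514, 6371399),
    "validation": (2033, 1119231),
    "test": (2974, 1636424),
}
DATASET_SIZE = 9127054   # bytes, from the card
DOWNLOAD_SIZE = 1813395  # bytes, from the card

total_examples = sum(n for n, _ in SPLITS.values())
total_bytes = sum(b for _, b in SPLITS.values())

# The per-split byte counts should reproduce the reported dataset size.
assert total_bytes == DATASET_SIZE

for name, (n, b) in SPLITS.items():
    share = n / total_examples
    print(f"{name:<10} {n:>6} examples  {share:6.1%}  {b / n:6.1f} bytes/example")

print(f"total      {total_examples:>6} examples")
# Assumption: the ratio reflects how much smaller the stored files are.
print(f"dataset/download ratio: {DATASET_SIZE / DOWNLOAD_SIZE:.2f}")
```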
Configuration
- default: uses the following data file paths:
  - train: data/train-*
  - validation: data/validation-*
  - test: data/test-*



