benayas/massive_llm_v3
Source: Hugging Face. Updated 2023-11-30. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/benayas/massive_llm_v3
Resource description:
---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: locale
    dtype: string
  - name: partition
    dtype: string
  - name: scenario
    dtype:
      class_label:
        names:
          '0': social
          '1': transport
          '2': calendar
          '3': play
          '4': news
          '5': datetime
          '6': recommendation
          '7': email
          '8': iot
          '9': general
          '10': audio
          '11': lists
          '12': qa
          '13': cooking
          '14': takeaway
          '15': music
          '16': alarm
          '17': weather
  - name: intent
    dtype:
      class_label:
        names:
          '0': datetime_query
          '1': iot_hue_lightchange
          '2': transport_ticket
          '3': takeaway_query
          '4': qa_stock
          '5': general_greet
          '6': recommendation_events
          '7': music_dislikeness
          '8': iot_wemo_off
          '9': cooking_recipe
          '10': qa_currency
          '11': transport_traffic
          '12': general_quirky
          '13': weather_query
          '14': audio_volume_up
          '15': email_addcontact
          '16': takeaway_order
          '17': email_querycontact
          '18': iot_hue_lightup
          '19': recommendation_locations
          '20': play_audiobook
          '21': lists_createoradd
          '22': news_query
          '23': alarm_query
          '24': iot_wemo_on
          '25': general_joke
          '26': qa_definition
          '27': social_query
          '28': music_settings
          '29': audio_volume_other
          '30': calendar_remove
          '31': iot_hue_lightdim
          '32': calendar_query
          '33': email_sendemail
          '34': iot_cleaning
          '35': audio_volume_down
          '36': play_radio
          '37': cooking_query
          '38': datetime_convert
          '39': qa_maths
          '40': iot_hue_lightoff
          '41': iot_hue_lighton
          '42': transport_query
          '43': music_likeness
          '44': email_query
          '45': play_music
          '46': audio_volume_mute
          '47': social_post
          '48': alarm_set
          '49': qa_factoid
          '50': calendar_set
          '51': play_game
          '52': alarm_remove
          '53': lists_remove
          '54': transport_taxi
          '55': recommendation_movies
          '56': iot_coffee
          '57': music_query
          '58': play_podcasts
          '59': lists_query
  - name: utt
    dtype: string
  - name: annot_utt
    dtype: string
  - name: worker_id
    dtype: string
  - name: slot_method
    sequence:
    - name: slot
      dtype: string
    - name: method
      dtype: string
  - name: judgments
    sequence:
    - name: worker_id
      dtype: string
    - name: intent_score
      dtype: int8
    - name: slots_score
      dtype: int8
    - name: grammar_score
      dtype: int8
    - name: spelling_score
      dtype: int8
    - name: language_identification
      dtype: string
  - name: category
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 17839343
    num_examples: 11514
  - name: validation
    num_bytes: 3144099
    num_examples: 2033
  - name: test
    num_bytes: 4598528
    num_examples: 2974
  download_size: 2975271
  dataset_size: 25581970
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---
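As a minimal sketch of how the metadata above might be used, the following assumes the Hugging Face `datasets` library is installed and that the repository is reachable under the ID `benayas/massive_llm_v3` (taken from the page title). The helper name `load_split` is hypothetical and not part of the dataset card.

```python
REPO_ID = "benayas/massive_llm_v3"  # repository ID from the page title
SPLITS = ("train", "validation", "test")  # split names declared in the metadata


def load_split(split: str):
    """Load one split of the dataset (hypothetical helper, not from the card)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    # Downloads the data on first use, so network access is required.
    return load_dataset(REPO_ID, split=split)
```

Calling `load_split("train")` should return a `Dataset` whose `scenario` and `intent` columns hold integer class IDs; `ds.features["intent"].int2str(ds[0]["intent"])` would decode one of them back to its label name.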
Provider:
benayas
Summary of original information
Dataset overview
Dataset features
- id: string
- locale: string
- partition: string
- scenario: class label, with the following classes:
  - 0: social
  - 1: transport
  - 2: calendar
  - 3: play
  - 4: news
  - 5: datetime
  - 6: recommendation
  - 7: email
  - 8: iot
  - 9: general
  - 10: audio
  - 11: lists
  - 12: qa
  - 13: cooking
  - 14: takeaway
  - 15: music
  - 16: alarm
  - 17: weather
- intent: class label, with the following classes:
  - 0: datetime_query
  - 1: iot_hue_lightchange
  - 2: transport_ticket
  - 3: takeaway_query
  - 4: qa_stock
  - 5: general_greet
  - 6: recommendation_events
  - 7: music_dislikeness
  - 8: iot_wemo_off
  - 9: cooking_recipe
  - 10: qa_currency
  - 11: transport_traffic
  - 12: general_quirky
  - 13: weather_query
  - 14: audio_volume_up
  - 15: email_addcontact
  - 16: takeaway_order
  - 17: email_querycontact
  - 18: iot_hue_lightup
  - 19: recommendation_locations
  - 20: play_audiobook
  - 21: lists_createoradd
  - 22: news_query
  - 23: alarm_query
  - 24: iot_wemo_on
  - 25: general_joke
  - 26: qa_definition
  - 27: social_query
  - 28: music_settings
  - 29: audio_volume_other
  - 30: calendar_remove
  - 31: iot_hue_lightdim
  - 32: calendar_query
  - 33: email_sendemail
  - 34: iot_cleaning
  - 35: audio_volume_down
  - 36: play_radio
  - 37: cooking_query
  - 38: datetime_convert
  - 39: qa_maths
  - 40: iot_hue_lightoff
  - 41: iot_hue_lighton
  - 42: transport_query
  - 43: music_likeness
  - 44: email_query
  - 45: play_music
  - 46: audio_volume_mute
  - 47: social_post
  - 48: alarm_set
  - 49: qa_factoid
  - 50: calendar_set
  - 51: play_game
  - 52: alarm_remove
  - 53: lists_remove
  - 54: transport_taxi
  - 55: recommendation_movies
  - 56: iot_coffee
  - 57: music_query
  - 58: play_podcasts
  - 59: lists_query
- utt: string
- annot_utt: string
- worker_id: string
- slot_method: sequence, with the following fields:
  - slot: string
  - method: string
- judgments: sequence, with the following fields:
  - worker_id: string
  - intent_score: 8-bit integer (int8)
  - slots_score: 8-bit integer (int8)
  - grammar_score: 8-bit integer (int8)
  - spelling_score: 8-bit integer (int8)
  - language_identification: string
- category: string
- text: string
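The label lists above suggest a naming pattern: every intent name appears to start with one of the scenario names, followed by an underscore. The card does not state this as a rule, so the sketch below treats it as an observation and checks it against the lists. The function names `scenario_of` and `intent_id_to_scenario_id` are hypothetical.

```python
# Label names copied from the feature list; list index = class ID.
SCENARIOS = [
    "social", "transport", "calendar", "play", "news", "datetime",
    "recommendation", "email", "iot", "general", "audio", "lists",
    "qa", "cooking", "takeaway", "music", "alarm", "weather",
]
INTENTS = [
    "datetime_query", "iot_hue_lightchange", "transport_ticket",
    "takeaway_query", "qa_stock", "general_greet", "recommendation_events",
    "music_dislikeness", "iot_wemo_off", "cooking_recipe", "qa_currency",
    "transport_traffic", "general_quirky", "weather_query", "audio_volume_up",
    "email_addcontact", "takeaway_order", "email_querycontact",
    "iot_hue_lightup", "recommendation_locations", "play_audiobook",
    "lists_createoradd", "news_query", "alarm_query", "iot_wemo_on",
    "general_joke", "qa_definition", "social_query", "music_settings",
    "audio_volume_other", "calendar_remove", "iot_hue_lightdim",
    "calendar_query", "email_sendemail", "iot_cleaning", "audio_volume_down",
    "play_radio", "cooking_query", "datetime_convert", "qa_maths",
    "iot_hue_lightoff", "iot_hue_lighton", "transport_query", "music_likeness",
    "email_query", "play_music", "audio_volume_mute", "social_post",
    "alarm_set", "qa_factoid", "calendar_set", "play_game", "alarm_remove",
    "lists_remove", "transport_taxi", "recommendation_movies", "iot_coffee",
    "music_query", "play_podcasts", "lists_query",
]


def scenario_of(intent_name: str) -> str:
    """Return the scenario prefix of an intent name such as 'alarm_set'."""
    prefix = intent_name.split("_", 1)[0]
    if prefix not in SCENARIOS:
        raise ValueError(f"{intent_name!r} has no known scenario prefix")
    return prefix


def intent_id_to_scenario_id(intent_id: int) -> int:
    """Map an intent class ID to the class ID of its (assumed) scenario."""
    return SCENARIOS.index(scenario_of(INTENTS[intent_id]))


# Sanity check of the observed pattern: raises if any intent breaks it.
for name in INTENTS:
    scenario_of(name)
```

Whether the `scenario` column of every row actually agrees with this prefix is not confirmed by the card and would need to be checked against the data.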
Dataset splits
- train: 11514 examples, 17839343 bytes in total
- validation: 2033 examples, 3144099 bytes in total
- test: 2974 examples, 4598528 bytes in total
Dataset size
- Download size: 2975271 bytes
- Total dataset size: 25581970 bytes
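The split and size figures can be cross-checked with a little arithmetic. The sketch below uses only numbers listed on this page: the three splits add up to 16521 examples, and their byte counts add up to the reported total dataset size of 25581970 bytes. The variable names are illustrative.

```python
# Figures copied from the split and size listings.
SPLIT_STATS = {
    "train": {"num_examples": 11514, "num_bytes": 17839343},
    "validation": {"num_examples": 2033, "num_bytes": 3144099},
    "test": {"num_examples": 2974, "num_bytes": 4598528},
}
DOWNLOAD_SIZE = 2975271
DATASET_SIZE = 25581970

total_examples = sum(s["num_examples"] for s in SPLIT_STATS.values())
total_bytes = sum(s["num_bytes"] for s in SPLIT_STATS.values())

# Fraction of all examples that falls into each split.
shares = {name: s["num_examples"] / total_examples for name, s in SPLIT_STATS.items()}

# How much larger the prepared dataset is than the downloaded files.
size_ratio = DATASET_SIZE / DOWNLOAD_SIZE

print(f"total examples: {total_examples}")
print(f"bytes match dataset_size: {total_bytes == DATASET_SIZE}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"dataset_size / download_size: {size_ratio:.2f}")
```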
Configuration
- default: uses the following data file paths:
  - train: data/train-*
  - validation: data/validation-*
  - test: data/test-*
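The data file paths above are glob patterns, one per split. The sketch below illustrates how such patterns assign files to splits, using Python's `fnmatch` as a stand-in for whatever matching the Hub actually performs. The file names in the usage note are hypothetical; the page does not list the real shard names.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Patterns copied from the default configuration.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
    "test": "data/test-*",
}


def split_for(path: str) -> Optional[str]:
    """Return the split whose pattern matches `path`, or None if none does."""
    for split, pattern in DATA_FILES.items():
        # fnmatchcase is case-sensitive on every platform.
        if fnmatchcase(path, pattern):
            return split
    return None
```

With a hypothetical shard name such as `data/train-00000-of-00001.parquet`, `split_for` returns `"train"`, while a file outside `data/`, such as `README.md`, matches no pattern and returns `None`.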



