
crodri/autotrain-data-massive-4-catalan

Source: Hugging Face. Updated 2022-12-13; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/crodri/autotrain-data-massive-4-catalan
Resource description:
---
task_categories:
- text-classification
---

# AutoTrain Dataset for project: massive-4-catalan

## Dataset Description

This dataset has been automatically processed by AutoTrain for project massive-4-catalan.

### Languages

The BCP-47 code for the dataset's language is unk.

## Dataset Structure

### Data Instances

A sample from this dataset looks as follows:

```json
[
  {
    "feat_id": "1",
    "feat_locale": "ca-ES",
    "feat_partition": "train",
    "feat_scenario": 0,
    "target": 2,
    "text": "desperta'm a les nou a. m. del divendres",
    "feat_annot_utt": "desperta'm a les [time : nou a. m.] del [date : divendres]",
    "feat_worker_id": "42",
    "feat_slot_method.slot": ["time", "date"],
    "feat_slot_method.method": ["translation", "translation"],
    "feat_judgments.worker_id": ["42", "30", "3"],
    "feat_judgments.intent_score": [1, 1, 1],
    "feat_judgments.slots_score": [1, 1, 1],
    "feat_judgments.grammar_score": [4, 3, 4],
    "feat_judgments.spelling_score": [2, 2, 2],
    "feat_judgments.language_identification": ["target", "target|english", "target"]
  },
  {
    "feat_id": "2",
    "feat_locale": "ca-ES",
    "feat_partition": "train",
    "feat_scenario": 0,
    "target": 2,
    "text": "posa una alarma per d\u2019aqu\u00ed a dues hores",
    "feat_annot_utt": "posa una alarma per [time : d\u2019aqu\u00ed a dues hores]",
    "feat_worker_id": "15",
    "feat_slot_method.slot": ["time"],
    "feat_slot_method.method": ["translation"],
    "feat_judgments.worker_id": ["42", "30", "24"],
    "feat_judgments.intent_score": [1, 1, 1],
    "feat_judgments.slots_score": [1, 1, 1],
    "feat_judgments.grammar_score": [4, 4, 4],
    "feat_judgments.spelling_score": [2, 2, 2],
    "feat_judgments.language_identification": ["target", "target", "target"]
  }
]
```

### Dataset Fields

The dataset has the following fields (also called "features"):

```json
{
  "feat_id": "Value(dtype='string', id=None)",
  "feat_locale": "Value(dtype='string', id=None)",
  "feat_partition": "Value(dtype='string', id=None)",
  "feat_scenario": "ClassLabel(num_classes=18, names=['alarm', 'audio', 'calendar', 'cooking', 'datetime', 'email', 'general', 'iot', 'lists', 'music', 'news', 'play', 'qa', 'recommendation', 'social', 'takeaway', 'transport', 'weather'], id=None)",
  "target": "ClassLabel(num_classes=60, names=['alarm_query', 'alarm_remove', 'alarm_set', 'audio_volume_down', 'audio_volume_mute', 'audio_volume_other', 'audio_volume_up', 'calendar_query', 'calendar_remove', 'calendar_set', 'cooking_query', 'cooking_recipe', 'datetime_convert', 'datetime_query', 'email_addcontact', 'email_query', 'email_querycontact', 'email_sendemail', 'general_greet', 'general_joke', 'general_quirky', 'iot_cleaning', 'iot_coffee', 'iot_hue_lightchange', 'iot_hue_lightdim', 'iot_hue_lightoff', 'iot_hue_lighton', 'iot_hue_lightup', 'iot_wemo_off', 'iot_wemo_on', 'lists_createoradd', 'lists_query', 'lists_remove', 'music_dislikeness', 'music_likeness', 'music_query', 'music_settings', 'news_query', 'play_audiobook', 'play_game', 'play_music', 'play_podcasts', 'play_radio', 'qa_currency', 'qa_definition', 'qa_factoid', 'qa_maths', 'qa_stock', 'recommendation_events', 'recommendation_locations', 'recommendation_movies', 'social_post', 'social_query', 'takeaway_order', 'takeaway_query', 'transport_query', 'transport_taxi', 'transport_ticket', 'transport_traffic', 'weather_query'], id=None)",
  "text": "Value(dtype='string', id=None)",
  "feat_annot_utt": "Value(dtype='string', id=None)",
  "feat_worker_id": "Value(dtype='string', id=None)",
  "feat_slot_method.slot": "Sequence(feature=Value(dtype='string', id=None), length=-1, id=None)",
  "feat_slot_method.method": "Sequence(feature=Value(dtype='string', id=None), length=-1, id=None)",
  "feat_judgments.worker_id": "Sequence(feature=Value(dtype='string', id=None), length=-1, id=None)",
  "feat_judgments.intent_score": "Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None)",
  "feat_judgments.slots_score": "Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None)",
  "feat_judgments.grammar_score": "Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None)",
  "feat_judgments.spelling_score": "Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None)",
  "feat_judgments.language_identification": "Sequence(feature=Value(dtype='string', id=None), length=-1, id=None)"
}
```

### Dataset Splits

This dataset is split into a train and a validation split. The split sizes are as follows:

| Split name | Num samples |
| ---------- | ----------- |
| train      | 11514       |
| valid      | 2033        |
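Provider:
crodri
Summary of Original Information

Dataset Overview

Dataset Name

  • AutoTrain Dataset for project: massive-4-catalan

Task Category

  • text-classification

Language Information

  • BCP-47 code: unk

Dataset Structure

Data Instances
  • Each example contains multiple fields, such as feat_id, feat_locale, and feat_partition.
  • Text example: "desperta'm a les nou a. m. del divendres"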
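
In the sample records of the dataset card, `feat_annot_utt` carries the same utterance as `text`, with slots marked inline as `[slot : value]`. The following is a minimal sketch of how those annotations could be parsed, assuming every slot follows exactly that bracket pattern and that brackets are never nested. The names `SLOT_PATTERN`, `extract_slots`, and `strip_annotations` are illustrative and not part of the dataset or of AutoTrain.

```python
import re

# Assumed annotation shape: "[slot_name : slot value]", with no nested brackets.
SLOT_PATTERN = re.compile(r"\[\s*([^\[\]:]+?)\s*:\s*([^\[\]]+?)\s*\]")


def extract_slots(annot_utt: str) -> list[tuple[str, str]]:
    """Return (slot_name, slot_value) pairs in the order they appear."""
    return [(m.group(1), m.group(2)) for m in SLOT_PATTERN.finditer(annot_utt)]


def strip_annotations(annot_utt: str) -> str:
    """Replace each "[slot : value]" span with its bare value."""
    return SLOT_PATTERN.sub(lambda m: m.group(2), annot_utt)


# The annotated utterance of the first sample record.
annotated = "desperta'm a les [time : nou a. m.] del [date : divendres]"
slots = extract_slots(annotated)
plain = strip_annotations(annotated)
```

For the first sample, the extracted slot names match the record's `feat_slot_method.slot` list, and the stripped string matches its `text` field.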
Dataset Fields
  • feat_id: string
  • feat_locale: string
  • feat_partition: string
  • feat_scenario: class label, 18 classes
  • target: class label, 60 classes
  • text: string
  • feat_annot_utt: string
  • feat_worker_id: string
  • feat_slot_method.slot: sequence of strings
  • feat_slot_method.method: sequence of strings
  • feat_judgments.worker_id: sequence of strings
  • feat_judgments.intent_score: sequence of integers (int8)
  • feat_judgments.slots_score: sequence of integers (int8)
  • feat_judgments.grammar_score: sequence of integers (int8)
  • feat_judgments.spelling_score: sequence of integers (int8)
  • feat_judgments.language_identification: sequence of strings
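
The two class-label fields, `feat_scenario` and `target`, are stored as integer ids, and the names behind those ids appear only in the `ClassLabel` definitions of the dataset card. As a rough sketch, the lookup can be done with plain lists copied from that listing. The helper name `decode_labels` is hypothetical. When the data is loaded with the Hugging Face `datasets` library, `ClassLabel.int2str` performs the same conversion.

```python
# Label names copied from the ClassLabel definitions in the dataset card.
SCENARIO_NAMES = [
    "alarm", "audio", "calendar", "cooking", "datetime", "email", "general",
    "iot", "lists", "music", "news", "play", "qa", "recommendation", "social",
    "takeaway", "transport", "weather",
]
INTENT_NAMES = [
    "alarm_query", "alarm_remove", "alarm_set", "audio_volume_down",
    "audio_volume_mute", "audio_volume_other", "audio_volume_up",
    "calendar_query", "calendar_remove", "calendar_set", "cooking_query",
    "cooking_recipe", "datetime_convert", "datetime_query",
    "email_addcontact", "email_query", "email_querycontact",
    "email_sendemail", "general_greet", "general_joke", "general_quirky",
    "iot_cleaning", "iot_coffee", "iot_hue_lightchange", "iot_hue_lightdim",
    "iot_hue_lightoff", "iot_hue_lighton", "iot_hue_lightup", "iot_wemo_off",
    "iot_wemo_on", "lists_createoradd", "lists_query", "lists_remove",
    "music_dislikeness", "music_likeness", "music_query", "music_settings",
    "news_query", "play_audiobook", "play_game", "play_music",
    "play_podcasts", "play_radio", "qa_currency", "qa_definition",
    "qa_factoid", "qa_maths", "qa_stock", "recommendation_events",
    "recommendation_locations", "recommendation_movies", "social_post",
    "social_query", "takeaway_order", "takeaway_query", "transport_query",
    "transport_taxi", "transport_ticket", "transport_traffic",
    "weather_query",
]


def decode_labels(record: dict) -> dict:
    """Map one record's integer ids to scenario and intent names."""
    return {
        "scenario": SCENARIO_NAMES[record["feat_scenario"]],
        "intent": INTENT_NAMES[record["target"]],
    }


# First sample record from the card (only the two label fields are needed).
sample = {"feat_scenario": 0, "target": 2}
print(decode_labels(sample))  # {'scenario': 'alarm', 'intent': 'alarm_set'}
```

In the listing as given, every intent name begins with its scenario name, so `intent.split("_", 1)[0]` recovers the scenario from the intent alone.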

Dataset Splits

  • Training set: 11514 samples
  • Validation set: 2033 samples
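
For reference, the split sizes above work out to roughly an 85/15 division. The arithmetic can be checked with a few lines. The split names follow the card's table, and `SPLIT_SIZES` is just an illustrative variable name.

```python
# Split sizes as reported in the dataset card.
SPLIT_SIZES = {"train": 11514, "valid": 2033}

total = sum(SPLIT_SIZES.values())
fractions = {name: size / total for name, size in SPLIT_SIZES.items()}

for name, size in SPLIT_SIZES.items():
    # One line per split: absolute size and share of the total.
    print(f"{name}: {size} samples ({fractions[name]:.1%})")
```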