crodri/autotrain-data-massive-4-catalan

Name: crodri/autotrain-data-massive-4-catalan
Creator: crodri
Published: 2022-12-13 11:51:02
License: No description available

Hugging Face, updated 2022-12-13, indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/crodri/autotrain-data-massive-4-catalan

Resource description:

---
task_categories:
- text-classification
---

# AutoTrain Dataset for project: massive-4-catalan

## Dataset Description

This dataset has been automatically processed by AutoTrain for project massive-4-catalan.

### Languages

The BCP-47 code for the dataset's language is unk.

## Dataset Structure

### Data Instances

A sample from this dataset looks as follows:

```json
[
  {
    "feat_id": "1",
    "feat_locale": "ca-ES",
    "feat_partition": "train",
    "feat_scenario": 0,
    "target": 2,
    "text": "desperta'm a les nou a. m. del divendres",
    "feat_annot_utt": "desperta'm a les [time : nou a. m.] del [date : divendres]",
    "feat_worker_id": "42",
    "feat_slot_method.slot": [
      "time",
      "date"
    ],
    "feat_slot_method.method": [
      "translation",
      "translation"
    ],
    "feat_judgments.worker_id": [
      "42",
      "30",
      "3"
    ],
    "feat_judgments.intent_score": [
      1,
      1,
      1
    ],
    "feat_judgments.slots_score": [
      1,
      1,
      1
    ],
    "feat_judgments.grammar_score": [
      4,
      3,
      4
    ],
    "feat_judgments.spelling_score": [
      2,
      2,
      2
    ],
    "feat_judgments.language_identification": [
      "target",
      "target|english",
      "target"
    ]
  },
  {
    "feat_id": "2",
    "feat_locale": "ca-ES",
    "feat_partition": "train",
    "feat_scenario": 0,
    "target": 2,
    "text": "posa una alarma per d\u2019aqu\u00ed a dues hores",
    "feat_annot_utt": "posa una alarma per [time : d\u2019aqu\u00ed a dues hores]",
    "feat_worker_id": "15",
    "feat_slot_method.slot": [
      "time"
    ],
    "feat_slot_method.method": [
      "translation"
    ],
    "feat_judgments.worker_id": [
      "42",
      "30",
      "24"
    ],
    "feat_judgments.intent_score": [
      1,
      1,
      1
    ],
    "feat_judgments.slots_score": [
      1,
      1,
      1
    ],
    "feat_judgments.grammar_score": [
      4,
      4,
      4
    ],
    "feat_judgments.spelling_score": [
      2,
      2,
      2
    ],
    "feat_judgments.language_identification": [
      "target",
      "target",
      "target"
    ]
  }
]
```

### Dataset Fields

The dataset has the following fields (also called "features"):

```json
{
  "feat_id": "Value(dtype='string', id=None)",
  "feat_locale": "Value(dtype='string', id=None)",
  "feat_partition": "Value(dtype='string', id=None)",
  "feat_scenario": "ClassLabel(num_classes=18, names=['alarm', 'audio', 'calendar', 'cooking', 'datetime', 'email', 'general', 'iot', 'lists', 'music', 'news', 'play', 'qa', 'recommendation', 'social', 'takeaway', 'transport', 'weather'], id=None)",
  "target": "ClassLabel(num_classes=60, names=['alarm_query', 'alarm_remove', 'alarm_set', 'audio_volume_down', 'audio_volume_mute', 'audio_volume_other', 'audio_volume_up', 'calendar_query', 'calendar_remove', 'calendar_set', 'cooking_query', 'cooking_recipe', 'datetime_convert', 'datetime_query', 'email_addcontact', 'email_query', 'email_querycontact', 'email_sendemail', 'general_greet', 'general_joke', 'general_quirky', 'iot_cleaning', 'iot_coffee', 'iot_hue_lightchange', 'iot_hue_lightdim', 'iot_hue_lightoff', 'iot_hue_lighton', 'iot_hue_lightup', 'iot_wemo_off', 'iot_wemo_on', 'lists_createoradd', 'lists_query', 'lists_remove', 'music_dislikeness', 'music_likeness', 'music_query', 'music_settings', 'news_query', 'play_audiobook', 'play_game', 'play_music', 'play_podcasts', 'play_radio', 'qa_currency', 'qa_definition', 'qa_factoid', 'qa_maths', 'qa_stock', 'recommendation_events', 'recommendation_locations', 'recommendation_movies', 'social_post', 'social_query', 'takeaway_order', 'takeaway_query', 'transport_query', 'transport_taxi', 'transport_ticket', 'transport_traffic', 'weather_query'], id=None)",
  "text": "Value(dtype='string', id=None)",
  "feat_annot_utt": "Value(dtype='string', id=None)",
  "feat_worker_id": "Value(dtype='string', id=None)",
  "feat_slot_method.slot": "Sequence(feature=Value(dtype='string', id=None), length=-1, id=None)",
  "feat_slot_method.method": "Sequence(feature=Value(dtype='string', id=None), length=-1, id=None)",
  "feat_judgments.worker_id": "Sequence(feature=Value(dtype='string', id=None), length=-1, id=None)",
  "feat_judgments.intent_score": "Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None)",
  "feat_judgments.slots_score": "Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None)",
  "feat_judgments.grammar_score": "Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None)",
  "feat_judgments.spelling_score": "Sequence(feature=Value(dtype='int8', id=None), length=-1, id=None)",
  "feat_judgments.language_identification": "Sequence(feature=Value(dtype='string', id=None), length=-1, id=None)"
}
```

### Dataset Splits

This dataset is split into a train and a validation split. The split sizes are as follows:

| Split name | Num samples |
| ---------- | ----------- |
| train      | 11514       |
| valid      | 2033        |

Provider:

crodri

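Summary of Original Information

Dataset Overview

Dataset Name

AutoTrain Dataset for project: massive-4-catalan

Task Category

text-classification

Language Information

BCP-47 code: unk

Dataset Structure

Data Instances

Sample records contain multiple fields, such as feat_id, feat_locale, and feat_partition.
Text example: "desperta'm a les nou a. m. del divendres"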
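
In the sample records of the dataset card, the feat_annot_utt field marks slots inline as `[slot : value]`, and the text field holds the same utterance without the markup. The following is a minimal sketch of how that markup could be parsed, assuming the bracket format seen in the two sample records holds throughout; `SLOT_PATTERN` and `parse_annot_utt` are illustrative names, not part of AutoTrain or the dataset itself.

```python
import re

# One slot annotation of the form "[slot_name : slot value]".
# Assumption: slot names contain no colon and annotations are never nested.
SLOT_PATTERN = re.compile(r"\[\s*([^\[\]:]+?)\s*:\s*([^\[\]]+?)\s*\]")


def parse_annot_utt(annot_utt):
    """Return (plain_text, slots) for an annotated utterance.

    slots is a list of (slot_name, slot_value) pairs in order of appearance.
    """
    slots = [(m.group(1), m.group(2)) for m in SLOT_PATTERN.finditer(annot_utt)]
    # Replace each annotation with its bare value to recover the plain text.
    plain_text = SLOT_PATTERN.sub(lambda m: m.group(2), annot_utt)
    return plain_text, slots


example = "desperta'm a les [time : nou a. m.] del [date : divendres]"
plain_text, slots = parse_annot_utt(example)
print(plain_text)
print(slots)
```

For the first sample record, the recovered plain text matches the text field and the slot names match feat_slot_method.slot.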

Dataset Fields

feat_id: string
feat_locale: string
feat_partition: string
feat_scenario: class label, 18 classes
target: class label, 60 classes
text: string
feat_annot_utt: string
feat_worker_id: string
feat_slot_method.slot: sequence of strings
feat_slot_method.method: sequence of strings
feat_judgments.worker_id: sequence of strings
feat_judgments.intent_score: sequence of integers
feat_judgments.slots_score: sequence of integers
feat_judgments.grammar_score: sequence of integers
feat_judgments.spelling_score: sequence of integers
feat_judgments.language_identification: sequence of strings
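
The feat_scenario and target fields are stored as integers that index into the class-name lists given in the dataset card. A minimal sketch of decoding them, assuming the name order shown in the card; `decode_labels` is a hypothetical helper written for illustration, not a function of AutoTrain or of the `datasets` library.

```python
# Class names copied from the dataset card, in index order.
SCENARIO_NAMES = (
    "alarm audio calendar cooking datetime email general iot lists music "
    "news play qa recommendation social takeaway transport weather"
).split()

INTENT_NAMES = (
    "alarm_query alarm_remove alarm_set "
    "audio_volume_down audio_volume_mute audio_volume_other audio_volume_up "
    "calendar_query calendar_remove calendar_set "
    "cooking_query cooking_recipe datetime_convert datetime_query "
    "email_addcontact email_query email_querycontact email_sendemail "
    "general_greet general_joke general_quirky "
    "iot_cleaning iot_coffee iot_hue_lightchange iot_hue_lightdim "
    "iot_hue_lightoff iot_hue_lighton iot_hue_lightup iot_wemo_off iot_wemo_on "
    "lists_createoradd lists_query lists_remove "
    "music_dislikeness music_likeness music_query music_settings "
    "news_query play_audiobook play_game play_music play_podcasts play_radio "
    "qa_currency qa_definition qa_factoid qa_maths qa_stock "
    "recommendation_events recommendation_locations recommendation_movies "
    "social_post social_query takeaway_order takeaway_query "
    "transport_query transport_taxi transport_ticket transport_traffic "
    "weather_query"
).split()


def decode_labels(example):
    """Map the integer feat_scenario and target of one record to their names."""
    return SCENARIO_NAMES[example["feat_scenario"]], INTENT_NAMES[example["target"]]


# Label values taken from the first sample record in the dataset card.
record = {"feat_scenario": 0, "target": 2}
scenario, intent = decode_labels(record)
print(scenario, intent)  # alarm alarm_set
```

In these lists, every intent name begins with its scenario name followed by an underscore, so the scenario can also be read off the decoded intent.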

Dataset Splits

Training set: 11514 samples
Validation set: 2033 samples
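
As a quick check on how the data is divided, the share of each split can be computed from the two reported sizes. This is only an arithmetic sketch; the variable names are illustrative.

```python
# Split sizes as reported in the dataset card.
split_sizes = {"train": 11514, "valid": 2033}

total = sum(split_sizes.values())
fractions = {name: size / total for name, size in split_sizes.items()}

for name, size in split_sizes.items():
    print(f"{name}: {size} samples ({fractions[name]:.1%} of {total})")
```

The two splits total 13547 samples, which works out to roughly an 85/15 division between training and validation.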
