sapienzanlp/it-Magpie-Llama-3.1-Pro-300K-Filtered-easy

Name: sapienzanlp/it-Magpie-Llama-3.1-Pro-300K-Filtered-easy
Creator: sapienzanlp
Published: 2024-12-05 11:55:58
License: 暂无描述

Hugging Face2024-12-05 更新2025-04-12 收录

下载链接：

https://hf-mirror.com/datasets/sapienzanlp/it-Magpie-Llama-3.1-Pro-300K-Filtered-easy

下载链接

链接失效反馈

官方服务：

资源简介：

--- language: - en - it size_categories: - 10K<n<100K dataset_info: features: - name: id dtype: int64 - name: text dtype: string - name: uuid dtype: string - name: model dtype: string - name: gen_input_configs struct: - name: input_generator dtype: string - name: pre_query_template dtype: string - name: seed dtype: 'null' - name: temperature dtype: float64 - name: top_p dtype: float64 - name: instruction dtype: string - name: response dtype: string - name: task_category dtype: string - name: other_task_category sequence: string - name: task_category_generator dtype: string - name: difficulty dtype: string - name: intent dtype: string - name: knowledge dtype: string - name: difficulty_generator dtype: string - name: input_quality dtype: string - name: quality_explanation dtype: string - name: quality_generator dtype: string - name: llama_guard_2 dtype: string - name: reward_model dtype: string - name: instruct_reward dtype: float64 - name: min_neighbor_distance dtype: float64 - name: repeat_count dtype: int64 - name: min_similar_uuid dtype: string - name: instruction_length dtype: int64 - name: response_length dtype: int64 - name: language dtype: string - name: instruction_it dtype: string - name: response_it dtype: string - name: messages list: - name: content dtype: string - name: role dtype: string splits: - name: train num_bytes: 571262153 num_examples: 59070 download_size: 266524472 dataset_size: 571262153 configs: - config_name: default data_files: - split: train path: data/train-* --- # Dataset Card: Magpie-Llama-3.1-Pro-300K-Filtered (Italian Translation) ## Dataset Summary This dataset is the **Italian translation** of the [Magpie-Llama-3.1-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Llama-3.1-Pro-300K-Filtered) dataset, designed specifically for **instruction tuning** in Italian. We translated the instances whose 'difficulty' value is 'easy'. The translation was carried out using **TowerInstruct-Mistral-7B-v0.2**. - **Languages**: Italian (translated from English) - **Purpose**: Instruction tuning in Italian - **Train Size**: 59070 New fields added include: ```python { "messages": list # the translated conversation in Italian, presented in chat template format. "instruction_it": str # the translated instruction in Italian. "response_it_": str # the translated response in Italian. } ``` These additions make the dataset suitable for Italian-speaking AI models and multilingual systems requiring instruction following data in Italian. ## Author Alessandro Scirè (scire@babelscape.com)

提供机构：

sapienzanlp

5,000+

优质数据集

54 个

任务类型

进入经典数据集