five

sapienzanlp/it-Magpie-Llama-3.1-Pro-300K-Filtered-easy

收藏
Hugging Face2024-12-05 更新2025-04-12 收录
下载链接:
https://hf-mirror.com/datasets/sapienzanlp/it-Magpie-Llama-3.1-Pro-300K-Filtered-easy
下载链接
链接失效反馈
官方服务:
资源简介:
--- language: - en - it size_categories: - 10K<n<100K dataset_info: features: - name: id dtype: int64 - name: text dtype: string - name: uuid dtype: string - name: model dtype: string - name: gen_input_configs struct: - name: input_generator dtype: string - name: pre_query_template dtype: string - name: seed dtype: 'null' - name: temperature dtype: float64 - name: top_p dtype: float64 - name: instruction dtype: string - name: response dtype: string - name: task_category dtype: string - name: other_task_category sequence: string - name: task_category_generator dtype: string - name: difficulty dtype: string - name: intent dtype: string - name: knowledge dtype: string - name: difficulty_generator dtype: string - name: input_quality dtype: string - name: quality_explanation dtype: string - name: quality_generator dtype: string - name: llama_guard_2 dtype: string - name: reward_model dtype: string - name: instruct_reward dtype: float64 - name: min_neighbor_distance dtype: float64 - name: repeat_count dtype: int64 - name: min_similar_uuid dtype: string - name: instruction_length dtype: int64 - name: response_length dtype: int64 - name: language dtype: string - name: instruction_it dtype: string - name: response_it dtype: string - name: messages list: - name: content dtype: string - name: role dtype: string splits: - name: train num_bytes: 571262153 num_examples: 59070 download_size: 266524472 dataset_size: 571262153 configs: - config_name: default data_files: - split: train path: data/train-* --- # Dataset Card: Magpie-Llama-3.1-Pro-300K-Filtered (Italian Translation) ## Dataset Summary This dataset is the **Italian translation** of the [Magpie-Llama-3.1-Pro-300K-Filtered](https://huggingface.co/datasets/Magpie-Align/Magpie-Llama-3.1-Pro-300K-Filtered) dataset, designed specifically for **instruction tuning** in Italian. We translated the instances whose 'difficulty' value is 'easy'. The translation was carried out using **TowerInstruct-Mistral-7B-v0.2**. - **Languages**: Italian (translated from English) - **Purpose**: Instruction tuning in Italian - **Train Size**: 59070 New fields added include: ```python { "messages": list # the translated conversation in Italian, presented in chat template format. "instruction_it": str # the translated instruction in Italian. "response_it_": str # the translated response in Italian. } ``` These additions make the dataset suitable for Italian-speaking AI models and multilingual systems requiring instruction following data in Italian. ## Author Alessandro Scirè (scire@babelscape.com)
提供机构:
sapienzanlp
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作