e-palmisano/italian_dataset_mix

Name: e-palmisano/italian_dataset_mix
Creator: e-palmisano
Published: 2024-06-03 10:49:21
License: 暂无描述

Hugging Face2024-06-03 更新2024-06-15 收录

下载链接：

https://hf-mirror.com/datasets/e-palmisano/italian_dataset_mix

下载链接

链接失效反馈

官方服务：

资源简介：

--- dataset_info: features: - name: messages list: - name: content dtype: string - name: role dtype: string splits: - name: train num_bytes: 867286323.2730248 num_examples: 314200 - name: test num_bytes: 96827212.50621584 num_examples: 34916 download_size: 503829455 dataset_size: 964113535.7792406 configs: - config_name: default data_files: - split: train path: data/train-* - split: test path: data/test-* license: apache-2.0 task_categories: - question-answering - text-generation language: - it size_categories: - 100K<n<1M --- # Dataset Card for Dataset Name  This dataset represents a collection of the most downloaded Italian datasets. ## Dataset Details ### Dataset Description  This dataset represents a collection of the most downloaded Italian datasets: - WasamiKirua/samantha-ita - mii-community/ultrafeedback-translated-ita - mchl-labs/stambecco_data_it - efederici/fisica - FreedomIntelligence/sharegpt-italian - **Curated by:** Enzo Palmisano - **Language(s) (NLP):** Italian - **License:** Apache 2.0

This dataset represents a collection of the most downloaded Italian datasets, primarily used for question-answering and text-generation tasks. It includes multiple subsets such as WasamiKirua/samantha-ita, mii-community/ultrafeedback-translated-ita, etc. The dataset is divided into a training set with 314200 samples and a test set with 34916 samples. The features of the dataset include message content and role, both of which are string types. The dataset is licensed under Apache 2.0.

提供机构：

e-palmisano

原始信息汇总

数据集详情

特征

messages
- content: 数据类型为字符串
- role: 数据类型为字符串

数据分割

train
- 字节数: 867286323.2730248
- 样本数: 314200
test
- 字节数: 96827212.50621584
- 样本数: 34916

大小

下载大小: 503829455
数据集大小: 964113535.7792406

配置

default
- train: data/train-*
- test: data/test-*

许可证

Apache 2.0

任务类别

问答
文本生成

语言

意大利语

大小类别

100K<n<1M

5,000+

优质数据集

54 个

任务类型

进入经典数据集