Xhaheen/Alpaca_urdu_2024_1

Name: Xhaheen/Alpaca_urdu_2024_1
Creator: Xhaheen
Published: 2024-03-06 07:44:23
License: No description available

Hugging Face, updated 2024-03-06, indexed 2024-06-22

Download link:

https://hf-mirror.com/datasets/Xhaheen/Alpaca_urdu_2024_1


Resource description:

---
dataset_info:
  features:
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: answer_lengths
    dtype: 'null'
  splits:
  - name: train
    num_bytes: 51251741
    num_examples: 45622
  download_size: 24545189
  dataset_size: 51251741
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: apache-2.0
task_categories:
- text-generation
language:
- ur
size_categories:
- 10K<n<100K
---

Description

Alpaca Urdu 🦙 is a translation of the original dataset (Alpaca Cleaned) into Urdu. This dataset is part of the Alpaca project and is designed for NLP tasks. 🌐

Dataset Information

Size: The translated dataset contains about 45,000 samples (45,622 in the train split, per the metadata above).
Languages: Urdu
License: cc-by-4.0
Original Dataset: Alpaca Cleaned dataset

Columns

The translated dataset includes the following columns:

input: Input text in Urdu.
output: Translated output in Urdu.
answer_lengths: Lengths of the answers (dtype 'null' in the metadata above).

Example Usage

    from datasets import load_dataset

    # Load the translated dataset
    dataset = load_dataset("Xhaheen/Alpaca_urdu_2024_1")

    # Access a sample
    sample = dataset["train"][0]
    print(sample)

    import pandas as pd

    # Assuming the dataset has a key named "train" containing the data
    df = pd.DataFrame(dataset["train"])

    # Save the DataFrame to a CSV file named "alpaca_urdu.csv"
    df.to_csv("alpaca_urdu.csv", index=False)

Provider:

Xhaheen

Summary of original information

Dataset overview

Basic information

Dataset name: Alpaca Urdu 🦙
Language: Urdu
License: Apache-2.0
Task category: text generation
Size category: 10K<n<100K

Data structure

Features:
- input: input text, data type string
- output: output text, data type string
- answer_lengths: answer lengths, data type null
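
The feature list above implies that each record has two string fields plus an `answer_lengths` field that carries no values. As a minimal sketch under that assumption, the snippet below formats one record into a single string for text generation. The `AlpacaUrduRecord` type, the `to_training_text` helper, and the prompt template are all hypothetical; the dataset card does not prescribe any of them.

```python
from typing import Optional, TypedDict


class AlpacaUrduRecord(TypedDict):
    """Record shape implied by the feature list (an assumption, not an official schema)."""
    input: str
    output: str
    answer_lengths: Optional[int]  # dtype is 'null' in the metadata, so expect None


# Hypothetical prompt template; the dataset card does not define one.
TEMPLATE = "### Input:\n{input}\n\n### Output:\n{output}"


def to_training_text(record: AlpacaUrduRecord) -> str:
    """Join the two text fields into one string, ignoring the empty answer_lengths field."""
    return TEMPLATE.format(
        input=record["input"].strip(),
        output=record["output"].strip(),
    )


# Placeholder record for illustration only; it is not taken from the dataset.
example: AlpacaUrduRecord = {"input": " سوال ", "output": "جواب", "answer_lengths": None}
print(to_training_text(example))
```

Keeping the template in one constant makes it easy to swap in whatever prompt format a particular fine-tuning setup expects.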

Data splits

Training set:
- File name: data/train-*
- Number of examples: 45622
- Bytes: 51251741
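
Only a train split is listed, so anyone who wants a validation set has to carve one out. The sketch below shows one possible approach: a deterministic shuffle of row indices followed by a 90/10 cut. The `split_indices` helper, the 10% fraction, and the seed are illustrative assumptions, not something the dataset card specifies.

```python
import random

NUM_EXAMPLES = 45622  # size of the train split, taken from the listing above


def split_indices(n: int, val_fraction: float = 0.1, seed: int = 0):
    """Shuffle row indices deterministically and split them into train and validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG, so global random state is untouched
    n_val = int(n * val_fraction)
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_indices(NUM_EXAMPLES)
print(len(train_idx), len(val_idx))
```

With the `datasets` library, index lists like these can be passed to `Dataset.select`; the library's own `Dataset.train_test_split` method is an alternative that does the shuffle and the cut in one call.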

Download information

Download size: 24545189 bytes
Dataset size: 51251741 bytes
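
The raw byte counts are easier to interpret as derived figures. The short calculation below uses only the numbers listed on this page (the variable names are illustrative) to estimate the average record size and the ratio of download size to loaded dataset size.

```python
# Figures copied from the listing on this page.
NUM_BYTES = 51_251_741      # dataset_size / num_bytes of the train split
DOWNLOAD_SIZE = 24_545_189  # download_size
NUM_EXAMPLES = 45_622       # num_examples of the train split

avg_bytes_per_example = NUM_BYTES / NUM_EXAMPLES
download_ratio = DOWNLOAD_SIZE / NUM_BYTES

print(f"dataset size:      {NUM_BYTES / 1e6:.1f} MB")
print(f"download size:     {DOWNLOAD_SIZE / 1e6:.1f} MB")
print(f"avg bytes/example: {avg_bytes_per_example:.0f}")
print(f"download/dataset:  {download_ratio:.2f}")
```

By these figures each example averages a little over 1.1 KB, and the download is somewhat under half the size of the loaded dataset.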

Example usage

    from datasets import load_dataset

    # Load the translated dataset
    dataset = load_dataset("Xhaheen/Alpaca_urdu_2024_1")

    # Access a sample
    sample = dataset["train"][0]
    print(sample)
