community-datasets/roman_urdu

Name: community-datasets/roman_urdu
Creator: community-datasets
Published: 2024-06-24 05:02:09
License: No description available

Hugging Face, updated 2024-06-24, indexed 2024-06-15

Download link:

https://hf-mirror.com/datasets/community-datasets/roman_urdu


Resource summary:

This is a Roman Urdu dataset for text classification. It contains Urdu sentences paired with sentiment labels (Positive, Negative, Neutral). Its size falls between 10K and 100K examples, and it is monolingual (Urdu). The annotations were crowdsourced, but details of the annotation process, data sources, data collection, and normalization are not provided. The citation information cites the related paper and the UCI Machine Learning Repository.

Provider:

community-datasets

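Original information summary

Roman Urdu Dataset overview

Dataset description

Dataset summary

Language: Urdu
License: unknown
Multilinguality: monolingual
Size category: 10K<n<100K
Source datasets: original
Task category: text classification
Task ID: sentiment classification
Dataset ID: roman-urdu-data-set
Dataset name: Roman Urdu Dataset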

Dataset structure

Data instances

Wah je wah,Positive,
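
The instance above looks like one comma-separated row with a trailing empty column. A minimal sketch of splitting such a row into its two fields, assuming the raw file is plain CSV (the helper name `parse_row` is hypothetical and not part of the dataset):

```python
import csv
import io


def parse_row(line):
    """Split one raw CSV line into (sentence, sentiment).

    Assumes the first column is the sentence and the second is the label;
    any further columns (such as the trailing empty one) are ignored.
    """
    fields = next(csv.reader(io.StringIO(line)))
    return fields[0].strip(), fields[1].strip()


sentence, sentiment = parse_row("Wah je wah,Positive,")
print(sentence, "->", sentiment)
```

Using the `csv` module rather than `str.split(",")` keeps sentences that contain quoted commas intact.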

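Data fields

sentence: a piece of Urdu text, stored as a string.
sentiment: the sentiment label, stored as a class label with the values Positive, Negative, and Neutral.

Data splits

train: the training set, with 20229 examples occupying 1633411 bytes.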
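
Class labels are usually stored as integers alongside a list of names. The sketch below builds such a mapping, assuming the index order follows the order in which the labels are listed here (Positive, Negative, Neutral); the actual order in the published dataset should be checked before relying on it.

```python
# Label names in the order listed in this card (assumed to be the index order).
LABELS = ["Positive", "Negative", "Neutral"]

label2id = {name: index for index, name in enumerate(LABELS)}
id2label = {index: name for name, index in label2id.items()}


def encode(example):
    """Turn a (sentence, label name) pair into a (sentence, label id) pair."""
    sentence, sentiment = example
    return sentence, label2id[sentiment]


print(encode(("Wah je wah", "Positive")))
```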

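Dataset creation

Dataset information

Features:
- sentence: string type
- sentiment: class label type with the values Positive, Negative, and Neutral
Splits:
- train: 20229 examples, 1633411 bytes
Download size: 1060033 bytes
Dataset size: 1633411 bytes

Configuration

Default configuration:
- Data files:
  - train: path data/train-*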
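
The figures above allow a quick sanity check. This sketch derives the average size per example and the ratio of download size to dataset size, and confirms that the example count falls in the stated 10K<n<100K category. The variable names are illustrative only.

```python
NUM_EXAMPLES = 20229      # train split size from this card
DATASET_BYTES = 1633411   # dataset size from this card
DOWNLOAD_BYTES = 1060033  # download size from this card

avg_bytes_per_example = DATASET_BYTES / NUM_EXAMPLES
download_ratio = DOWNLOAD_BYTES / DATASET_BYTES
in_size_category = 10_000 < NUM_EXAMPLES < 100_000

print(f"average size: {avg_bytes_per_example:.1f} bytes per example")
print(f"download / dataset size: {download_ratio:.0%}")
print(f"within 10K<n<100K: {in_size_category}")
```

The average comes to roughly 81 bytes per example, which covers the label and any storage overhead as well as the sentence text, and the download is about 65% of the dataset size.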
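
A minimal sketch of how the default configuration might be used, assuming the Hugging Face `datasets` package is installed and the repository is reachable under the name shown in this listing. The shard file name in the example is hypothetical, and `fnmatch` only approximates how the pattern is resolved on the hosting side.

```python
from fnmatch import fnmatch

REPO_ID = "community-datasets/roman_urdu"  # repository name from this listing
TRAIN_PATTERN = "data/train-*"             # data-file pattern from the default configuration


def is_train_file(path):
    """Return True if a repository file path matches the train pattern."""
    return fnmatch(path, TRAIN_PATTERN)


def load_train_split():
    """Load the train split; requires the `datasets` package and network access."""
    from datasets import load_dataset  # imported lazily so is_train_file works without it
    return load_dataset(REPO_ID, split="train")


# Hypothetical shard name, used only to illustrate the glob pattern.
print(is_train_file("data/train-00000-of-00001.parquet"))
```

Calling `load_train_split()` should return a dataset with the `sentence` and `sentiment` columns described above, provided the repository is still available at that name.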

Citation information

@InProceedings{Sharf:2018,
  title = "Performing Natural Language Processing on Roman Urdu Datasets",
  authors = "Zareen Sharf and Saif Ur Rahman",
  booktitle = "International Journal of Computer Science and Network Security",
  volume = "18",
  number = "1",
  pages = "141-148",
  year = "2018"
}

@misc{Dua:2019,
  author = "Dua, Dheeru and Graff, Casey",
  year = "2017",
  title = "{UCI} Machine Learning Repository",
  url = "http://archive.ics.uci.edu/ml",
  institution = "University of California, Irvine, School of Information and Computer Sciences"
}
