
Roman Urdu Word Variations and Normalized Sentiment Review Dataset (RUWV-NSR)

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/v5jfhsvtmd
Description:
We have developed two Roman Urdu datasets, translated into English. The first dataset covers Roman Urdu words and their spelling variations. It is an Excel file with five columns labeled "Var-1" to "Var-5", which hold up to five Roman Urdu spelling variations of each word, followed by a final column, "common", which holds the most frequently used spelling. In total, it includes 5,244 unique Roman Urdu words, which amount to 19,527 words when their variations are counted. The second dataset contains Roman Urdu reviews, each labeled with a sentiment. Because Roman Urdu spelling on the web varies widely, with users often creating their own spellings, we have normalized the spelling of words across these reviews. This dataset is the first of its kind and the largest collection of Roman Urdu reviews, with a total of 28,090 reviews categorized into five sentiment classes. It is particularly valuable for analyzing Roman Urdu content such as online product reviews and Roman Urdu articles, which are becoming increasingly common, and it offers significant potential for sentiment analysis and language processing applications.
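The variation table described above lends itself to a simple lookup-based normalizer. The following is a minimal sketch of that idea, not the authors' actual procedure: the sample rows, the words in them, and the helper names `build_lookup` and `normalize` are assumptions made for illustration, and only the column names "Var-1" to "Var-5" and "common" come from the description.

```python
import re

# Hypothetical rows mirroring the described layout: up to five spelling
# variations ("Var-1" to "Var-5") plus the most frequent spelling ("common").
# The words are illustrative and are not copied from the dataset.
ROWS = [
    {"Var-1": "acha", "Var-2": "achha", "Var-3": "accha",
     "Var-4": None, "Var-5": None, "common": "acha"},
    {"Var-1": "nahi", "Var-2": "nahin", "Var-3": "nai",
     "Var-4": "nhi", "Var-5": None, "common": "nahi"},
]

VAR_COLUMNS = [f"Var-{i}" for i in range(1, 6)]


def build_lookup(rows):
    """Map every listed spelling (lowercased) to its 'common' spelling."""
    lookup = {}
    for row in rows:
        common = row["common"].strip().lower()
        lookup[common] = common
        for col in VAR_COLUMNS:
            variant = row.get(col)
            # Empty cells may arrive as None or NaN, so keep strings only.
            if isinstance(variant, str) and variant.strip():
                lookup[variant.strip().lower()] = common
    return lookup


def normalize(text, lookup):
    """Replace each known variant with its common spelling.

    Words not found in the lookup are left unchanged, and punctuation
    and spacing are preserved.
    """
    def replace(match):
        word = match.group(0)
        return lookup.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", replace, text)


lookup = build_lookup(ROWS)
normalized = normalize("Yeh phone achha nahin hai.", lookup)
print(normalized)
```

With the real Excel file, the rows could presumably be loaded with `pandas.read_excel(...)` and converted with `DataFrame.to_dict("records")`. The file name and sheet layout are not given here, so that step is left out.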
Created:
2024-10-07