Dataset Customer Review Indonesia

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/74xrbd4vxy
Description:
This dataset contains 203,786 Indonesian-language customer reviews of 4,422 unique products, collected from a publicly accessible e-commerce platform. The corpus consists of authentic, short-form, informal user-generated content typical of digital marketplace interactions. It is released in two structured versions:

1. Raw dataset: the original review text with its associated metadata, namely product identifier, rating (1–5), purchase date, and client platform (Android app, desktop, mobile app).
2. Processed dataset: the cleaned review text plus derived features, namely a sentiment label (positive, neutral, negative) and a sentiment confidence score, both generated with a pretrained IndoBERT-based sentiment classification model.

The processed version applies only minimal text cleaning, so linguistic authenticity is preserved. No synthetic data were introduced. The sentiment annotations are provided to support benchmarking and downstream NLP research, not as a new modeling contribution.

Descriptive analyses indicate that the corpus has a brevity-driven structure (reviews are predominantly short), shows lexical variation, and has measurable vocabulary coverage gaps relative to the IndoBERT vocabulary. These properties reflect realistic patterns of informal Indonesian digital communication. The dataset is suitable for sentiment analysis, short-text classification, opinion mining, and robustness evaluation in low-resource language settings. It is released to promote reproducible research and to expand the publicly available resources for Indonesian natural language processing.
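The description does not say how the vocabulary coverage gap or the brevity of the reviews was measured. The following is a minimal sketch of one plausible way to quantify both, assuming review texts are available as plain Python strings. It has several assumptions. The regex tokenizer is a simplification of my own and is not the authors' method. Coverage here counts only whole-word matches against a vocabulary set. The `load_vocab` helper is hypothetical and assumes a BERT-style `vocab.txt` file with one entry per line.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and extract runs of Unicode word characters.
    # This simple tokenizer is an assumption, not the dataset authors' procedure.
    return re.findall(r"\w+", text.lower())


def vocab_coverage(reviews: list[str], vocab: set[str]) -> dict[str, float]:
    """Share of word tokens (and distinct word types) found verbatim in `vocab`."""
    counts = Counter(tok for review in reviews for tok in tokenize(review))
    total_tokens = sum(counts.values())
    covered_tokens = sum(n for tok, n in counts.items() if tok in vocab)
    covered_types = sum(1 for tok in counts if tok in vocab)
    return {
        # Fraction of running tokens that are whole-word vocabulary entries.
        "token_coverage": covered_tokens / total_tokens if total_tokens else 0.0,
        # Fraction of distinct word forms that are whole-word vocabulary entries.
        "type_coverage": covered_types / len(counts) if counts else 0.0,
        # Average review length in tokens, a simple brevity measure.
        "mean_tokens_per_review": total_tokens / len(reviews) if reviews else 0.0,
    }


def load_vocab(path: str) -> set[str]:
    # Hypothetical helper: reads a BERT-style vocab.txt (one entry per line).
    # WordPiece continuation pieces ("##...") are skipped so only whole words count.
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip() and not line.startswith("##")}
```

Consider the toy input `vocab_coverage(["barang bagus banget", "mantap gan, pengiriman cepet"], {"barang", "bagus", "pengiriman", "cepat", "mantap"})`. It yields a token coverage of 4/7 and a mean of 3.5 tokens per review. The informal forms "banget", "gan", and "cepet" are the uncovered words. This is the kind of gap the description attributes to informal Indonesian.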
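The sentiment labels and confidence scores are produced by an IndoBERT-based model, not by human annotators. For benchmarking, one possible precaution is to keep only high-confidence predictions and check how often they agree with the star rating from the raw metadata. The sketch below makes several assumptions:

- Each record carries both the rating and the model output. The description does not state whether the processed files already include the rating or whether the two versions must be joined.
- The `Review` field names are hypothetical.
- The 0.9 threshold is arbitrary.
- The star-to-polarity mapping is an illustrative choice that the dataset does not define.

```python
from dataclasses import dataclass


@dataclass
class Review:
    # Hypothetical field names; the released files may use different column headers.
    text: str
    rating: int        # 1-5 star rating from the raw metadata
    sentiment: str     # "positive", "neutral" or "negative" (model-generated)
    confidence: float  # classifier confidence for `sentiment`


def rating_to_label(rating: int) -> str:
    # Assumed star-to-polarity mapping; not defined by the dataset.
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"


def high_confidence(reviews: list[Review], threshold: float = 0.9) -> list[Review]:
    # Keep only reviews whose model label meets an (arbitrary) confidence threshold.
    return [r for r in reviews if r.confidence >= threshold]


def rating_agreement(reviews: list[Review]) -> float:
    # Fraction of reviews whose model label matches the star-derived label.
    if not reviews:
        return 0.0
    matches = sum(1 for r in reviews if r.sentiment == rating_to_label(r.rating))
    return matches / len(reviews)
```

Comparing `rating_agreement` on the full set with agreement on the `high_confidence` subset gives a rough sense of how reliable the model labels are before treating them as silver-standard targets.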
Created:
2026-02-26