Dataset Costumer Review Indonesia
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/74xrbd4vxy
Description:
This dataset contains 203,786 Indonesian-language customer reviews collected from a publicly accessible e-commerce platform, covering 4,422 unique product items. The corpus reflects authentic, short-form, and informal user-generated content typical of digital marketplace interactions.
The dataset is released in two structured versions:
1. Raw dataset, preserving the original review text and associated metadata, including product identifier, rating (1–5), purchase date, and client platform (Android app, desktop, mobile app).
2. Processed dataset, containing cleaned review text along with additional derived features, including sentiment label (positive, neutral, negative) and sentiment confidence score generated using a pretrained IndoBERT-based sentiment classification model.
The processed version applies minimal text cleaning while preserving linguistic authenticity. No synthetic data were introduced. The sentiment annotations are provided to facilitate benchmarking and downstream NLP research rather than to present a new modeling contribution.
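As an illustration of how the processed version might be used, the sketch below keeps only reviews whose sentiment confidence meets a threshold and then counts the remaining labels. The column names (`product_id`, `rating`, `clean_text`, `sentiment_label`, `sentiment_confidence`), the 0.6 threshold, and the four sample rows are all assumptions made up for this example; they are not taken from the dataset, so check the actual file headers before adapting it.

```python
import csv
import io
from collections import Counter

# Made-up rows with hypothetical column names, for illustration only.
# The real processed file may use different headers and formats.
SAMPLE_CSV = """product_id,rating,clean_text,sentiment_label,sentiment_confidence
P001,5,barang bagus sesuai pesanan,positive,0.97
P001,3,lumayan,neutral,0.55
P002,1,barang rusak,negative,0.91
P002,4,ok,positive,0.48
"""


def load_confident_rows(handle, min_confidence=0.6):
    """Return rows whose sentiment confidence is at least min_confidence."""
    reader = csv.DictReader(handle)
    return [
        row for row in reader
        if float(row["sentiment_confidence"]) >= min_confidence
    ]


# In practice, pass an open file handle instead of the in-memory sample.
rows = load_confident_rows(io.StringIO(SAMPLE_CSV))
label_counts = Counter(row["sentiment_label"] for row in rows)
print(label_counts)
```

Because the labels are model-generated rather than human-annotated, filtering on confidence is one possible way to reduce label noise before benchmarking. The appropriate threshold depends on the task.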
Descriptive analyses indicate that the reviews are predominantly short, show substantial lexical variation, and have measurable vocabulary coverage gaps relative to the IndoBERT vocabulary, reflecting realistic informal Indonesian digital communication patterns. The dataset is suitable for sentiment analysis, short-text classification, opinion mining, and robustness evaluation in low-resource language settings.
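A vocabulary coverage gap of this kind can be approximated as the share of review tokens that do not appear in a reference vocabulary. The sketch below is a simplified whole-word check, not the method used by the dataset authors: IndoBERT uses subword tokenization, so a faithful measurement would go through its tokenizer. The names `tokenize`, `oov_rate`, `toy_vocab`, and `toy_reviews` are hypothetical, and the toy inputs are invented for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def oov_rate(reviews, vocab):
    """Return the fraction of tokens in reviews that are absent from vocab."""
    tokens = [token for review in reviews for token in tokenize(review)]
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocab)
    return missing / len(tokens)


# Toy vocabulary and reviews, invented for illustration only.
toy_vocab = {"barang", "bagus", "sesuai", "pesanan", "cepat"}
toy_reviews = ["barang bagus bgt", "pengiriman cepat, mantul"]

rate = oov_rate(toy_reviews, toy_vocab)
print(f"out-of-vocabulary rate: {rate:.2f}")
```

Here three of the six tokens (`bgt`, `pengiriman`, `mantul`) are missing from the toy vocabulary, which mirrors how informal abbreviations and slang can fall outside a pretrained model's word list.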
The dataset is released to promote reproducible research and to expand publicly available resources for Indonesian natural language processing.
Created:
2026-02-26



