COVID-19 Dataset
Source: doi.org (indexed 2025-03-24)
Download link:
http://doi.org/10.17632/w88vrhrfp7.1
Description:
This is a collection of COVID-19 comments in Indonesian, gathered from Twitter, YouTube, Facebook, and Instagram. The dataset has been preprocessed in the following stages:
1. Cleansing
2. Case folding
3. Text normalization
4. Stopword removal
5. Stemming with Sastrawi
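As a rough illustration of how the five stages above could be chained, here is a minimal Python sketch. The lookup tables SLANG and STOPWORDS, the function names, and the regex rules are hypothetical placeholders; the description does not state the actual rules or word lists used to build the dataset. The stemmer is passed in as a callable so that the sketch runs with the standard library alone.

```python
import re

# Hypothetical, tiny lookup tables for illustration only. The dataset's real
# normalization dictionary and stopword list are not given in the description.
SLANG = {"gk": "tidak", "yg": "yang", "bgt": "banget"}
STOPWORDS = {"yang", "dan", "di", "ini", "itu"}


def cleanse(text):
    """Stage 1: drop URLs, mentions, hashtags, digits, and punctuation."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text, flags=re.IGNORECASE)
    text = re.sub(r"[@#]\w+", " ", text)
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text, stem=lambda token: token):
    """Run stages 1 to 5 in order; `stem` defaults to a no-op placeholder."""
    text = cleanse(text)                                # 1. cleansing
    text = text.lower()                                 # 2. case folding
    tokens = [SLANG.get(t, t) for t in text.split()]    # 3. text normalization
    tokens = [t for t in tokens if t not in STOPWORDS]  # 4. stopword removal
    return " ".join(stem(t) for t in tokens)            # 5. stemming

# With the Sastrawi package installed (an assumption, not required above),
# a real stemmer could be plugged in like this:
#   from Sastrawi.Stemmer.StemmerFactory import StemmerFactory
#   stemmer = StemmerFactory().create_stemmer()
#   preprocess("some comment", stem=stemmer.stem)
```

Because the dataset is already preprocessed, a pipeline like this would mainly be useful for applying comparable treatment to new, raw comments before mixing them with the published data.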
The file contains two folders, one holding the data in CSV format and the other in JSON. Each dataset has been split into training and test sets at an 80:20 ratio.
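The dataset already ships with its split applied, so no further splitting is needed to use it. For readers who want to apply the same 80:20 ratio to their own rows, the sketch below shows one common approach: a seeded shuffle followed by a slice. The function name, the seed, and the use of shuffling are assumptions; the description does not say how the original split was produced.

```python
import random


def split_train_test(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of `rows` and return (train, test) lists."""
    shuffled = list(rows)
    # A dedicated Random instance keeps the split reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


train, test = split_train_test(range(10))
print(len(train), len(test))
```

For labeled sentiment data, a stratified split that preserves class proportions may be preferable; whether the published split is stratified is not stated.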
Provider:
doi.org



