EmoTweetID: Indonesian Emotion Tweet Dataset
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/jzgnjsff9f
Description:
The EmoTweetID dataset is a publicly available resource of Indonesian tweets collected from X (formerly Twitter) using emotion-related keywords.
The dataset consists of three main components:
1. EmoTweetID-Corpus.csv: 3,126,987 unlabeled tweets for unsupervised tasks such as word embedding construction.
2. EmoTweetID-Lexicon.csv: 2,243 tweets automatically annotated using the Indonesian NRC EmoLex.
3. EmoTweetID-Human.csv: 2,243 tweets manually annotated by three psychology students, with inter-annotator agreement measured using Cohen’s and Fleiss’ Kappa.
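To illustrate what lexicon-based automatic annotation can look like, here is a minimal sketch. It is not the authors' procedure (that lives in their GitHub repository and may differ). The lexicon below is a hypothetical toy mapping written for this example, not entries from the Indonesian NRC EmoLex, and the whitespace tokenization and alphabetical tie-breaking are assumptions.

```python
from collections import Counter

# Toy lexicon: hypothetical entries for illustration only,
# NOT taken from the Indonesian NRC EmoLex.
TOY_LEXICON = {
    "marah": "anger",
    "jijik": "disgust",
    "takut": "fear",
    "senang": "joy",
    "bahagia": "joy",
    "sedih": "sadness",
    "kaget": "surprise",
}


def label_tweet(text, lexicon=TOY_LEXICON):
    """Return the emotion with the most lexicon hits, or None if no word matches."""
    # Assumption: simple lowercase + whitespace tokenization.
    tokens = text.lower().split()
    counts = Counter(lexicon[t] for t in tokens if t in lexicon)
    if not counts:
        return None
    best = max(counts.values())
    # Assumption: ties are broken alphabetically so the result is deterministic.
    return min(emotion for emotion, c in counts.items() if c == best)
```

A tweet with no lexicon match gets no label here. A real pipeline would need to decide whether to drop such tweets or assign a neutral class.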
Both annotated files (EmoTweetID-Lexicon.csv and EmoTweetID-Human.csv) provide labels following Ekman’s six basic emotions: anger, disgust, fear, joy, sadness, and surprise.
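For readers unfamiliar with the agreement statistics mentioned for the human annotations, the following is a minimal sketch of Cohen's kappa (two annotators) and Fleiss' kappa (a fixed number of annotators per item). The toy ratings are invented for illustration and are not drawn from EmoTweetID-Human.csv. The function names and the input layout are assumptions of this example.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two annotators' label sequences of equal length."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` holds one list of labels per item,
    with the same number of raters for every item."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        totals.update(counts)
        # Proportion of agreeing rater pairs for this item.
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    p_bar = agreement_sum / n_items
    p_e = sum((c / (n_items * n_raters)) ** 2 for c in totals.values())
    return (p_bar - p_e) / (1 - p_e)


# Invented toy data: 4 tweets, 3 annotators each.
toy_ratings = [
    ["joy", "joy", "joy"],
    ["anger", "anger", "disgust"],
    ["fear", "fear", "fear"],
    ["sadness", "sadness", "joy"],
]
toy_fleiss = fleiss_kappa(toy_ratings)
```

Both functions divide by zero when chance agreement equals 1 (for example, when every annotator uses a single label throughout), so a production version would need to handle that case.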
Additionally, two pre-trained word embedding models trained on the corpus (Word2Vec and FastText) are provided as TweetID-Word2Vec.zip and TweetID-FastText.zip for downstream NLP tasks.
All code used to construct the dataset is available in the GitHub repository: https://github.com/ksnugroho/EmoTweetID
This dataset offers a valuable benchmark for affective computing and natural language processing in Indonesian, supporting research in emotion recognition, social media analysis, and the development of empathetic AI systems.
Created:
2025-12-04



