Twemoji Dataset
Source: DataCite Commons. Updated: 2025-05-01. Indexed: 2025-04-17.
Download link:
https://uvaauas.figshare.com/articles/Twemoji_Dataset/5822100/3
Description:
A collection of 13M tweets divided into training, validation, and test sets for predicting emoji from text and/or images.
The data provides each tweet's status ID and the emoji annotations associated with it. For the image-containing subsets, the image URL is also listed.
The Full, unbalanced dataset consists of random test and validation sets of 1M tweets, with the remainder in the training set.
The Balanced test set is a subset of the test set chosen to improve emoji class balance.
The Image subsets consist of image-containing tweets.
Finally, emoji_map_1791.csv provides information on the emoji labels and any associated metadata.
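The description says the Balanced test set was chosen to improve emoji class balance but does not say how. As a rough illustration only, the sketch below caps the number of samples per emoji label, which is one common way to reduce class imbalance. It is not the dataset authors' procedure. The `(tweet_id, label)` pair layout, the single label per tweet, the label names, and the helper `balanced_subset` are all assumptions made for this example.

```python
import random
from collections import Counter, defaultdict


def balanced_subset(samples, per_class_cap, seed=0):
    """Downsample so that no label keeps more than per_class_cap samples.

    samples: list of (tweet_id, emoji_label) pairs. This layout is a
    hypothetical simplification, not the dataset's actual file format.
    """
    rng = random.Random(seed)

    # Group tweet IDs by their emoji label.
    by_label = defaultdict(list)
    for tweet_id, label in samples:
        by_label[label].append(tweet_id)

    # Keep at most per_class_cap randomly chosen tweets for each label.
    subset = []
    for label, ids in by_label.items():
        rng.shuffle(ids)
        subset.extend((tweet_id, label) for tweet_id in ids[:per_class_cap])
    return subset


# Toy data invented for illustration: 8 "joy", 3 "heart", 1 "fire".
toy = (
    [(i, "joy") for i in range(8)]
    + [(100 + i, "heart") for i in range(3)]
    + [(200, "fire")]
)

balanced = balanced_subset(toy, per_class_cap=3)
counts = Counter(label for _, label in balanced)
print(counts)
```

Capping frequent classes narrows the gap between common and rare labels, but rare labels stay rare, so a subset built this way is only more balanced, not uniform.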
Provider:
University of Amsterdam / Amsterdam University of Applied Sciences
Created:
2018-02-28



