SenTopX
Source: arXiv | Updated: 2024-06-05 | Indexed: 2024-06-21
Download link:
https://doi.org/10.5281/zenodo.11243662
Description:
SenTopX is a large-scale longitudinal Twitter dataset created by Macquarie University in Sydney, Australia. It contains 293 million tweets posted by 143,000 users between 2007 and 2021. Topic modeling is used to assign each user to one of eight categories based on the main topics of their tweets, and every tweet is scored with the Perspective API across 16 toxicity categories. Through longitudinal analysis, SenTopX aims to reveal a comprehensive picture of each user's traits and their impact on the platform, giving researchers a basis for in-depth study of user behavior and platform moderation.
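The user categorization mentioned in the description can be illustrated with a minimal sketch. It assumes each tweet already carries a topic label from some topic model and that a user is assigned the topic that occurs most often among their tweets. This majority-vote rule, the function name `dominant_topic_per_user`, and the sample labels are all hypothetical; the listing does not state how the SenTopX authors actually aggregated topics per user.

```python
from collections import Counter, defaultdict


def dominant_topic_per_user(tweet_topics):
    """Map each user to the topic label that occurs most often in their tweets.

    tweet_topics: iterable of (user_id, topic_label) pairs, one per tweet.
    Assumption: majority vote; ties go to the label seen first for that user.
    """
    counts = defaultdict(Counter)
    for user_id, topic in tweet_topics:
        counts[user_id][topic] += 1
    # most_common(1) returns a one-element list of (label, count)
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}


# Hypothetical per-tweet labels, not taken from the dataset
sample = [
    ("u1", "politics"),
    ("u1", "sports"),
    ("u1", "politics"),
    ("u2", "music"),
]
user_category = dominant_topic_per_user(sample)
```

With real data, the per-tweet labels would come from whichever topic model is applied to the tweet text, and the eight category names would replace the placeholder labels used here.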
Provider:
Macquarie University
Created:
2024-06-05



