A Data Quality Multidimensional Model for Social Media Analysis
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10636894
Description:
This dataset comprises the data used in the paper to assess the quality of several metrics for determining the relevance of users.
The data was extracted from Twitter for the automotive domain, using a query that consisted of several car brands and models. We provide three files:
users_all_metrics2.txt
User_id, statuses, listed, friends, followers, tweets on domain (dataset), Screen name, User language, User location, Verified account (True/False), Coherence of profile (entropy of text under domain model), #Performed actions, #Received actions
tweets_all_metrics.txt.gz
Tweet_id, replies, retweets, favourites, User_id, statuses, listed, friends, followers, tweets on domain (dataset), Screen name, User language, User location, Verified account (True/False), Coherence of profile, Date of publication (created_at), Tweet Language, processed text, coherence of text, repetitions of text in collection, user's received actions, user's generated actions, text polarity, number of facts, number of linked opinion expressions, number of linked entities
relevant_new.txt
Screen names of the users deemed relevant for the domain
The files are "|"-separated text files with no header row; the column names for each file are listed above.
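Because the files have no header, the column names have to be supplied when reading them. The following is a minimal sketch of one way to do that in Python with the standard library only. The identifiers in USER_COLUMNS are illustrative names transcribed from the column list for users_all_metrics2.txt, and the UTF-8 encoding is an assumption; neither is defined by the dataset itself. Rows whose field count does not match are skipped here, which may be needed if a free-text field such as the user location contains a "|" character.

```python
import csv
import gzip
import io

# Illustrative identifiers transcribed from the column list for
# users_all_metrics2.txt; the file itself ships without a header row.
USER_COLUMNS = [
    "user_id", "statuses", "listed", "friends", "followers",
    "tweets_on_domain", "screen_name", "user_language", "user_location",
    "verified", "profile_coherence", "performed_actions", "received_actions",
]


def open_text(path):
    """Open a plain or gzip-compressed file as text (UTF-8 is assumed)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", newline="")
    return open(path, encoding="utf-8", newline="")


def read_pipe_rows(handle, columns):
    """Yield one dict per line of a '|'-separated stream with no header."""
    # QUOTE_NONE keeps quote characters inside free text as literal data.
    reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) != len(columns):
            # Skip blank lines and rows with an unexpected field count.
            continue
        yield dict(zip(columns, fields))


# Made-up sample lines standing in for the real file contents.
sample = io.StringIO(
    "42|100|3|50|900|7|some_user|en|Madrid|False|0.81|12|34\n"
    "broken|row\n"
)
rows = list(read_pipe_rows(sample, USER_COLUMNS))
```

With the real data, the same function could be applied as `read_pipe_rows(open_text("users_all_metrics2.txt"), USER_COLUMNS)`. All values come back as strings, so numeric columns still need to be converted explicitly.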
Created:
2024-02-27



