TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets (Part 11, Jan 2022 - Aug 2022)
Source: CESSDA | Updated: 2024-10-19 | Indexed: 2024-08-10
Download link:
https://datacatalogue.cessda.eu/detail?lang=en&q=44aa7217c8e01e070a24762a4179f5d0e0342c7d6c70c36ac2a9e968e1650300
Description:
TweetsKB is a public RDF corpus of anonymized data for a large collection of annotated tweets. The dataset currently contains data for nearly 3.0 billion tweets, spanning more than 9 years (February 2013 - August 2022). Tweet metadata, together with extracted entities, sentiments, hashtags, user mentions and URLs, are exposed in RDF using established RDF/S vocabularies. To protect privacy, user IDs are anonymized and the text of the tweets is not provided. For a list of the previous dataset parts, example queries and more information, see the TweetsKB home page: https://data.gesis.org/tweetskb/.
Provider:
GESIS Data Archive for the Social Sciences



