
PHEME dataset of rumours and non-rumours

DataCite Commons, updated 2025-06-01, indexed 2024-07-25
Download link:
https://figshare.com/articles/dataset/PHEME_dataset_of_rumours_and_non-rumours/4010619/1
Description:
This dataset contains a collection of Twitter rumours and non-rumours posted during breaking news. The five breaking news events provided with the dataset are as follows:

* Charlie Hebdo: 458 rumours (22.0%) and 1,621 non-rumours (78.0%).
* Ferguson: 284 rumours (24.8%) and 859 non-rumours (75.2%).
* Germanwings Crash: 238 rumours (50.7%) and 231 non-rumours (49.3%).
* Ottawa Shooting: 470 rumours (52.8%) and 420 non-rumours (47.2%).
* Sydney Siege: 522 rumours (42.8%) and 699 non-rumours (57.2%).

The data is structured as follows. Each event has a directory with two subfolders, 'rumours' and 'non-rumours'. Each of these two folders contains one folder per thread, named with a tweet ID. The tweet itself can be found in the 'source-tweet' directory of the tweet in question, and the 'reactions' directory holds the set of tweets responding to that source tweet.

This dataset was used in the paper 'Learning Reporting Dynamics during Breaking News for Rumour Detection in Social Media' for rumour detection. For more details, please refer to the paper.

License: The annotations are provided under a CC-BY license, while Twitter retains the ownership and rights of the content of the tweets.
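
The directory layout described above can be checked against the published counts with a short script. The following is a minimal sketch, not part of the dataset: the function names `count_threads` and `rumour_share` are hypothetical, and it assumes only what the description states (one directory per event, each with `rumours` and `non-rumours` subfolders holding one folder per tweet ID). It does not open any tweet files, since the description does not specify their format.

```python
from pathlib import Path

# Subfolder names taken from the dataset description.
LABELS = ("rumours", "non-rumours")


def count_threads(root):
    """Return {event: {label: number of tweet-ID folders}} for a PHEME-style tree.

    `root` is assumed to be the folder that directly contains the event
    directories; that location is an assumption, not stated in the description.
    """
    counts = {}
    for event_dir in sorted(Path(root).iterdir()):
        if not event_dir.is_dir():
            continue  # skip stray files next to the event directories
        per_label = {}
        for label in LABELS:
            label_dir = event_dir / label
            if label_dir.is_dir():
                # Each child directory is one thread, named with a tweet ID.
                per_label[label] = sum(1 for p in label_dir.iterdir() if p.is_dir())
            else:
                per_label[label] = 0
        counts[event_dir.name] = per_label
    return counts


def rumour_share(n_rumours, n_non_rumours):
    """Percentage of rumours among all threads, rounded to one decimal place."""
    total = n_rumours + n_non_rumours
    return round(100 * n_rumours / total, 1) if total else 0.0
```

For example, `rumour_share(458, 1621)` returns `22.0`, which matches the Charlie Hebdo figure listed above; running `count_threads` on the extracted archive should, under the stated assumptions, yield the per-event thread counts to compare against the list.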
Provider:
figshare
Created:
2016-10-24