ERC Socsemics – Reddit - worldnews - 2013-2017 - news headers
Source: DataCite Commons. Updated 2025-07-05; indexed 2025-04-16.
Download link:
https://nakala.fr/10.34847/nkl.c6a63jf8
Description:
==Context==
This dataset consists of the headers of all the news articles posted on the r/worldnews subreddit between January 1, 2013 and August 1, 2017. The data was collected using the now-discontinued “Pushshift Reddit API”, which allowed posts from any time period to be retrieved.
The r/worldnews subreddit is very large and active. It focuses on major news from around the world, excluding US-internal news and US politics. This dataset contains close to half a million news headers (479,836).
==Dataset scope==
This dataset contains the news headers and their respective semantic hypergraph (SH) representations, as well as their classifications as claims or conflicts and the associated extracted information, both in natural text and in SH format. The SH representation, the claim/conflict detection, and the information extraction tasks are all fully detailed in the main publication associated with this dataset. The SH representation provided is compatible with the open-source tool “Graphbrain”, which can be found at:
https://graphbrain.net
==Archive content==
The dataset consists of the single file “reddit-worldnews-01012013-01082017.csv” with the following columns:
• header: news header in plain text.
• hyperedge: news header in SH representation.
• is_claim / is_conflict: boolean fields indicating classification as claim / conflict.
• claim_actor / claim_actor_hyperedge: actor making the claim in plain text and SH representation.
• claim_topic / claim_topic_hyperedge: claim topic in plain text and SH representation.
• conflict_origin / conflict_origin_hyperedge: origin of conflict in plain text and SH representation.
• conflict_target / conflict_target_hyperedge: target of conflict in plain text and SH representation.
• conflict_topic / conflict_topic_hyperedge: conflict topic in plain text and SH representation.
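As an illustration of how these columns might be consumed, the following is a minimal sketch using only the Python standard library. It streams the rows and keeps those flagged as claims. The column names come from the list above; the comma delimiter, the UTF-8 encoding, and the textual encoding of the boolean fields (`TRUE_VALUES`) are assumptions that should be checked against the actual file. The helper names `as_bool` and `iter_claims` are hypothetical.

```python
import csv
import io

# Assumption: booleans are stored as text such as "True"/"False" or "1"/"0".
# Adjust this set after inspecting the real file.
TRUE_VALUES = {"true", "1", "yes"}


def as_bool(value):
    """Interpret a CSV cell as a boolean under the assumption above."""
    return (value or "").strip().lower() in TRUE_VALUES


def iter_claims(lines):
    """Yield (claim_actor, claim_topic, header) for rows flagged as claims.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    for row in csv.DictReader(lines):
        if as_bool(row.get("is_claim")):
            yield row["claim_actor"], row["claim_topic"], row["header"]


if __name__ == "__main__":
    # Made-up two-row sample for demonstration only; not taken from the dataset.
    sample = io.StringIO(
        "header,is_claim,claim_actor,claim_topic\n"
        "Minister says talks will resume,True,Minister,talks will resume\n"
        "Floods hit coastal region,False,,\n"
    )
    for actor, topic, header in iter_claims(sample):
        print(actor, "|", topic, "|", header)

    # With the real archive (assumed comma-separated, UTF-8):
    # with open("reddit-worldnews-01012013-01082017.csv",
    #           newline="", encoding="utf-8") as f:
    #     claims = list(iter_claims(f))
```

Reading row by row keeps memory use low for a file of roughly half a million headers. The same pattern would apply to the is_conflict flag and the conflict_* columns.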
==Acknowledgment of funding==
This dataset has been assembled in the framework of the ERC-supported Consolidator Grant “Socsemics” (research performed at CNRS, 2019-24), grant agreement #772743.
Provider:
NAKALA - https://nakala.fr (Huma-Num - CNRS)
Created:
2024-10-30



