Event Consequence Collections from Wikinews (ECCW)
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/4585656
Description:
The quality of causal knowledge extraction systems is typically judged by their performance on benchmarks. Although quite a few such benchmarks already exist (e.g. BECAUSE, SemEval), they are marred by the following problems: a) they focus mainly on common-sense reasoning tasks; b) they make restrictive assumptions, e.g. treating causes and effects as word tokens or relying on trigger verbs.
This endeavor seeks to build new benchmarks for causal knowledge extraction that address at least some limitations of the previous ones and are geared particularly towards risk management and event forecasting.
This dataset, born of this endeavor and the result of a collaboration between the International Business Machines Corporation (IBM) and Rensselaer Polytechnic Institute (RPI), focuses on event forecasting tasks and is derived solely from Wikinews.
Out of all (10k+) category pages that exist on Wikinews, various automated and manual filtering approaches were applied to narrow them down to a smaller number, which were finally combined into 39 significant "event" / category pages. For each of these event / category pages, the first (earliest) news article is considered the source event and all the following ones are considered consequences. Various negative examples, or non-consequences, are also included: these are events that may be topically or semantically related to the source event (through its original topic, or through categories related to the source event's category page) but are not consequences of it. Using such a dataset, one can hope to create benchmarks for event forecasting systems, or even use the dataset as a benchmark itself. Because of the volume and the sheer number of possible event-consequence-negative_example subsets, this dataset can also be used to create training and testing sets for supervised classifiers that answer simple multiple-choice questions geared towards event forecasting. One such task could be:
Input Event: Massive anti-government protests in Egypt continue into second day, several killed
Choices:
(1) Hosni Mubarak steps down as president of Egypt
(2) Bomb threat on UK–Egypt plane; diverted to Greece
(3) Benin, Nigeria join African Union continental free trade bloc
(4) Gaza Strip reports first swine flu cases
Answer: (1)
It can also be a question answering task with one correct and one incorrect option:
Input Event: Massive anti-government protests in Egypt continue into second day, several killed
Choices:
(1) Hosni Mubarak steps down as president of Egypt
(2) Bomb threat on UK–Egypt plane; diverted to Greece
Answer: (1)
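The construction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names `split_source_and_consequences` and `build_question` are hypothetical, and the in-memory layout (dicts with a sortable `date` value, plain title strings) is assumed for the example rather than taken from the released file.

```python
import random


def split_source_and_consequences(articles):
    """Treat the earliest article as the source event, the rest as consequences.

    Assumption: each article is a dict whose `date` value sorts
    chronologically (e.g. an ISO-8601 string); the real format may differ.
    """
    if not articles:
        raise ValueError("need at least one article")
    ordered = sorted(articles, key=lambda article: article["date"])
    return ordered[0], ordered[1:]


def build_question(source_title, consequence_titles, negative_titles,
                   n_choices=4, rng=None):
    """Build one multiple-choice item: one true consequence plus distractors."""
    rng = rng or random.Random()
    if n_choices < 2:
        raise ValueError("a question needs at least two choices")
    if not consequence_titles:
        raise ValueError("need at least one consequence")
    if len(negative_titles) < n_choices - 1:
        raise ValueError("not enough negative examples for the distractors")

    correct = rng.choice(consequence_titles)
    distractors = rng.sample(negative_titles, n_choices - 1)
    choices = [correct] + distractors
    rng.shuffle(choices)  # avoid always placing the answer first
    return {
        "input_event": source_title,
        "choices": choices,
        "answer": choices.index(correct) + 1,  # 1-based, as in the examples
    }


# Titles taken from the example above; the seed only makes the run repeatable.
question = build_question(
    "Massive anti-government protests in Egypt continue into second day, several killed",
    ["Hosni Mubarak steps down as president of Egypt"],
    [
        "Bomb threat on UK–Egypt plane; diverted to Greece",
        "Benin, Nigeria join African Union continental free trade bloc",
        "Gaza Strip reports first swine flu cases",
    ],
    rng=random.Random(0),
)

print("Input Event:", question["input_event"])
for number, choice in enumerate(question["choices"], start=1):
    print(f"({number}) {choice}")
print(f"Answer: ({question['answer']})")
```

Passing `n_choices=2` yields the one-correct, one-incorrect variant. Shuffling the choices is a design choice of this sketch, so that a classifier cannot learn the answer from its position.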
Included is the end result of mining almost all of Wikinews into a small, concise dataset of significant events, their consequences, and some non-consequences. Every event, consequence and non-consequence has a number of fields, including but not limited to:
a) categories
b) category_links
c) category_name
d) content
e) date
f) title
g) url
These fields are present for every Wikinews article included. Other extraction-specific metadata fields may also be present, depending on which kind of article (source event, consequence, negative example / non-consequence) is being considered. The JSONL file contains 39 lines (category pages), with the following complete summary:
{"information": "Complete Summary", "total_consequences": 570.0, "mean_consequences": 14.615384615384615, "median_consequences": 6.0, "total_negatives": 780.0, "total_negatives_before": 390.0, "total_negatives_after": 390.0}
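As a quick sanity check, the figures in this summary can be cross-checked against the 39 category pages mentioned in the description. The sketch below parses the summary line exactly as quoted here; the constant `N_EVENTS` and the variable names are introduced for the illustration only, and nothing is assumed about where the summary is stored in the released file.

```python
import json
import math

# Summary line copied verbatim from the description above.
SUMMARY_LINE = (
    '{"information": "Complete Summary", "total_consequences": 570.0, '
    '"mean_consequences": 14.615384615384615, "median_consequences": 6.0, '
    '"total_negatives": 780.0, "total_negatives_before": 390.0, '
    '"total_negatives_after": 390.0}'
)
N_EVENTS = 39  # number of event / category pages stated in the description

summary = json.loads(SUMMARY_LINE)

# The mean should equal total consequences divided by the number of events.
mean_from_totals = summary["total_consequences"] / N_EVENTS
mean_matches = math.isclose(mean_from_totals, summary["mean_consequences"])

# The negatives should split cleanly into the "before" and "after" groups.
negatives_add_up = (
    summary["total_negatives_before"] + summary["total_negatives_after"]
    == summary["total_negatives"]
)
negatives_per_event = summary["total_negatives"] / N_EVENTS

print("mean consistent with totals:", mean_matches)
print("negatives before + after = total:", negatives_add_up)
print("average negatives per event:", negatives_per_event)
```

With the figures quoted above, both checks hold, and the negatives work out to 20 per event on average (10 in each of the before and after groups). The median of 6 sitting well below the mean of about 14.6 suggests that a few events account for a large share of the consequences.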
Please feel free to contact the following people with any questions or comments:
Oktie Hassanzadeh
hassanzadeh at us.ibm.com
Gaurav Dass
dassg2 at rpi.edu
dassgaurav93 at gmail.com
Created:
2021-08-25



