
COVID-related protest events from the New York Times news corpus

Source: arXiv, 2023-01-17 (updated 2024-08-06)
Download link: http://arxiv.org/abs/2301.06617v1
Description:
This dataset is drawn from The New York Times news corpus and collects COVID-19-related protest events. A processing system was developed to convert the crawled news data into a protest event dataset: a pre-trained XLM-RoBERTa-large model classifies the news articles, and the spaCy named entity recognizer extracts geographic locations. Each resulting record contains an event ID, date, city, region or state, and country. The dataset is intended mainly for evaluating how well event detection systems extract spatiotemporal patterns, specifically over a three-month period of COVID-19 protest events in the United States.
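As a rough illustration of how records like these might feed a spatiotemporal evaluation, the sketch below defines a record type with the five fields listed above, bins events into weekly counts per region or state, and compares a system's counts against reference counts. The field names (`event_id`, `event_date`, `city`, `region`, `country`), the ISO-week binning, and the use of Pearson correlation as the comparison metric are all assumptions for illustration; the description does not specify the dataset's file format or the evaluation metric actually used.

```python
import math
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class ProtestEvent:
    """One event record. Field names are hypothetical; they mirror the
    fields named in the description (event ID, date, city, region/state, country)."""
    event_id: str
    event_date: date
    city: Optional[str]
    region: Optional[str]  # region or state; None if no location was extracted
    country: str


def weekly_counts_by_region(events: Iterable[ProtestEvent]) -> Counter:
    """Count events per (ISO year, ISO week, region) cell.

    Weekly binning is an assumed choice; events without a region
    are grouped under the placeholder "UNKNOWN".
    """
    counts: Counter = Counter()
    for ev in events:
        iso_year, iso_week, _ = ev.event_date.isocalendar()
        counts[(iso_year, iso_week, ev.region or "UNKNOWN")] += 1
    return counts


def pearson_over_cells(pred: Counter, gold: Counter) -> float:
    """Pearson correlation between predicted and reference counts over the
    union of cells (a cell missing from one side counts as 0).

    This metric is an illustrative assumption, not the paper's stated method.
    """
    keys = sorted(set(pred) | set(gold))
    n = len(keys)
    if n < 2:
        raise ValueError("need at least two cells to compute a correlation")
    xs = [pred[k] for k in keys]  # Counter returns 0 for missing keys
    ys = [gold[k] for k in keys]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when either side is constant")
    return cov / math.sqrt(var_x * var_y)
```

Passing a system's extracted events and the reference records through `weekly_counts_by_region` and then `pearson_over_cells` yields a single agreement score; a daily or per-city analysis would only require changing the key built in `weekly_counts_by_region`.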
Created: 2023-01-17