DICE: a Dataset of Italian Crime Event news
Source: paperswithcode.com (indexed 2025-01-15)
Download link:
https://paperswithcode.com/dataset/italian-crime-news
Resource description:
The dataset contains the main components of news articles published online by the newspaper Gazzetta di Modena (https://gazzettadimodena.gelocal.it/modena): the URL of the web page, title, subtitle, text, date of publication, and the crime category assigned to each article by its author.
The articles are written in Italian and describe 11 types of crime events that occurred in the province of Modena between the end of 2011 and 2021.
The dataset also includes data derived from these components by applying Natural Language Processing techniques.
Examples are the place where the crime event occurred (municipality, area, address, and GPS coordinates), the date of the event, and the type of crime event described in the article, obtained by automatic categorization of the text.
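To make the list of components concrete, one record could be modeled roughly as follows. This is only a sketch: the class name, the field names, and the choice of optional fields are assumptions made for illustration, not the dataset's actual column names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CrimeNewsRecord:
    """Hypothetical shape of one record; names are illustrative only."""

    # Components taken directly from the newspaper page.
    url: str
    title: str
    sub_title: str
    text: str
    publication_date: date
    author_category: str  # crime category assigned by the article's author

    # Components derived with NLP; None when nothing could be extracted.
    municipality: Optional[str] = None
    area: Optional[str] = None
    address: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    event_date: Optional[date] = None
    predicted_category: Optional[str] = None  # from automatic categorization

    def has_coordinates(self) -> bool:
        # True only when both GPS coordinates are present.
        return self.latitude is not None and self.longitude is not None
```

Keeping the derived fields optional separates what the newspaper published from what was inferred automatically.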
Finally, news articles describing the same crime event (duplicates) are detected by calculating document similarity.
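The description does not say which similarity measure is used. As a minimal sketch of the general idea, the example below assumes cosine similarity over plain word counts and a cutoff of 0.8. The function names and the threshold are hypothetical and are not taken from the dataset's documentation.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Lowercase and keep runs of letters, including accented Italian letters.
    return re.findall(r"[^\W\d_]+", text.lower())


def cosine_similarity(a, b):
    # Cosine of the angle between the two bag-of-words count vectors.
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_duplicates(texts, threshold=0.8):
    # Index pairs whose similarity reaches the (assumed) threshold.
    return [
        (i, j)
        for i, j in combinations(range(len(texts)), 2)
        if cosine_similarity(texts[i], texts[j]) >= threshold
    ]


articles = [
    "Furto in un negozio del centro di Modena",
    "Furto in un negozio del centro di Modena ieri",
    "Incidente stradale a Carpi",
]
print(find_duplicates(articles))
```

In this toy input only the first two texts are flagged as a pair. A real pipeline would likely weight terms (for example with TF-IDF) or use embeddings, and would tune the threshold on labeled duplicates.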
We are currently working on applying question answering to extract the 5W+1H (who, what, when, where, why, and how), and we plan to extend the dataset with the resulting data.
Other researchers can use the dataset to apply other text categorization and duplicate detection algorithms and compare their results with the benchmark. The dataset can be useful for several purposes, e.g., geo-localization of events, text summarization, crime analysis, crime prediction, community detection, and topic modeling.
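A comparison against the benchmark labels could be scored along the following lines. This sketch assumes accuracy and macro-averaged F1 as the metrics, which the description does not specify. The label strings are placeholders, not the dataset's actual 11 category names.

```python
def accuracy(gold, predicted):
    # Fraction of articles whose predicted category matches the reference.
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def macro_f1(gold, predicted):
    # Unweighted mean of per-category F1, so rare categories count equally.
    pairs = list(zip(gold, predicted))
    scores = []
    for label in sorted(set(gold) | set(predicted)):
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


# Placeholder labels for illustration only.
gold = ["furto", "rapina", "furto", "truffa"]
predicted = ["furto", "furto", "furto", "truffa"]
print(accuracy(gold, predicted), macro_f1(gold, predicted))
```

Macro averaging is chosen here because crime categories are often imbalanced. Plain accuracy alone can hide poor performance on the less frequent ones.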
Provided by:
Papers with Code



