RECOD.ai events dataset
Source: DataCite Commons. Updated: 2025-03-21. Indexed: 2025-04-17.
Download link:
https://redu.unicamp.br/citation?persistentId=doi:10.25824/redu/BLIYYR
Resource description:
Overview: This dataset consists of links to social network items for 34 different forensic events that took place between August 14, 2018 and January 6, 2021. The majority of the text and images are from Twitter (a minor part is from Flickr, Facebook and Google+), and every video is from YouTube.
Data Collection: We used Social Tracker, along with the social media platforms' APIs, to gather most of the collections. For a minor part, we used Twint. In both cases, we provided keywords related to the event to receive the data. It is important to mention that, in procedures like this one, usually only a small fraction of the collected data is in fact related to the event and useful for further forensic analysis.
Content: We have data from 34 events, and for each of them we provide the following files. items_full.csv: contains links to all social media posts that were collected. images.csv: lists the images collected; in some files there is a field called "ItemUrl" that refers to the social network post (e.g., a tweet) that mentions that media item. video.csv: URLs of YouTube videos that were gathered about the event. video_tweet.csv: contains IDs of tweets and IDs of YouTube videos; a tweet whose ID is in this file has a video in its content, and the link of a YouTube video whose ID is in this file was mentioned by at least one collected tweet. Only two collections have this file. description.txt: contains some standard information about the event, and possibly some comments about any specific issue related to it. In fact, most of the collections do not have all the files above, because our collection procedure changed over the course of this work.
Events: We divided the events into six groups. Fire (14 events): a devastating fire is the main issue of the event, so most of the informative pictures show flames or burned constructions. Collapse (5 events): most of the relevant images depict collapsed buildings, bridges, etc. (not caused by fire). Shooting (5 events): images likely show guns and police officers, with little or no destruction of the surroundings. Demonstration (7 events): large crowds on the streets; some incident may have occurred during the demonstration, but in most cases the demonstration itself is the actual event. Collision (1 event): a traffic collision, with pictures of damaged vehicles in an urban landscape and possibly images of victims on the street. Flood (2 events): events that range from fierce rain to a tsunami; many pictures depict water.
Media Content: Due to the social networks' terms of use, we do not make the collected texts, images and videos publicly available. However, some additional media content related to one or more events can be provided upon contacting the authors.
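Dataset Overview
This dataset contains links to social media content for 34 forensic events that took place between August 14, 2018 and January 6, 2021. The vast majority of the text and images come from Twitter, a small portion comes from Flickr, Facebook and Google+, and all collected videos come from YouTube.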
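Data Collection
We collected most of the data with Social Tracker and the social media platforms' official application programming interfaces (APIs); a small portion was obtained with Twint. In both cases, data was retrieved by supplying keywords related to the target event. Note that in collection procedures of this kind, usually only a small fraction of what is collected is actually related to the event and usable for subsequent forensic analysis.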
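Dataset Contents
The dataset covers 34 events, and the following files are provided for each of them:
1. `items_full.csv`: links to all collected social media posts;
2. `images.csv`: lists the collected images. Some files contain a field named `ItemUrl`, which points to the social media post (for example, a tweet) that mentions the media item;
3. `video.csv`: URLs of the YouTube videos collected for the event;
4. `video_tweet.csv`: contains tweet IDs and YouTube video IDs. A tweet whose ID appears in this file has a video in its content; likewise, the link of a YouTube video whose ID appears in this file was mentioned by at least one collected tweet. Only 2 collections include this file;
5. `description.txt`: standard background information about the event, plus possible remarks on specific issues related to it.
Note that most collections do not contain all of the files above, because our collection procedure changed over the course of this work.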
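Because most collections lack some of the listed files, a loader has to treat every file as optional. The following is a minimal sketch of that idea, not the authors' tooling. It assumes that each event is unpacked into its own folder, that the CSV files are UTF-8 encoded with a header row, and that `ItemUrl` is spelled exactly as in the description; the function names `load_collection` and `images_by_post` are hypothetical.

```python
import csv
from pathlib import Path

# File names as listed in the dataset description.
CSV_FILES = ("items_full.csv", "images.csv", "video.csv", "video_tweet.csv")


def load_collection(event_dir):
    """Read whichever of the described files exist in one event folder.

    Returns a dict mapping each file name to its parsed rows (a list of
    dicts), or to None when the file is absent from this collection.
    """
    event_dir = Path(event_dir)
    collection = {}
    for name in CSV_FILES:
        path = event_dir / name
        if not path.is_file():
            collection[name] = None  # most collections lack some files
            continue
        # Assumption: header row present, UTF-8 encoding.
        with path.open(newline="", encoding="utf-8") as handle:
            collection[name] = list(csv.DictReader(handle))
    desc = event_dir / "description.txt"
    collection["description.txt"] = (
        desc.read_text(encoding="utf-8") if desc.is_file() else None
    )
    return collection


def images_by_post(image_rows):
    """Group image rows by the post that mentions them, when ItemUrl exists."""
    grouped = {}
    for row in image_rows or []:
        post = row.get("ItemUrl")  # only some files carry this field
        if post:
            grouped.setdefault(post, []).append(row)
    return grouped
```

Calling `load_collection` on an event folder and passing its `"images.csv"` entry to `images_by_post` gives a mapping from post URL to image rows; rows without an `ItemUrl` value are skipped rather than guessed at.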
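Event Categories
We divided the 34 events into six categories:
1. **Fire**: a devastating fire is the core of the event, and most of the informative images show flames or burned structures; 14 events;
2. **Collapse**: the relevant images mostly show collapsed buildings, bridges, etc. (not caused by fire); 5 events;
3. **Shooting**: images mostly involve guns and police officers, with little or no destruction of the surroundings; 5 events;
4. **Demonstration**: large crowds gathered on the streets. Incidents may have occurred during the demonstration, but in most cases the demonstration itself is the actual event; 7 events;
5. **Collision**: a traffic collision; images mostly show damaged vehicles in an urban setting, and some may show victims on the street; 1 event;
6. **Flood**: events ranging from heavy rain to a tsunami; most images depict water; 2 events.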
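As a quick consistency check, the per-category counts can be summed and compared with the stated total of 34 events. The sketch below only restates the numbers from the list above; the names `EVENTS_PER_CATEGORY`, `total_events` and `category_shares` are illustrative and not part of the dataset.

```python
# Event counts per category, as stated in the description.
EVENTS_PER_CATEGORY = {
    "Fire": 14,
    "Collapse": 5,
    "Shooting": 5,
    "Demonstration": 7,
    "Collision": 1,
    "Flood": 2,
}


def total_events(counts):
    """Sum the number of events over all categories."""
    return sum(counts.values())


def category_shares(counts):
    """Return each category's fraction of the total number of events."""
    total = total_events(counts)
    return {name: n / total for name, n in counts.items()}
```

The counts add up to 14 + 5 + 5 + 7 + 1 + 2 = 34, matching the stated total. The categories are clearly imbalanced: Fire alone accounts for 14 of the 34 events, which is worth keeping in mind if the categories are used as class labels.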
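Media Content
Due to the terms of use of the social media platforms, we cannot make the collected texts, images and videos publicly available. However, additional media content related to one or more events can be provided upon contacting the dataset authors.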
Provider:
Repositório de Dados de Pesquisa da Unicamp
Created:
2025-03-21



