Dataset corresponding to the paper "A privacy-preserving approach to identify riot-related footage on social media".
Source: 4TU.ResearchData. Updated: 2025-12-29. Indexed: 2026-04-23.
Download link:
https://data.4tu.nl/datasets/1d26c310-5a5b-48e7-b72d-5540bd6d0b6e/1
Description:
107,674 geolocated visual posts were collected from a social media platform during and after the 'Nahel Merzouk' riots of summer 2023 in 7 French cities. These posts were fed to an image-to-text model (BLIP2-OPT-2.7B) to produce textual descriptions of the visual content. This dataset contains those textual descriptions, along with the metadata (date, time, and location). A subset of the posts was also annotated as riot-related or not riot-related to train a BERT model. This subset is also provided in this database (see paper for more details).

Tables:
1. videos: Contains metadata about each video, including location and timestamp information.
2. captions: Contains all captions extracted from videos, with frame-level information.
3. annotated_captions: Contains a subset of captions that have been manually annotated for riot-related content.
4. annotated_videos: Contains manually annotated video-level labels for riot detection.
5. split_annotated_videos: Defines the train/test split for annotated videos used in model training and evaluation.
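
As a rough illustration of how the five tables might fit together, the sketch below builds a tiny in-memory SQLite database and assembles labelled caption text for one split. Only the table names come from the description above; every column name (`video_id`, `frame_index`, `caption`, `riot_related`, `split`, and so on), the choice of SQLite, and all rows are hypothetical toy values invented for the example, so the real files should be inspected before adapting it.

```python
import sqlite3

# Hypothetical schema: only the five table names come from the dataset description.
# All column names and types are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE videos (video_id TEXT PRIMARY KEY, posted_at TEXT, latitude REAL, longitude REAL);
CREATE TABLE captions (video_id TEXT, frame_index INTEGER, caption TEXT);
CREATE TABLE annotated_captions (video_id TEXT, frame_index INTEGER, caption TEXT, riot_related INTEGER);
CREATE TABLE annotated_videos (video_id TEXT PRIMARY KEY, riot_related INTEGER);
CREATE TABLE split_annotated_videos (video_id TEXT PRIMARY KEY, split TEXT);
"""


def build_toy_database():
    """Create an in-memory database filled with invented toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO videos VALUES (?, ?, ?, ?)", [
        ("v1", "2023-06-29T22:15:00", 48.89, 2.21),
        ("v2", "2023-06-30T10:00:00", 45.76, 4.84),
        ("v3", "2023-07-01T23:40:00", 43.30, 5.37),
    ])
    conn.executemany("INSERT INTO captions VALUES (?, ?, ?)", [
        ("v1", 0, "a group of people on a street"),
        ("v1", 1, "a fire in a trash can"),
        ("v2", 0, "a plate of food on a table"),
        ("v3", 0, "a car on fire at night"),
    ])
    conn.executemany("INSERT INTO annotated_videos VALUES (?, ?)",
                     [("v1", 1), ("v2", 0), ("v3", 1)])
    conn.executemany("INSERT INTO split_annotated_videos VALUES (?, ?)",
                     [("v1", "train"), ("v2", "train"), ("v3", "test")])
    return conn


def labelled_examples(conn, split):
    """Return (video_id, concatenated captions, label) tuples for one split."""
    rows = conn.execute(
        """
        SELECT c.video_id, c.caption, a.riot_related
        FROM captions AS c
        JOIN annotated_videos AS a ON a.video_id = c.video_id
        JOIN split_annotated_videos AS s ON s.video_id = c.video_id
        WHERE s.split = ?
        ORDER BY c.video_id, c.frame_index
        """,
        (split,),
    ).fetchall()
    grouped = {}
    for video_id, caption, label in rows:
        text, _ = grouped.get(video_id, ("", label))
        # Concatenate frame-level captions in frame order, one string per video.
        grouped[video_id] = ((text + " " + caption).strip(), label)
    return [(vid, text, label) for vid, (text, label) in grouped.items()]


if __name__ == "__main__":
    conn = build_toy_database()
    for example in labelled_examples(conn, "train"):
        print(example)
```

Concatenating frame-level captions per video is just one plausible way to prepare text for a classifier; the paper should be consulted for how captions were actually combined before training the BERT model.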
Provided by:
El Khatibi, Naufal; van Galen, Maurits; van Horik, Bryan
Created:
2025-12-29



