DankMemes Dataset

DataCite Commons. Updated 2022-06-01; indexed 2024-07-13.

Download link:

https://live.european-language-grid.eu/catalogue/corpus/8094

Description:

The DANKMEMES Dataset is composed of 2,361 images, half memes and half not, automatically extracted from Instagram with a Python script targeting the hashtag related to the Italian government crisis (“#crisidigoverno”). It was created and used for DankMemes (https://dankmemes2020.fileli.unipi.it), a shared task proposed for the EVALITA 2020 campaign (http://www.evalita.it/2020) that focuses on the automatic classification of Internet memes. The task encompasses three subtasks: detecting memes (Task A), detecting hate speech in memes (Task B), and clustering memes according to events (Task C).

The dataset is split into training and test sets in an 80/20 proportion. The test set was released without gold labels, which are provided in a separate file for each subtask.

For each subtask, the dataset consists of:
- a folder of images in .jpg format;
- a .csv file with the associated image embeddings, computed with ResNet (He et al., 2016), a state-of-the-art image recognition model based on deep residual learning;
- a .csv file with the associated variables.

The variables provided are:
- File: the name of the image file the variables refer to;
- Engagement: the number of comments and likes the image received;
- Date: when the image was first posted on Instagram;
- Picture manipulation: the degree of visual modification of the image. Non-manipulated images and low-impact changes (e.g. added text or a logo) are labeled 0; heavily manipulated, impactful changes (e.g. images altered to include political actors) are labeled 1;
- Visual actors: the political actors (i.e. politicians, parties’ logos) portrayed visually, whether edited into the picture or present in the original image;
- Text: the textual content of the image, extracted through optical character recognition (OCR) with Google’s Tesseract-OCR engine and then manually corrected;
- Meme (Task A): binary feature, where 0 denotes non-meme images and 1 denotes memes;
- Hate speech (Task B): binary feature for memes only, distinguishing memes with offensive language (1) from non-offensive memes (0);
- Event (Task C): feature for memes only, assigning each meme to one of 4 events related to the 2019 Italian government crisis.
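
The per-subtask files described above can be combined by joining the variables table and the embeddings table on the image file name. The sketch below uses only Python's standard library and rests on assumptions the description does not state: both files are comma-separated with a header row, the embeddings file has the file name in its first column followed by one column per embedding dimension, and the header names and toy rows are invented for illustration. The helper names (`load_variables`, `load_embeddings`, `join_on_file`) are hypothetical.

```python
import csv
import io
from typing import Dict, List

# ASSUMPTION: both CSVs are comma-separated with a header row. The column
# names below follow the variable names in the description, but the real
# headers, delimiter and embedding layout may differ.


def load_variables(text: str) -> List[Dict[str, str]]:
    """Parse the variables CSV into one dict per image, keyed by header."""
    return list(csv.DictReader(io.StringIO(text)))


def load_embeddings(text: str) -> Dict[str, List[float]]:
    """Map image file name -> embedding vector.

    ASSUMPTION: column 0 holds the file name and every remaining column
    holds one embedding dimension.
    """
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return {row[0]: [float(x) for x in row[1:]] for row in reader if row}


def join_on_file(variables: List[Dict[str, str]],
                 embeddings: Dict[str, List[float]],
                 key: str = "File") -> List[Dict[str, object]]:
    """Attach each image's embedding to its variables row; drop unmatched rows."""
    joined: List[Dict[str, object]] = []
    for row in variables:
        vector = embeddings.get(row[key])
        if vector is not None:
            joined.append({**row, "embedding": vector})
    return joined


# Toy data invented for illustration; not taken from the dataset.
variables_csv = """File,Engagement,Date,Picture manipulation,Visual actors,Text,Meme
1.jpg,120,2019-08-20,1,actor_a,example text,1
2.jpg,35,2019-08-21,0,,another text,0
"""
embeddings_csv = """File,e0,e1
1.jpg,0.1,0.2
2.jpg,0.3,0.4
"""

rows = join_on_file(load_variables(variables_csv), load_embeddings(embeddings_csv))
labels = [int(r["Meme"]) for r in rows]  # Task A targets
print(len(rows), labels)  # 2 [1, 0]
```

Dropping rows without a matching embedding keeps features and labels aligned. For Tasks B and C the same join presumably applies with the Hate speech or Event column as the target, and for the test set the labels would come from the separate gold-label files instead.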

Provider:

ELG

Created:

2022-06-01
