DankMemes Task A Dataset

Source: DataCite Commons. Updated 2022-06-01. Indexed 2024-07-13.
Download link:
https://live.european-language-grid.eu/catalogue/corpus/8088
Resource description:
The DANKMEMES Task A Dataset comprises 2,000 images, half memes and half not, automatically extracted from Instagram through a Python script targeting the hashtag related to the Italian government crisis (“#crisidigoverno”). It was created and used in the context of the DankMemes shared task (https://dankmemes2020.fileli.unipi.it), proposed for the 2020 EVALITA campaign (http://www.evalita.it/2020) and focusing on the automatic classification of Internet memes.

The dataset is split into training and test sets containing 80% and 20% of the items respectively. The test set is provided without gold labels, i.e. without the “Meme” attribute; the gold labels are provided in a separate file.

The dataset consists of:

- a folder with images in .jpg format
- a .csv file with the associated image embeddings, computed with ResNet (He et al., 2016), a state-of-the-art model for image recognition based on Deep Residual Learning
- a .csv file with the associated variables.

The variables provided for this task are:

- File: the name of the image file associated with the variables;
- Engagement: the number of comments and likes of the image;
- Date: when the image was first posted on Instagram;
- Picture manipulation: the degree of visual modification of the image. Non-manipulated images or low-impact changes (e.g. addition of text or a logo) are labeled 0. Heavily manipulated images with impactful changes (e.g. images altered to include political actors) are labeled 1;
- Visual actors: the political actors (i.e. politicians, parties’ logos) portrayed visually, either edited into the picture or present in the original image;
- Text: the textual content of the image, extracted through optical character recognition (OCR) using Google’s Tesseract-OCR Engine and then manually corrected;
- Meme: binary feature, where 0 represents non-meme images and 1 represents meme images. This is the target label for the first subtask.
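Because the test set ships without the “Meme” attribute and the gold labels come in a separate file, evaluation requires joining the two on the image file name. The following is a minimal sketch of that join using only the Python standard library. It assumes that the gold-label file is a .csv with a “File” column and a “Meme” column; the description does not state the exact layout, so treat those column names, the helper names `read_rows` and `attach_gold_labels`, and the sample rows as illustrative assumptions rather than the dataset's documented format.

```python
import csv
import io


def read_rows(handle):
    """Read an open CSV file handle into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def attach_gold_labels(test_rows, gold_rows, key="File", label="Meme"):
    """Return copies of test_rows with the gold label added, matched on file name.

    The column names are assumptions based on the variable list in the
    dataset description; adjust them to the actual file headers.
    """
    gold = {row[key]: row[label] for row in gold_rows}
    missing = [row[key] for row in test_rows if row[key] not in gold]
    if missing:
        raise KeyError(f"no gold label for: {missing}")
    # Build new dicts so the input rows are left unmodified.
    return [{**row, label: int(gold[row[key]])} for row in test_rows]


# Tiny in-memory stand-ins for the real files (contents invented for illustration).
test_csv = io.StringIO("File,Engagement,Text\n1.jpg,120,hello\n2.jpg,45,ciao\n")
gold_csv = io.StringIO("File,Meme\n2.jpg,0\n1.jpg,1\n")

labeled = attach_gold_labels(read_rows(test_csv), read_rows(gold_csv))
print([(row["File"], row["Meme"]) for row in labeled])
```

With real data, the two `io.StringIO` objects would be replaced by files opened with `open(path, newline="", encoding="utf-8")`. Matching on the file name rather than on row position keeps the join correct even when the two files list the images in a different order.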
Provider:
ELG
Created:
2022-06-01