
Aakash941/MIMIC-Meme-Dataset

Hugging Face · Updated 2024-04-05 · Indexed 2024-06-11
Download link:
https://hf-mirror.com/datasets/Aakash941/MIMIC-Meme-Dataset
Description:
---
license: cc-by-4.0
task_categories:
- feature-extraction
language:
- hi
- en
pretty_name: >-
  Misogyny Identification in Multimodal Internet Content in Hindi-English Code-Mixed Language
size_categories:
- 1K<n<10K
---

This dataset aims to fill a research gap by presenting a carefully curated collection of misogynistic memes in code-mixed Hindi-English. It introduces two sub-tasks: the first is a binary classification task that determines whether a meme is misogynistic; the second is a multi-label classification task that assigns misogynistic memes to one or more of the categories Objectification, Prejudice, and Humiliation.

For more information and citation: Singh, A., Sharma, D., & Singh, V. K. (2024). MIMIC: Misogyny Identification in Multimodal Internet Content in Hindi-English Code-Mixed Language. ACM Transactions on Asian and Low-Resource Language Information Processing. https://doi.org/10.1145/3656169

The ZIP archive contains a CSV file named "MIMIC2024" and a directory named "Files". The "Files" directory stores the memes as JPEG images, and the CSV file holds the annotations for each meme. The CSV file has six columns:
  • FileName: name of the meme image inside the Files folder
  • ExtractedText: text extracted from the meme using EasyOCR
  • Misogyny: label for the binary classification task (1 = misogynistic, 0 = non-misogynistic)
  • Objectification, Prejudice, Humiliation: labels for the multi-label classification task, indicating the category or categories a misogynistic meme falls into (a meme may have more than one label)
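A minimal sketch of reading the annotations with Python's standard library and pairing each row with its image. The helper name `load_mimic` is hypothetical, and several details the card does not state are assumptions: the ZIP has been extracted to a local directory, the CSV is saved as "MIMIC2024.csv", `FileName` already includes the ".jpg" extension, and each label cell holds the integer 0 or 1.

```python
import csv
from pathlib import Path

# Annotation columns as documented in the dataset card.
LABEL_COLUMNS = ("Misogyny", "Objectification", "Prejudice", "Humiliation")


def load_mimic(root):
    """Read the MIMIC annotation CSV and pair each row with its image path.

    Assumptions (not confirmed by the card): the ZIP has been extracted to
    `root`, the CSV is saved as "MIMIC2024.csv", FileName already includes
    the ".jpg" extension, and every label cell holds the integer 0 or 1.
    """
    root = Path(root)
    records = []
    # utf-8-sig also accepts a file that starts with a byte-order mark.
    with open(root / "MIMIC2024.csv", newline="", encoding="utf-8-sig") as f:
        for row in csv.DictReader(f):
            record = {
                "image": root / "Files" / row["FileName"],
                "text": row["ExtractedText"],
            }
            # Convert each 0/1 label from its CSV string form to int.
            for col in LABEL_COLUMNS:
                record[col] = int(row[col])
            records.append(record)
    return records
```

For example, `load_mimic("MIMIC")` would read a hypothetical extraction directory named `MIMIC`. The standard `csv` module is used instead of pandas only to keep the sketch dependency-free.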
Provided by:
Aakash941
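Original Information Summary

Dataset Overview

Basic Information

  • License: cc-by-4.0
  • Task category: feature extraction
  • Languages: Hindi, English
  • Dataset name: Misogyny Identification in Multimodal Internet Content in Hindi-English Code-Mixed Language
  • Dataset size: 1K<n<10K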

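Dataset Contents

  • Purpose: a carefully curated collection of misogynistic memes in code-mixed Hindi-English, intended to fill a gap in existing research.
  • Sub-tasks:
    • Sub-task 1: binary classification, determining whether a meme is misogynistic.
    • Sub-task 2: multi-label classification, assigning misogynistic memes to categories such as Objectification, Prejudice, and Humiliation.
  • Dataset structure:
    • Files: a CSV file named "MIMIC2024" and a directory named "Files".
    • Contents:
      • Files directory: stores the memes in JPEG format.
      • CSV file: annotation details for each meme, in six columns:
        • FileName: name of the meme inside the Files directory.
        • ExtractedText: text extracted from the meme using EasyOCR.
        • Misogyny: label for the binary classification task (1 = misogynistic, 0 = non-misogynistic).
        • Objectification, Prejudice, Humiliation: labels for multi-label classification, indicating which categories a misogynistic meme belongs to (a meme may carry more than one).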
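The two sub-tasks map onto the label columns roughly as in the sketch below. It works on plain row dictionaries such as those produced by `csv.DictReader`, and all function names are hypothetical. Restricting sub-task 2 to rows with `Misogyny == 1` is an assumption drawn from the description that the categories apply to misogynistic memes; the card does not say how category columns are filled for non-misogynistic memes.

```python
from collections import Counter

# Category columns for sub-task 2, as listed in the dataset card.
CATEGORIES = ["Objectification", "Prejudice", "Humiliation"]


def binary_target(row):
    """Sub-task 1 target: 1 = misogynistic, 0 = non-misogynistic."""
    return int(row["Misogyny"])


def multilabel_target(row):
    """Sub-task 2 target: one 0/1 flag per category; several may be 1."""
    return [int(row[c]) for c in CATEGORIES]


def subtask2_rows(rows):
    """Keep only memes labelled misogynistic (assumed scope of sub-task 2)."""
    return [r for r in rows if binary_target(r) == 1]


def category_counts(rows):
    """Count how often each category is assigned among misogynistic memes."""
    counts = Counter()
    for r in subtask2_rows(rows):
        for name, flag in zip(CATEGORIES, multilabel_target(r)):
            counts[name] += flag
    return counts
```

A multi-hot vector per meme, rather than a single class index, is what lets one meme carry several categories at once.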

Citation

  • Paper: Singh, A., Sharma, D., & Singh, V. K. (2024). MIMIC: Misogyny Identification in Multimodal Internet Content in Hindi-English Code-Mixed Language. ACM Transactions on Asian and Low-Resource Language Information Processing.
  • DOI: 10.1145/3656169