
mediabiasgroup/mbib-base

Hugging Face · Updated 2024-02-06 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/mediabiasgroup/mbib-base
Resource description:
---
license: cc-by-nc-nd-4.0
task_categories:
- text-classification
language:
- en
tags:
- media
- mediabias
- media-bias
- media bias
size_categories:
- 1M<n<10M
dataset_info:
  config_name: plain_text
  splits:
  - name: cognitive_bias
  - name: fake_news
  - name: gender_bias
  - name: hate_speech
  - name: linguistic_bias
  - name: political_bias
  - name: racial_bias
  - name: text_level_bias
configs:
- config_name: default
  data_files:
  - split: cognitive_bias
    path: mbib-aggregated/cognitive-bias.csv
  - split: fake_news
    path: mbib-aggregated/fake-news.csv
  - split: gender_bias
    path: mbib-aggregated/gender-bias.csv
  - split: hate_speech
    path: mbib-aggregated/hate-speech.csv
  - split: linguistic_bias
    path: mbib-aggregated/linguistic-bias.csv
  - split: political_bias
    path: mbib-aggregated/political-bias.csv
  - split: racial_bias
    path: mbib-aggregated/racial-bias.csv
  - split: text_level_bias
    path: mbib-aggregated/text-level-bias.csv
---

# Dataset Card for Media-Bias-Identification-Benchmark

## Table of Contents

- [Dataset Card for Media-Bias-Identification-Benchmark](#dataset-card-for-mbib)
  - [Table of Contents](#table-of-contents)
  - [Dataset Description](#dataset-description)
    - [Dataset Summary](#dataset-summary)
    - [Tasks and Information](#tasks-and-information)
    - [Baseline](#baseline)
    - [Languages](#languages)
  - [Dataset Structure](#dataset-structure)
    - [Data Instances](#data-instances)
      - [cognitive-bias](#cognitive-bias)
    - [Data Fields](#data-fields)
  - [Considerations for Using the Data](#considerations-for-using-the-data)
    - [Social Impact of Dataset](#social-impact-of-dataset)
    - [Discussion of Biases](#discussion-of-biases)
    - [Other Known Limitations](#other-known-limitations)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** https://github.com/Media-Bias-Group/Media-Bias-Identification-Benchmark
- **Repository:** https://github.com/Media-Bias-Group/Media-Bias-Identification-Benchmark
- **Paper:** https://doi.org/10.1145/3539618.3591882
- **Point of Contact:** [Martin Wessel](mailto:martin.wessel@uni-konstanz.de)

### Baseline

<table>
<tr><td><b>Task</b></td><td><b>Model</b></td><td><b>Micro F1</b></td><td><b>Macro F1</b></td></tr>
<tr><td>cognitive-bias</td><td>ConvBERT/ConvBERT</td><td>0.7126</td><td>0.7664</td></tr>
<tr><td>fake-news</td><td>Bart/RoBERTa-T</td><td>0.6811</td><td>0.7533</td></tr>
<tr><td>gender-bias</td><td>RoBERTa-T/ELECTRA</td><td>0.8334</td><td>0.8211</td></tr>
<tr><td>hate-speech</td><td>RoBERTa-T/Bart</td><td>0.8897</td><td>0.7310</td></tr>
<tr><td>linguistic-bias</td><td>ConvBERT/Bart</td><td>0.7044</td><td>0.4995</td></tr>
<tr><td>political-bias</td><td>ConvBERT/ConvBERT</td><td>0.7041</td><td>0.7110</td></tr>
<tr><td>racial-bias</td><td>ConvBERT/ELECTRA</td><td>0.8772</td><td>0.6170</td></tr>
<tr><td>text-level-bias</td><td>ConvBERT/ConvBERT</td><td>0.7697</td><td>0.7532</td></tr>
</table>

### Languages

All datasets are in English.

## Dataset Structure

### Data Instances

#### cognitive-bias

An example of one training instance looks as follows.

```json
{
  "text": "A defense bill includes language that would require military hospitals to provide abortions on demand",
  "label": 1
}
```

### Data Fields

- `text`: a sentence from various sources (e.g., news articles, Twitter, other social media).
- `label`: binary indicator of bias (0 = unbiased, 1 = biased)

## Considerations for Using the Data

### Social Impact of Dataset

We believe that MBIB offers a new common ground for research in the domain, especially given the rising amount of (research) attention directed toward media bias.

### Citation Information

```
@inproceedings{
  title = {Introducing MBIB - the first Media Bias Identification Benchmark Task and Dataset Collection},
  author = {Wessel, Martin and Spinde, Timo and Horych, Tomáš and Ruas, Terry and Aizawa, Akiko and Gipp, Bela},
  year = {2023},
  note = {[in review]}
}
```
Provider:
mediabiasgroup
Summary of original information

Dataset overview

Dataset name

Media-Bias-Identification-Benchmark (MBIB)

Dataset license

cc-by-nc-nd-4.0

Task categories

  • text-classification

Languages

  • en

Tags

  • media
  • mediabias
  • media-bias
  • media bias

Size category

  • 1M<n<10M

Dataset configuration

  • config_name: plain_text

Dataset splits

  • cognitive_bias
  • fake_news
  • gender_bias
  • hate_speech
  • linguistic_bias
  • political_bias
  • racial_bias
  • text_level_bias
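Each split name maps to one CSV file under `mbib-aggregated/` in the card's `configs` block, with underscores replaced by hyphens. The sketch below reproduces that mapping; the helper name `split_to_path` is hypothetical and not part of the dataset or any library.

```python
# Split names as listed in the dataset card.
SPLITS = (
    "cognitive_bias", "fake_news", "gender_bias", "hate_speech",
    "linguistic_bias", "political_bias", "racial_bias", "text_level_bias",
)

def split_to_path(split: str) -> str:
    """Return the repository-relative CSV path for a split name.

    Hypothetical helper: it mirrors the naming pattern visible in the
    card's `configs` block (underscores become hyphens).
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return f"mbib-aggregated/{split.replace('_', '-')}.csv"

paths = {name: split_to_path(name) for name in SPLITS}
```

Raising on an unknown name keeps a misspelled split from silently producing a path that does not exist in the repository.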

Dataset structure

  • Data instances

    • cognitive-bias (JSON): { "text": "A defense bill includes language that would require military hospitals to provide abortions on demand", "label": 1 }
  • Data fields

    • text: a sentence from various sources (e.g., news articles, Twitter, other social media).
    • label: binary indicator of bias (0 = unbiased, 1 = biased)
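The two fields above can be checked when reading a split. The following is a minimal sketch that assumes each CSV file has a header row with `text` and `label` columns; the card names the fields but does not show the file layout, so treat the column names as an assumption. `parse_rows` is a hypothetical helper, and the sample row is the instance shown in the card.

```python
import csv
import io

def parse_rows(csv_text: str) -> list[dict]:
    """Parse CSV text into {"text": str, "label": int} records.

    Assumes a header row with `text` and `label` columns (not confirmed
    by the card). Rejects labels other than 0 (unbiased) or 1 (biased).
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = int(row["label"])
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label}")
        rows.append({"text": row["text"], "label": label})
    return rows

sample = (
    "text,label\n"
    '"A defense bill includes language that would require military '
    'hospitals to provide abortions on demand",1\n'
)
rows = parse_rows(sample)
```

With the Hugging Face `datasets` library installed, `load_dataset("mediabiasgroup/mbib-base", split="cognitive_bias")` would be the usual way to fetch a split directly; that call needs network access and is not exercised here.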

Baseline model performance

  • cognitive-bias
    • Model: ConvBERT/ConvBERT
    • Micro F1: 0.7126
    • Macro F1: 0.7664
  • fake-news
    • Model: Bart/RoBERTa-T
    • Micro F1: 0.6811
    • Macro F1: 0.7533
  • gender-bias
    • Model: RoBERTa-T/ELECTRA
    • Micro F1: 0.8334
    • Macro F1: 0.8211
  • hate-speech
    • Model: RoBERTa-T/Bart
    • Micro F1: 0.8897
    • Macro F1: 0.7310
  • linguistic-bias
    • Model: ConvBERT/Bart
    • Micro F1: 0.7044
    • Macro F1: 0.4995
  • political-bias
    • Model: ConvBERT/ConvBERT
    • Micro F1: 0.7041
    • Macro F1: 0.7110
  • racial-bias
    • Model: ConvBERT/ELECTRA
    • Micro F1: 0.8772
    • Macro F1: 0.6170
  • text-level-bias
    • Model: ConvBERT/ConvBERT
    • Micro F1: 0.7697
    • Macro F1: 0.7532
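
The baseline figures report both micro and macro F1. As a reminder of how the two averages differ on binary labels, here is a small sketch using the standard definitions; it is not the benchmark's evaluation code, and the toy labels are made up for illustration.

```python
from collections import Counter

def f1_scores(y_true, y_pred, labels=(0, 1)):
    """Return (micro_f1, macro_f1) for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: unweighted mean of per-class F1.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: F1 over counts pooled across classes.
    micro = f1(
        sum(tp[c] for c in labels),
        sum(fp[c] for c in labels),
        sum(fn[c] for c in labels),
    )
    return micro, macro

# Toy example (illustrative labels, not drawn from MBIB).
micro, macro = f1_scores([1, 1, 1, 0], [1, 1, 0, 0])
```

Macro F1 weights both classes equally, so it drops sharply when the minority class is predicted poorly, while micro F1 is dominated by the majority class.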