mediabiasgroup/mbib-base
Source: Hugging Face. Updated 2024-02-06; indexed 2024-03-04.
Download link: https://hf-mirror.com/datasets/mediabiasgroup/mbib-base
Resource description:
---
license: cc-by-nc-nd-4.0
task_categories:
- text-classification
language:
- en
tags:
- media
- mediabias
- media-bias
- media bias
size_categories:
- 1M<n<10M
dataset_info:
  config_name: plain_text
  splits:
  - name: cognitive_bias
  - name: fake_news
  - name: gender_bias
  - name: hate_speech
  - name: linguistic_bias
  - name: political_bias
  - name: racial_bias
  - name: text_level_bias
configs:
- config_name: default
  data_files:
  - split: cognitive_bias
    path: mbib-aggregated/cognitive-bias.csv
  - split: fake_news
    path: mbib-aggregated/fake-news.csv
  - split: gender_bias
    path: mbib-aggregated/gender-bias.csv
  - split: hate_speech
    path: mbib-aggregated/hate-speech.csv
  - split: linguistic_bias
    path: mbib-aggregated/linguistic-bias.csv
  - split: political_bias
    path: mbib-aggregated/political-bias.csv
  - split: racial_bias
    path: mbib-aggregated/racial-bias.csv
  - split: text_level_bias
    path: mbib-aggregated/text-level-bias.csv
---
# Dataset Card for Media-Bias-Identification-Benchmark
## Table of Contents
- [Dataset Card for Media-Bias-Identification-Benchmark](#dataset-card-for-media-bias-identification-benchmark)
  - [Table of Contents](#table-of-contents)
  - [Dataset Description](#dataset-description)
    - [Dataset Summary](#dataset-summary)
    - [Tasks and Information](#tasks-and-information)
    - [Baseline](#baseline)
    - [Languages](#languages)
  - [Dataset Structure](#dataset-structure)
    - [Data Instances](#data-instances)
      - [cognitive-bias](#cognitive-bias)
    - [Data Fields](#data-fields)
  - [Considerations for Using the Data](#considerations-for-using-the-data)
    - [Social Impact of Dataset](#social-impact-of-dataset)
    - [Discussion of Biases](#discussion-of-biases)
    - [Other Known Limitations](#other-known-limitations)
    - [Citation Information](#citation-information)
    - [Contributions](#contributions)
## Dataset Description
- **Homepage:** https://github.com/Media-Bias-Group/Media-Bias-Identification-Benchmark
- **Repository:** https://github.com/Media-Bias-Group/Media-Bias-Identification-Benchmark
- **Paper:** https://doi.org/10.1145/3539618.3591882
- **Point of Contact:** [Martin Wessel](mailto:martin.wessel@uni-konstanz.de)
### Baseline
<table>
<tr><td><b>Task</b></td><td><b>Model</b></td><td><b>Micro F1</b></td><td><b>Macro F1</b></td></tr>
<tr><td>cognitive-bias</td><td>ConvBERT/ConvBERT</td><td>0.7126</td><td>0.7664</td></tr>
<tr><td>fake-news</td><td>Bart/RoBERTa-T</td><td>0.6811</td><td>0.7533</td></tr>
<tr><td>gender-bias</td><td>RoBERTa-T/ELECTRA</td><td>0.8334</td><td>0.8211</td></tr>
<tr><td>hate-speech</td><td>RoBERTa-T/Bart</td><td>0.8897</td><td>0.7310</td></tr>
<tr><td>linguistic-bias</td><td>ConvBERT/Bart</td><td>0.7044</td><td>0.4995</td></tr>
<tr><td>political-bias</td><td>ConvBERT/ConvBERT</td><td>0.7041</td><td>0.7110</td></tr>
<tr><td>racial-bias</td><td>ConvBERT/ELECTRA</td><td>0.8772</td><td>0.6170</td></tr>
<tr><td>text-level-bias</td><td>ConvBERT/ConvBERT</td><td>0.7697</td><td>0.7532</td></tr>
</table>
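The card does not say how the two averages in the table are computed. Several rows report a Macro F1 above the Micro F1, which cannot happen when both are averaged over the two classes of a single binary task, so the sketch below assumes a different reading: micro pools every example of a task before scoring, while macro scores each source dataset separately and then averages. This is an assumption, not the benchmark's documented protocol, and the names `binary_f1`, `micro_macro_f1`, `large_source`, and `small_source` are hypothetical.

```python
def binary_f1(y_true, y_pred):
    """F1 of the positive class (label 1, i.e. 'biased')."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(groups):
    """groups maps a (hypothetical) source name to a (y_true, y_pred) pair."""
    all_true = [t for y_true, _ in groups.values() for t in y_true]
    all_pred = [p for _, y_pred in groups.values() for p in y_pred]
    # Micro (assumed reading): pool all examples, then score once.
    micro = binary_f1(all_true, all_pred)
    # Macro (assumed reading): score each source, then take the plain mean.
    per_group = [binary_f1(t, p) for t, p in groups.values()]
    macro = sum(per_group) / len(per_group)
    return micro, macro


# Toy labels, not MBIB data: one large, harder source and one small, easy one.
toy = {
    "large_source": ([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 1, 0, 0, 0]),
    "small_source": ([1, 0], [1, 0]),
}
micro, macro = micro_macro_f1(toy)
print(f"micro={micro:.4f} macro={macro:.4f}")  # micro=0.6667 macro=0.7857
```

Under this reading a small, easy source lifts the macro score above the micro score, which is the pattern seen in the cognitive-bias and fake-news rows.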
### Languages
All datasets are in English.
## Dataset Structure
### Data Instances
#### cognitive-bias
An example instance from the `cognitive_bias` split looks as follows:
```json
{
  "text": "A defense bill includes language that would require military hospitals to provide abortions on demand",
  "label": 1
}
```
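Each task is exposed as its own split, backed by one CSV file under `mbib-aggregated/`, as listed in the card's front matter. The following is a minimal sketch of how one task might be loaded with the Hugging Face `datasets` library, assuming that package is installed and the Hub is reachable. The helper names `split_csv_path` and `load_task` are hypothetical, and the download itself has not been verified here.

```python
# Split names as listed in the card's front matter.
SPLITS = (
    "cognitive_bias", "fake_news", "gender_bias", "hate_speech",
    "linguistic_bias", "political_bias", "racial_bias", "text_level_bias",
)


def split_csv_path(split: str) -> str:
    """Map a split name to the CSV path given in the front matter."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    return f"mbib-aggregated/{split.replace('_', '-')}.csv"


def load_task(split: str):
    """Load one task's split from the Hub (needs network access)."""
    # Imported lazily so split_csv_path works without the library installed.
    from datasets import load_dataset

    split_csv_path(split)  # validates the name before any download starts
    return load_dataset("mediabiasgroup/mbib-base", split=split)


print(split_csv_path("hate_speech"))  # mbib-aggregated/hate-speech.csv
```

Calling `load_task("cognitive_bias")` would then be expected to return a dataset whose rows carry the fields described under Data Fields.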
### Data Fields
- `text`: a sentence from various sources (e.g., news articles, Twitter, other social media).
- `label`: binary indicator of bias (0 = unbiased, 1 = biased).
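The two fields above can be checked with the standard library alone. The sketch below assumes each CSV file has a header row containing `text` and `label` columns; the real files may carry additional columns. It reads from an in-memory stand-in, and the second sample row is made up for illustration. The function name `read_examples` is hypothetical.

```python
import csv
import io


def read_examples(csv_file):
    """Yield (text, label) pairs, rejecting labels other than 0 or 1."""
    for row_index, row in enumerate(csv.DictReader(csv_file), start=1):
        label = int(row["label"])
        if label not in (0, 1):
            raise ValueError(f"row {row_index}: unexpected label {label!r}")
        yield row["text"], label


# In-memory stand-in for one CSV file; the second row is invented.
sample = io.StringIO(
    "text,label\n"
    '"A defense bill includes language that would require military '
    'hospitals to provide abortions on demand",1\n'
    '"The committee meets on Tuesday.",0\n'
)
examples = list(read_examples(sample))
biased_share = sum(label for _, label in examples) / len(examples)
print(len(examples), biased_share)  # 2 0.5
```

Because `label` is 0 or 1, its mean is the share of biased sentences, which is a quick way to inspect class balance per task.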
## Considerations for Using the Data
### Social Impact of Dataset
We believe that MBIB offers a new common ground for research in the domain, especially given the growing (research) attention directed toward media bias.
### Citation Information
```
@inproceedings{wessel2023mbib,
  title = {Introducing MBIB - the first Media Bias Identification Benchmark Task and Dataset Collection},
  author = {Wessel, Martin and Spinde, Timo and Horych, Tomáš and Ruas, Terry and Aizawa, Akiko and Gipp, Bela},
  year = {2023},
  note = {[in review]}
}
```
Provider: mediabiasgroup



