jason9693/APEACH
Source: Hugging Face · Updated: 2022-07-05 · Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/jason9693/APEACH
Resource description:
---
annotations_creators:
- crowdsourced
- crowd-generated
language_creators:
- found
language:
- ko
license:
- cc-by-sa-4.0
multilinguality:
- monolingual
paperswithcode_id: apeach
pretty_name: 'APEACH'
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- text-classification
task_ids:
- binary-classification
---
# Dataset for project: kor_hate_eval(APEACH)

## Sample Code
<a href="https://colab.research.google.com/drive/1djd0fuoMYIaf7VCHaLQIziJi4_yBJruP#scrollTo=VPR24ysr5Q7k"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="base"/></a>
## Dataset Description
Korean hate speech evaluation datasets: models are trained on [BEEP!](https://huggingface.co/datasets/kor_hate) and evaluated on [APEACH](https://github.com/jason9693/APEACH).
- **Repository: [Korean HateSpeech Evaluation Dataset](https://github.com/jason9693/APEACH)**
- **Paper: [APEACH: Attacking Pejorative Expressions with Analysis on Crowd-Generated Hate Speech Evaluation Datasets](https://arxiv.org/abs/2202.12459)**
- **Point of Contact: [Kichang Yang](mailto:ykcha9@gmail.com)**
### Languages
ko-KR
## Dataset Structure
### Data Instances
A sample from this dataset looks as follows:
```python
{'text': ['(현재 호텔주인 심정) 아18 난 마른하늘에 날벼락맞고 호텔망하게생겼는데 누군 계속 추모받네....',
'....한국적인 미인의 대표적인 분...너무나 곱고아름다운모습...그모습뒤의 슬픔을 미처 알지못했네요ㅠ'],
'class': ['Spoiled', 'Default']}
```
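The sample above is in columnar (batched) form: each key maps to a list with one entry per example. The following is a minimal sketch, in plain Python, of turning such a batch into per-example records. The `batch_to_rows` helper is a hypothetical name used for illustration; it is not part of the dataset or of any library.

```python
from typing import Dict, List


def batch_to_rows(batch: Dict[str, list]) -> List[dict]:
    """Convert a columnar batch {column: [values...]} into a list of row dicts."""
    columns = list(batch.keys())
    lengths = {len(batch[c]) for c in columns}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of entries")
    # zip(*columns) walks the columns in lockstep, yielding one tuple per example.
    return [dict(zip(columns, values)) for values in zip(*(batch[c] for c in columns))]


# The two examples from the sample instance, in the same columnar layout.
sample = {
    "text": [
        "(현재 호텔주인 심정) 아18 난 마른하늘에 날벼락맞고 호텔망하게생겼는데 누군 계속 추모받네....",
        "....한국적인 미인의 대표적인 분...너무나 곱고아름다운모습...그모습뒤의 슬픔을 미처 알지못했네요ㅠ",
    ],
    "class": ["Spoiled", "Default"],
}

rows = batch_to_rows(sample)
for row in rows:
    print(row["class"], "|", row["text"])
```

Each element of `rows` is a dict with one `text` string and one `class` label, which is often a more convenient shape for inspecting individual examples.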
### Dataset Fields
The dataset has the following fields (also called "features"):
```json
{
"text": "Value(dtype='string', id=None)",
"class": "ClassLabel(num_classes=2, names=['Default', 'Spoiled'], id=None)"
}
```
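Because `class` is a two-way `ClassLabel`, labels are stored as integers that index into `names`. The sketch below reproduces that mapping without any dependency, assuming the index order follows `names` as listed above (0 for `Default`, 1 for `Spoiled`). The helper names `label_to_id` and `id_to_label` are hypothetical; in the `datasets` library the corresponding role is played by `ClassLabel.str2int` and `ClassLabel.int2str`.

```python
# Order taken from the `names` list in the field description above.
CLASS_NAMES = ["Default", "Spoiled"]
NAME_TO_ID = {name: idx for idx, name in enumerate(CLASS_NAMES)}


def label_to_id(name: str) -> int:
    """Map a class name to its integer id; raises KeyError for unknown names."""
    return NAME_TO_ID[name]


def id_to_label(idx: int) -> str:
    """Map an integer id back to its class name."""
    if not 0 <= idx < len(CLASS_NAMES):
        raise ValueError(f"label id out of range: {idx}")
    return CLASS_NAMES[idx]


# Encode the string labels from the sample instance as integer ids.
encoded = [label_to_id(name) for name in ["Spoiled", "Default"]]
decoded = [id_to_label(idx) for idx in encoded]
```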
### Dataset Splits
This dataset is split into a train split and a validation split. The split sizes are as follows:
| Split name | Num samples |
| ------------ | ------------------- |
| train (binarized BEEP!) | 7896 |
| valid (APEACH) | 3770 |
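
As a quick check on the table above, the following sketch computes the combined size and the share of each split. The dictionary keys `train` and `valid` are taken from the table labels and are an assumption about the actual split identifiers.

```python
# Sample counts copied from the split table above.
SPLIT_SIZES = {"train": 7896, "valid": 3770}

total = sum(SPLIT_SIZES.values())
fractions = {name: size / total for name, size in SPLIT_SIZES.items()}

for name, size in SPLIT_SIZES.items():
    print(f"{name}: {size} examples ({fractions[name]:.1%} of {total})")
```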
## Citation
```
@article{yang2022apeach,
title={APEACH: Attacking Pejorative Expressions with Analysis on Crowd-Generated Hate Speech Evaluation Datasets},
author={Yang, Kichang and Jang, Wonjun and Cho, Won Ik},
journal={arXiv preprint arXiv:2202.12459},
year={2022}
}
```
Provider:
jason9693
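Summary of original information
Dataset overview
- Name: APEACH
- Language: Korean (ko-KR)
- License: cc-by-sa-4.0
- Multilinguality: monolingual
- Task category: text classification
- Task ID: binary classification
- Dataset size: 1K<n<10K
- Source datasets: original
- Annotation creators: crowdsourced, crowd-generated
- Language creators: found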



