
jason9693/APEACH

Source: Hugging Face | Updated: 2022-07-05 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/jason9693/APEACH
Resource description:
---
annotations_creators:
- crowdsourced
- crowd-generated
language_creators:
- found
language:
- ko
license:
- cc-by-sa-4.0
multilinguality:
- monolingual
paperswithcode_id: apeach
pretty_name: 'APEACH'
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- text-classification
task_ids:
- binary-classification
---

# Dataset for project: kor_hate_eval(APEACH)

![](https://github.com/jason9693/APEACH/raw/master/resource/dist_topics.png)

## Sample Code

<a href="https://colab.research.google.com/drive/1djd0fuoMYIaf7VCHaLQIziJi4_yBJruP#scrollTo=VPR24ysr5Q7k"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="base"/></a>

## Dataset Description

Korean Hate Speech Evaluation Datasets: trained with [BEEP!](https://huggingface.co/datasets/kor_hate) and evaluated with [APEACH](https://github.com/jason9693/APEACH)

- **Repository: [Korean HateSpeech Evaluation Dataset](https://github.com/jason9693/APEACH)**
- **Paper: [APEACH: Attacking Pejorative Expressions with Analysis on Crowd-Generated Hate Speech Evaluation Datasets](https://arxiv.org/abs/2202.12459)**
- **Point of Contact: [Kichang Yang](ykcha9@gmail.com)**

### Languages

ko-KR

## Dataset Structure

### Data Instances

A sample from this dataset looks as follows:

```json
{'text': ['(현재 호텔주인 심정) 아18 난 마른하늘에 날벼락맞고 호텔망하게생겼는데 누군 계속 추모받네....', '....한국적인 미인의 대표적인 분...너무나 곱고아름다운모습...그모습뒤의 슬픔을 미처 알지못했네요ㅠ'], 'class': ['Spoiled', 'Default']}
```

### Dataset Fields

The dataset has the following fields (also called "features"):

```json
{
  "text": "Value(dtype='string', id=None)",
  "class": "ClassLabel(num_classes=2, names=['Default', 'Spoiled'], id=None)"
}
```

### Dataset Splits

This dataset is split into a train and a validation split. The split sizes are as follows:

| Split name | Num samples |
| ------------ | ------------------- |
| train (binarized BEEP!) | 7896 |
| valid (APEACH) | 3770 |

## Citation

```
@article{yang2022apeach,
  title={APEACH: Attacking Pejorative Expressions with Analysis on Crowd-Generated Hate Speech Evaluation Datasets},
  author={Yang, Kichang and Jang, Wonjun and Cho, Won Ik},
  journal={arXiv preprint arXiv:2202.12459},
  year={2022}
}
```
Provider:
jason9693
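Summary of original information

Dataset overview

  • Name: APEACH
  • Language: Korean (ko-KR)
  • License: cc-by-sa-4.0
  • Multilinguality: monolingual
  • Task category: text classification
  • Task ID: binary classification
  • Dataset size: 1K<n<10K
  • Source datasets: original
  • Annotation creators: crowdsourced, crowd-generated
  • Language creators: found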

Dataset structure

Data instances

```json
{'text': ['(현재 호텔주인 심정) 아18 난 마른하늘에 날벼락맞고 호텔망하게생겼는데 누군 계속 추모받네....', '....한국적인 미인의 대표적인 분...너무나 곱고아름다운모습...그모습뒤의 슬픔을 미처 알지못했네요ㅠ'], 'class': ['Spoiled', 'Default']}
```

Dataset fields

```json
{
  "text": "Value(dtype='string', id=None)",
  "class": "ClassLabel(num_classes=2, names=['Default', 'Spoiled'], id=None)"
}
```
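The data instance is column-oriented (one list per field), and the `class` field is a two-way label with names `['Default', 'Spoiled']`. The following is a minimal sketch of turning such a batch into per-example rows with integer label ids. It assumes the ids follow the order of the names listed in the fields description; the helper name `batch_to_rows`, the `label_id` key, and the placeholder texts are hypothetical and not part of the dataset.

```python
# Class names in the order given by the ClassLabel field description.
CLASS_NAMES = ["Default", "Spoiled"]
# Assumption: integer ids follow the position of each name in CLASS_NAMES.
NAME_TO_ID = {name: idx for idx, name in enumerate(CLASS_NAMES)}

# A batch in the column-oriented shape of the data instance.
# The texts are placeholders, not real rows from the dataset.
batch = {
    "text": ["placeholder comment A", "placeholder comment B"],
    "class": ["Spoiled", "Default"],
}


def batch_to_rows(batch):
    """Turn a dict of equal-length lists into a list of per-example dicts."""
    texts, labels = batch["text"], batch["class"]
    if len(texts) != len(labels):
        raise ValueError("text and class columns must have the same length")
    return [
        {"text": text, "class": label, "label_id": NAME_TO_ID[label]}
        for text, label in zip(texts, labels)
    ]


rows = batch_to_rows(batch)
for row in rows:
    print(row["label_id"], row["class"], row["text"])
```

An unknown class name raises `KeyError`, which is usually preferable to silently assigning a default label.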

Dataset splits

| Split name | Number of samples |
| --- | --- |
| train (binarized BEEP!) | 7896 |
| valid (APEACH) | 3770 |
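Since the validation split (APEACH) is meant for evaluating a binary classifier trained on the train split, a scoring helper may be useful. This is a minimal sketch using only the standard library. It assumes `Spoiled` is treated as the positive class, which this page does not state, and the function name `binary_scores` and the toy label lists are hypothetical.

```python
def binary_scores(gold, pred, positive="Spoiled"):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")

    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    correct = sum(1 for g, p in pairs if g == p)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {
        "accuracy": correct / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels for illustration only, not model output on the real split.
gold = ["Spoiled", "Default", "Spoiled", "Default"]
pred = ["Spoiled", "Spoiled", "Default", "Default"]
scores = binary_scores(gold, pred)
print(scores)
```

In practice `gold` would hold the `class` values of the validation split and `pred` the model's predicted class names in the same order.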

Citation

@article{yang2022apeach,
  title={APEACH: Attacking Pejorative Expressions with Analysis on Crowd-Generated Hate Speech Evaluation Datasets},
  author={Yang, Kichang and Jang, Wonjun and Cho, Won Ik},
  journal={arXiv preprint arXiv:2202.12459},
  year={2022}
}
