jtatman/jigsaw_hatebert
Hugging Face · updated 2023-09-06 · indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/jtatman/jigsaw_hatebert
Resource overview:
---
dataset_info:
  features:
  - name: text
    dtype: string
  - name: text_masked
    dtype: string
  - name: text_replaced
    list:
    - name: score
      dtype: float64
    - name: sequence
      dtype: string
    - name: token
      dtype: int64
    - name: token_str
      dtype: string
  - name: asian
    dtype: string
  - name: atheist
    dtype: string
  - name: bisexual
    dtype: string
  - name: black
    dtype: string
  - name: buddhist
    dtype: string
  - name: christian
    dtype: string
  - name: female
    dtype: string
  - name: heterosexual
    dtype: string
  - name: hindu
    dtype: string
  - name: homosexual_gay_or_lesbian
    dtype: string
  - name: intellectual_or_learning_disability
    dtype: string
  - name: jewish
    dtype: string
  - name: latino
    dtype: string
  - name: male
    dtype: string
  - name: muslim
    dtype: string
  - name: other_disability
    dtype: string
  - name: other_gender
    dtype: string
  - name: other_race_or_ethnicity
    dtype: string
  - name: other_religion
    dtype: string
  - name: other_sexual_orientation
    dtype: string
  - name: physical_disability
    dtype: string
  - name: psychiatric_or_mental_illness
    dtype: string
  - name: transgender
    dtype: string
  - name: white
    dtype: string
  - name: funny
    dtype: string
  - name: wow
    dtype: string
  - name: sad
    dtype: string
  - name: likes
    dtype: string
  - name: disagree
    dtype: string
  - name: target
    dtype: int64
  - name: __index_level_0__
    dtype: int64
  splits:
  - name: train
    num_bytes: 236287827
    num_examples: 110000
  download_size: 83975623
  dataset_size: 236287827
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# Dataset Card for "jigsaw_hatebert"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
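Because the card itself gives no description beyond the `dataset_info` block, a quick way to sanity-check rows you load is to compare them against the declared dtypes. The sketch below is a hypothetical, stdlib-only validator (the names `schema_errors`, `SCALAR_SCHEMA` and `REPLACEMENT_SCHEMA` are illustrative, not part of any library); it assumes rows arrive as plain Python dicts, and it treats a null value as a mismatch, which may be stricter than the real data warrants.

```python
from typing import Any

# Identity and reaction columns; the card declares all of them as dtype string.
STRING_COLUMNS = [
    "asian", "atheist", "bisexual", "black", "buddhist", "christian",
    "female", "heterosexual", "hindu", "homosexual_gay_or_lesbian",
    "intellectual_or_learning_disability", "jewish", "latino", "male",
    "muslim", "other_disability", "other_gender", "other_race_or_ethnicity",
    "other_religion", "other_sexual_orientation", "physical_disability",
    "psychiatric_or_mental_illness", "transgender", "white",
    "funny", "wow", "sad", "likes", "disagree",
]

# Top-level scalar columns -> Python type their declared dtype decodes to.
SCALAR_SCHEMA: dict[str, type] = {
    "text": str,
    "text_masked": str,
    **{name: str for name in STRING_COLUMNS},
    "target": int,
    "__index_level_0__": int,
}

# Sub-fields of every element of the text_replaced list.
REPLACEMENT_SCHEMA: dict[str, type] = {
    "score": float,
    "sequence": str,
    "token": int,
    "token_str": str,
}


def schema_errors(record: dict[str, Any]) -> list[str]:
    """Return human-readable mismatches; an empty list means the record fits."""
    errors: list[str] = []
    for name, expected in SCALAR_SCHEMA.items():
        if name not in record:
            errors.append(f"missing column {name!r}")
        elif not isinstance(record[name], expected):
            errors.append(
                f"{name!r}: expected {expected.__name__}, "
                f"got {type(record[name]).__name__}"
            )
    replaced = record.get("text_replaced")
    if not isinstance(replaced, list):
        errors.append("'text_replaced': expected a list")
        return errors
    for i, item in enumerate(replaced):
        if not isinstance(item, dict):
            errors.append(f"text_replaced[{i}]: expected a dict")
            continue
        for name, expected in REPLACEMENT_SCHEMA.items():
            if not isinstance(item.get(name), expected):
                errors.append(f"text_replaced[{i}].{name}: expected {expected.__name__}")
    return errors
```

Returning a list of messages rather than raising on the first problem makes it easier to spot systematic issues (for example, a column that is null in many rows).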
Provided by: jtatman
Summary of the original information
Dataset overview
Dataset information
Features
- text: string
- text_masked: string
- text_replaced: a list whose elements have the following sub-features
  - score: float (float64)
  - sequence: string
  - token: integer (int64)
  - token_str: string
- asian: string
- atheist: string
- bisexual: string
- black: string
- buddhist: string
- christian: string
- female: string
- heterosexual: string
- hindu: string
- homosexual_gay_or_lesbian: string
- intellectual_or_learning_disability: string
- jewish: string
- latino: string
- male: string
- muslim: string
- other_disability: string
- other_gender: string
- other_race_or_ethnicity: string
- other_religion: string
- other_sexual_orientation: string
- physical_disability: string
- psychiatric_or_mental_illness: string
- transgender: string
- white: string
- funny: string
- wow: string
- sad: string
- likes: string
- disagree: string
- target: integer (int64)
- __index_level_0__: integer (int64)
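The four sub-features of `text_replaced` (`score`, `sequence`, `token`, `token_str`) are the same keys that the Hugging Face `transformers` fill-mask pipeline returns for each candidate, which suggests (though the card does not say so) that each row stores model-proposed fillers for the masked position in `text_masked`. Under that assumption, a minimal sketch for picking or ranking candidates could look like this; `best_replacement` and `rank_tokens` are hypothetical helper names.

```python
from typing import Optional, TypedDict


class Replacement(TypedDict):
    """One element of text_replaced, using the four sub-fields listed above."""
    score: float
    sequence: str
    token: int
    token_str: str


def best_replacement(candidates: list[Replacement]) -> Optional[Replacement]:
    """Return the candidate with the highest score, or None if the list is empty."""
    if not candidates:
        return None
    return max(candidates, key=lambda c: c["score"])


def rank_tokens(candidates: list[Replacement]) -> list[str]:
    """Return token_str values ordered from highest to lowest score."""
    ordered = sorted(candidates, key=lambda c: c["score"], reverse=True)
    return [c["token_str"] for c in ordered]


if __name__ == "__main__":
    # Made-up candidates for illustration only; not taken from the dataset.
    example: list[Replacement] = [
        {"score": 0.1, "sequence": "you are kind", "token": 1, "token_str": "kind"},
        {"score": 0.7, "sequence": "you are nice", "token": 2, "token_str": "nice"},
    ]
    print(best_replacement(example))
    print(rank_tokens(example))
```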
Data splits
- train: 110000 examples, 236287827 bytes
Dataset size
- Download size: 83975623 bytes
- Dataset size: 236287827 bytes
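The size figures above are easier to interpret as ratios: they work out to roughly 2,148 bytes per training row, and the download is about 35.5% of the stored dataset size (about 80.1 MiB versus 225.3 MiB). The smaller download presumably reflects compressed shards, although the card does not state the file format. A small sketch of the arithmetic, using only the numbers from the card:

```python
# Figures copied from the dataset_info block / split summary.
NUM_BYTES = 236_287_827     # train split (= dataset_size), in bytes
NUM_EXAMPLES = 110_000      # train split, number of rows
DOWNLOAD_SIZE = 83_975_623  # bytes fetched on download


def bytes_per_example(num_bytes: int, num_examples: int) -> float:
    """Average stored size of one row, in bytes."""
    return num_bytes / num_examples


def download_ratio(download_size: int, dataset_size: int) -> float:
    """Download size as a fraction of the stored dataset size."""
    return download_size / dataset_size


def to_mib(n: int) -> float:
    """Convert bytes to mebibytes (2**20 bytes)."""
    return n / 2**20


if __name__ == "__main__":
    print(f"{bytes_per_example(NUM_BYTES, NUM_EXAMPLES):.1f} bytes/example")
    print(f"download is {download_ratio(DOWNLOAD_SIZE, NUM_BYTES):.1%} of dataset size")
    print(f"{to_mib(DOWNLOAD_SIZE):.1f} MiB download, {to_mib(NUM_BYTES):.1f} MiB dataset")
```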
Configuration
- default: training data files at the path data/train-*
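With the Hugging Face `datasets` library installed, this configuration is normally loaded with `load_dataset("jtatman/jigsaw_hatebert", split="train")`, which resolves the `data/train-*` pattern against the repository files. The sketch below only illustrates which file names such a pattern would select. It is a simplified, segment-wise glob (`*` never crosses `/`), not the library's actual resolution logic, and the shard names in the example are hypothetical.

```python
from fnmatch import fnmatchcase


def matches_data_files(path: str, pattern: str = "data/train-*") -> bool:
    """Segment-wise glob: '*' matches within one path component, never across '/'."""
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    if len(path_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(p, pat) for p, pat in zip(path_parts, pattern_parts))


def select_train_files(repo_files: list[str], pattern: str = "data/train-*") -> list[str]:
    """Return, sorted, the files the default config's pattern would assign to the train split."""
    return sorted(f for f in repo_files if matches_data_files(f, pattern))


if __name__ == "__main__":
    # Hypothetical repository listing, for illustration only.
    files = [
        "README.md",
        "data/train-00000-of-00001.parquet",
        "data/test-00000-of-00001.parquet",
    ]
    print(select_train_files(files))
```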



