okaris/ucberkeley-dlab-measuring-hate-speech
Source: Hugging Face. Updated 2023-09-17; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/okaris/ucberkeley-dlab-measuring-hate-speech
Description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: comment_id
    dtype: int32
  - name: annotator_id
    dtype: int32
  - name: platform
    dtype: int8
  - name: sentiment
    dtype: float64
  - name: respect
    dtype: float64
  - name: insult
    dtype: float64
  - name: humiliate
    dtype: float64
  - name: status
    dtype: float64
  - name: dehumanize
    dtype: float64
  - name: violence
    dtype: float64
  - name: genocide
    dtype: float64
  - name: attack_defend
    dtype: float64
  - name: hatespeech
    dtype: float64
  - name: hate_speech_score
    dtype: float64
  - name: text
    dtype: string
  - name: infitms
    dtype: float64
  - name: outfitms
    dtype: float64
  - name: annotator_severity
    dtype: float64
  - name: std_err
    dtype: float64
  - name: annotator_infitms
    dtype: float64
  - name: annotator_outfitms
    dtype: float64
  - name: hypothesis
    dtype: float64
  - name: target_race_asian
    dtype: bool
  - name: target_race_black
    dtype: bool
  - name: target_race_latinx
    dtype: bool
  - name: target_race_middle_eastern
    dtype: bool
  - name: target_race_native_american
    dtype: bool
  - name: target_race_pacific_islander
    dtype: bool
  - name: target_race_white
    dtype: bool
  - name: target_race_other
    dtype: bool
  - name: target_race
    dtype: bool
  - name: target_religion_atheist
    dtype: bool
  - name: target_religion_buddhist
    dtype: bool
  - name: target_religion_christian
    dtype: bool
  - name: target_religion_hindu
    dtype: bool
  - name: target_religion_jewish
    dtype: bool
  - name: target_religion_mormon
    dtype: bool
  - name: target_religion_muslim
    dtype: bool
  - name: target_religion_other
    dtype: bool
  - name: target_religion
    dtype: bool
  - name: target_origin_immigrant
    dtype: bool
  - name: target_origin_migrant_worker
    dtype: bool
  - name: target_origin_specific_country
    dtype: bool
  - name: target_origin_undocumented
    dtype: bool
  - name: target_origin_other
    dtype: bool
  - name: target_origin
    dtype: bool
  - name: target_gender_men
    dtype: bool
  - name: target_gender_non_binary
    dtype: bool
  - name: target_gender_transgender_men
    dtype: bool
  - name: target_gender_transgender_unspecified
    dtype: bool
  - name: target_gender_transgender_women
    dtype: bool
  - name: target_gender_women
    dtype: bool
  - name: target_gender_other
    dtype: bool
  - name: target_gender
    dtype: bool
  - name: target_sexuality_bisexual
    dtype: bool
  - name: target_sexuality_gay
    dtype: bool
  - name: target_sexuality_lesbian
    dtype: bool
  - name: target_sexuality_straight
    dtype: bool
  - name: target_sexuality_other
    dtype: bool
  - name: target_sexuality
    dtype: bool
  - name: target_age_children
    dtype: bool
  - name: target_age_teenagers
    dtype: bool
  - name: target_age_young_adults
    dtype: bool
  - name: target_age_middle_aged
    dtype: bool
  - name: target_age_seniors
    dtype: bool
  - name: target_age_other
    dtype: bool
  - name: target_age
    dtype: bool
  - name: target_disability_physical
    dtype: bool
  - name: target_disability_cognitive
    dtype: bool
  - name: target_disability_neurological
    dtype: bool
  - name: target_disability_visually_impaired
    dtype: bool
  - name: target_disability_hearing_impaired
    dtype: bool
  - name: target_disability_unspecific
    dtype: bool
  - name: target_disability_other
    dtype: bool
  - name: target_disability
    dtype: bool
  - name: annotator_gender
    dtype: string
  - name: annotator_trans
    dtype: string
  - name: annotator_educ
    dtype: string
  - name: annotator_income
    dtype: string
  - name: annotator_ideology
    dtype: string
  - name: annotator_gender_men
    dtype: bool
  - name: annotator_gender_women
    dtype: bool
  - name: annotator_gender_non_binary
    dtype: bool
  - name: annotator_gender_prefer_not_to_say
    dtype: bool
  - name: annotator_gender_self_describe
    dtype: bool
  - name: annotator_transgender
    dtype: bool
  - name: annotator_cisgender
    dtype: bool
  - name: annotator_transgender_prefer_not_to_say
    dtype: bool
  - name: annotator_education_some_high_school
    dtype: bool
  - name: annotator_education_high_school_grad
    dtype: bool
  - name: annotator_education_some_college
    dtype: bool
  - name: annotator_education_college_grad_aa
    dtype: bool
  - name: annotator_education_college_grad_ba
    dtype: bool
  - name: annotator_education_professional_degree
    dtype: bool
  - name: annotator_education_masters
    dtype: bool
  - name: annotator_education_phd
    dtype: bool
  - name: annotator_income_<10k
    dtype: bool
  - name: annotator_income_10k-50k
    dtype: bool
  - name: annotator_income_50k-100k
    dtype: bool
  - name: annotator_income_100k-200k
    dtype: bool
  - name: annotator_income_>200k
    dtype: bool
  - name: annotator_ideology_extremeley_conservative
    dtype: bool
  - name: annotator_ideology_conservative
    dtype: bool
  - name: annotator_ideology_slightly_conservative
    dtype: bool
  - name: annotator_ideology_neutral
    dtype: bool
  - name: annotator_ideology_slightly_liberal
    dtype: bool
  - name: annotator_ideology_liberal
    dtype: bool
  - name: annotator_ideology_extremeley_liberal
    dtype: bool
  - name: annotator_ideology_no_opinion
    dtype: bool
  - name: annotator_race_asian
    dtype: bool
  - name: annotator_race_black
    dtype: bool
  - name: annotator_race_latinx
    dtype: bool
  - name: annotator_race_middle_eastern
    dtype: bool
  - name: annotator_race_native_american
    dtype: bool
  - name: annotator_race_pacific_islander
    dtype: bool
  - name: annotator_race_white
    dtype: bool
  - name: annotator_race_other
    dtype: bool
  - name: annotator_age
    dtype: float64
  - name: annotator_religion_atheist
    dtype: bool
  - name: annotator_religion_buddhist
    dtype: bool
  - name: annotator_religion_christian
    dtype: bool
  - name: annotator_religion_hindu
    dtype: bool
  - name: annotator_religion_jewish
    dtype: bool
  - name: annotator_religion_mormon
    dtype: bool
  - name: annotator_religion_muslim
    dtype: bool
  - name: annotator_religion_nothing
    dtype: bool
  - name: annotator_religion_other
    dtype: bool
  - name: annotator_sexuality_bisexual
    dtype: bool
  - name: annotator_sexuality_gay
    dtype: bool
  - name: annotator_sexuality_straight
    dtype: bool
  - name: annotator_sexuality_other
    dtype: bool
  splits:
  - name: train
    num_bytes: 52943809
    num_examples: 135556
  download_size: 19680581
  dataset_size: 52943809
---
# Dataset Card for "ucberkeley-dlab-measuring-hate-speech"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
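
The card body is still a placeholder, so it documents no usage. The following is a minimal sketch of loading the data, assuming the Hugging Face `datasets` library is installed and the repository is still reachable under the ID shown in the download link. The names `REPO_ID`, `SPLIT`, and `load_train_split` are illustrative and do not come from the card.

```python
# Repository ID taken from the download link; its availability is an assumption.
REPO_ID = "okaris/ucberkeley-dlab-measuring-hate-speech"

# The card declares one config ("default") with one split ("train").
SPLIT = "train"


def load_train_split():
    """Download and return the train split (requires network access)."""
    # Imported lazily so this snippet can be imported without the library.
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(REPO_ID, split=SPLIT)
```

If the hosted files match the card's `num_examples`, calling `load_train_split()` should return a dataset with 135556 rows.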
Provider:
okaris
Summary of original information
Dataset overview
Dataset configuration
- Config name: default
- Data files:
  - Train split: data/train-*
Dataset information
Feature list
- comment_id: comment ID (int32)
- annotator_id: annotator ID (int32)
- platform: platform (int8)
- sentiment: sentiment (float64)
- respect: respect (float64)
- insult: insult (float64)
- humiliate: humiliation (float64)
- status: status (float64)
- dehumanize: dehumanization (float64)
- violence: violence (float64)
- genocide: genocide (float64)
- attack_defend: attack/defend (float64)
- hatespeech: hate speech (float64)
- hate_speech_score: hate speech score (float64)
- text: text (string)
- infitms: infitms (float64)
- outfitms: outfitms (float64)
- annotator_severity: annotator severity (float64)
- std_err: standard error (float64)
- annotator_infitms: annotator infitms (float64)
- annotator_outfitms: annotator outfitms (float64)
- hypothesis: hypothesis (float64)
- target_race_asian: target race is Asian (bool)
- target_race_black: target race is Black (bool)
- target_race_latinx: target race is Latinx (bool)
- target_race_middle_eastern: target race is Middle Eastern (bool)
- target_race_native_american: target race is Native American (bool)
- target_race_pacific_islander: target race is Pacific Islander (bool)
- target_race_white: target race is White (bool)
- target_race_other: target race is other (bool)
- target_race: target race (bool)
- target_religion_atheist: target religion is atheist (bool)
- target_religion_buddhist: target religion is Buddhist (bool)
- target_religion_christian: target religion is Christian (bool)
- target_religion_hindu: target religion is Hindu (bool)
- target_religion_jewish: target religion is Jewish (bool)
- target_religion_mormon: target religion is Mormon (bool)
- target_religion_muslim: target religion is Muslim (bool)
- target_religion_other: target religion is other (bool)
- target_religion: target religion (bool)
- target_origin_immigrant: target origin is immigrant (bool)
- target_origin_migrant_worker: target origin is migrant worker (bool)
- target_origin_specific_country: target origin is a specific country (bool)
- target_origin_undocumented: target origin is undocumented immigrant (bool)
- target_origin_other: target origin is other (bool)
- target_origin: target origin (bool)
- target_gender_men: target gender is men (bool)
- target_gender_non_binary: target gender is non-binary (bool)
- target_gender_transgender_men: target gender is transgender men (bool)
- target_gender_transgender_unspecified: target gender is transgender, unspecified (bool)
- target_gender_transgender_women: target gender is transgender women (bool)
- target_gender_women: target gender is women (bool)
- target_gender_other: target gender is other (bool)
- target_gender: target gender (bool)
- target_sexuality_bisexual: target sexuality is bisexual (bool)
- target_sexuality_gay: target sexuality is gay (bool)
- target_sexuality_lesbian: target sexuality is lesbian (bool)
- target_sexuality_straight: target sexuality is straight (bool)
- target_sexuality_other: target sexuality is other (bool)
- target_sexuality: target sexuality (bool)
- target_age_children: target age is children (bool)
- target_age_teenagers: target age is teenagers (bool)
- target_age_young_adults: target age is young adults (bool)
- target_age_middle_aged: target age is middle-aged (bool)
- target_age_seniors: target age is seniors (bool)
- target_age_other: target age is other (bool)
- target_age: target age (bool)
- target_disability_physical: target disability is physical (bool)
- target_disability_cognitive: target disability is cognitive (bool)
- target_disability_neurological: target disability is neurological (bool)
- target_disability_visually_impaired: target disability is visual impairment (bool)
- target_disability_hearing_impaired: target disability is hearing impairment (bool)
- target_disability_unspecific: target disability is unspecified (bool)
- target_disability_other: target disability is other (bool)
- target_disability: target disability (bool)
- annotator_gender: annotator gender (string)
- annotator_trans: annotator transgender status (string)
- annotator_educ: annotator education (string)
- annotator_income: annotator income (string)
- annotator_ideology: annotator ideology (string)
- annotator_gender_men: annotator gender is man (bool)
- annotator_gender_women: annotator gender is woman (bool)
- annotator_gender_non_binary: annotator gender is non-binary (bool)
- annotator_gender_prefer_not_to_say: annotator gender is "prefer not to say" (bool)
- annotator_gender_self_describe: annotator gender is self-described (bool)
- annotator_transgender: annotator is transgender (bool)
- annotator_cisgender: annotator is cisgender (bool)
- annotator_transgender_prefer_not_to_say: annotator transgender status is "prefer not to say" (bool)
- annotator_education_some_high_school: annotator education is some high school (bool)
- annotator_education_high_school_grad: annotator education is high school graduate (bool)
- annotator_education_some_college: annotator education is some college (bool)
- annotator_education_college_grad_aa: annotator education is college graduate, associate degree (bool)
- annotator_education_college_grad_ba: annotator education is college graduate, bachelor's degree (bool)
- annotator_education_professional_degree: annotator education is professional degree (bool)
- annotator_education_masters: annotator education is master's degree (bool)
- annotator_education_phd: annotator education is PhD (bool)
- annotator_income_<10k: annotator income is below 10k (bool)
- annotator_income_10k-50k: annotator income is 10k-50k (bool)
- annotator_income_50k-100k: annotator income is 50k-100k (bool)
- annotator_income_100k-200k: annotator income is 100k-200k (bool)
- annotator_income_>200k: annotator income is above 200k (bool)
- annotator_ideology_extremeley_conservative: annotator ideology is extremely conservative (bool)
- annotator_ideology_conservative: annotator ideology is conservative (bool)
- annotator_ideology_slightly_conservative: annotator ideology is slightly conservative (bool)
- annotator_ideology_neutral: annotator ideology is neutral (bool)
- annotator_ideology_slightly_liberal: annotator ideology is slightly liberal (bool)
- annotator_ideology_liberal: annotator ideology is liberal (bool)
- annotator_ideology_extremeley_liberal: annotator ideology is extremely liberal (bool)
- annotator_ideology_no_opinion: annotator ideology is no opinion (bool)
- annotator_race_asian: annotator race is Asian (bool)
- annotator_race_black: annotator race is Black (bool)
- annotator_race_latinx: annotator race is Latinx (bool)
- annotator_race_middle_eastern: annotator race is Middle Eastern (bool)
- annotator_race_native_american: annotator race is Native American (bool)
- annotator_race_pacific_islander: annotator race is Pacific Islander (bool)
- annotator_race_white: annotator race is White (bool)
- annotator_race_other: annotator race is other (bool)
- annotator_age: annotator age (float64)
- annotator_religion_atheist: annotator religion is atheist (bool)
- annotator_religion_buddhist: annotator religion is Buddhist (bool)
- annotator_religion_christian: annotator religion is Christian (bool)
- annotator_religion_hindu: annotator religion is Hindu (bool)
- annotator_religion_jewish: annotator religion is Jewish (bool)
- annotator_religion_mormon: annotator religion is Mormon (bool)
- annotator_religion_muslim: annotator religion is Muslim (bool)
- annotator_religion_nothing: annotator religion is none (bool)
- annotator_religion_other: annotator religion is other (bool)
- annotator_sexuality_bisexual: annotator sexuality is bisexual (bool)
- annotator_sexuality_gay: annotator sexuality is gay (bool)
- annotator_sexuality_straight: annotator sexuality is straight (bool)
- annotator_sexuality_other: annotator sexuality is other (bool)
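
The target columns in the feature list follow a naming pattern: `target_<group>_<category>` for each sub-category, plus a bare `target_<group>` column per group. A minimal sketch of grouping columns by that pattern is shown here. It uses only a small sample of names copied from the list, and the helper `group_target_columns` is hypothetical. Treating the bare `target_<group>` column as a group-level flag is an assumption based on the naming, not something the card states.

```python
from collections import defaultdict

# The seven target groups that appear as prefixes in the feature list.
TARGET_GROUPS = ("race", "religion", "origin", "gender",
                 "sexuality", "age", "disability")

# A small sample of column names copied from the feature list.
SAMPLE_COLUMNS = [
    "comment_id", "annotator_id", "text", "hate_speech_score",
    "target_race_asian", "target_race_black", "target_race",
    "target_religion_jewish", "target_religion",
    "annotator_race_white", "annotator_age",
]


def group_target_columns(columns):
    """Map each target group to its sub-category columns.

    The bare group column (e.g. "target_race") is skipped because it
    lacks the trailing underscore and sub-category name.
    """
    groups = defaultdict(list)
    for name in columns:
        for group in TARGET_GROUPS:
            if name.startswith(f"target_{group}_"):
                groups[group].append(name)
    return dict(groups)


grouped = group_target_columns(SAMPLE_COLUMNS)
```

On the sample, `grouped` contains only the `race` and `religion` groups, because the other columns are identifiers, scores, group-level flags, or annotator attributes.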
Dataset splits
- Train split:
  - File size: 52943809 bytes
  - Number of examples: 135556
Dataset size
- Download size: 19680581 bytes
- Dataset size: 52943809 bytes
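
The size figures above allow two quick sanity checks: the average bytes per example, and how large the download is relative to the loaded dataset. The sketch below uses only the numbers reported on this page. The function names are illustrative.

```python
# Figures copied from the split and size information above.
NUM_BYTES = 52_943_809       # dataset size in bytes
NUM_EXAMPLES = 135_556       # rows in the train split
DOWNLOAD_SIZE = 19_680_581   # size of the downloaded files in bytes


def bytes_per_example(num_bytes, num_examples):
    """Average in-memory size of one row, in bytes."""
    return num_bytes / num_examples


def download_ratio(download_size, dataset_size):
    """Download size as a fraction of the dataset size."""
    return download_size / dataset_size


avg_row_bytes = bytes_per_example(NUM_BYTES, NUM_EXAMPLES)
ratio = download_ratio(DOWNLOAD_SIZE, NUM_BYTES)
```

This works out to roughly 390 bytes per row, with the download at about 37% of the dataset size.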



