
okaris/ucberkeley-dlab-measuring-hate-speech

Source: Hugging Face. Updated: 2023-09-17. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/okaris/ucberkeley-dlab-measuring-hate-speech
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: comment_id
    dtype: int32
  - name: annotator_id
    dtype: int32
  - name: platform
    dtype: int8
  - name: sentiment
    dtype: float64
  - name: respect
    dtype: float64
  - name: insult
    dtype: float64
  - name: humiliate
    dtype: float64
  - name: status
    dtype: float64
  - name: dehumanize
    dtype: float64
  - name: violence
    dtype: float64
  - name: genocide
    dtype: float64
  - name: attack_defend
    dtype: float64
  - name: hatespeech
    dtype: float64
  - name: hate_speech_score
    dtype: float64
  - name: text
    dtype: string
  - name: infitms
    dtype: float64
  - name: outfitms
    dtype: float64
  - name: annotator_severity
    dtype: float64
  - name: std_err
    dtype: float64
  - name: annotator_infitms
    dtype: float64
  - name: annotator_outfitms
    dtype: float64
  - name: hypothesis
    dtype: float64
  - name: target_race_asian
    dtype: bool
  - name: target_race_black
    dtype: bool
  - name: target_race_latinx
    dtype: bool
  - name: target_race_middle_eastern
    dtype: bool
  - name: target_race_native_american
    dtype: bool
  - name: target_race_pacific_islander
    dtype: bool
  - name: target_race_white
    dtype: bool
  - name: target_race_other
    dtype: bool
  - name: target_race
    dtype: bool
  - name: target_religion_atheist
    dtype: bool
  - name: target_religion_buddhist
    dtype: bool
  - name: target_religion_christian
    dtype: bool
  - name: target_religion_hindu
    dtype: bool
  - name: target_religion_jewish
    dtype: bool
  - name: target_religion_mormon
    dtype: bool
  - name: target_religion_muslim
    dtype: bool
  - name: target_religion_other
    dtype: bool
  - name: target_religion
    dtype: bool
  - name: target_origin_immigrant
    dtype: bool
  - name: target_origin_migrant_worker
    dtype: bool
  - name: target_origin_specific_country
    dtype: bool
  - name: target_origin_undocumented
    dtype: bool
  - name: target_origin_other
    dtype: bool
  - name: target_origin
    dtype: bool
  - name: target_gender_men
    dtype: bool
  - name: target_gender_non_binary
    dtype: bool
  - name: target_gender_transgender_men
    dtype: bool
  - name: target_gender_transgender_unspecified
    dtype: bool
  - name: target_gender_transgender_women
    dtype: bool
  - name: target_gender_women
    dtype: bool
  - name: target_gender_other
    dtype: bool
  - name: target_gender
    dtype: bool
  - name: target_sexuality_bisexual
    dtype: bool
  - name: target_sexuality_gay
    dtype: bool
  - name: target_sexuality_lesbian
    dtype: bool
  - name: target_sexuality_straight
    dtype: bool
  - name: target_sexuality_other
    dtype: bool
  - name: target_sexuality
    dtype: bool
  - name: target_age_children
    dtype: bool
  - name: target_age_teenagers
    dtype: bool
  - name: target_age_young_adults
    dtype: bool
  - name: target_age_middle_aged
    dtype: bool
  - name: target_age_seniors
    dtype: bool
  - name: target_age_other
    dtype: bool
  - name: target_age
    dtype: bool
  - name: target_disability_physical
    dtype: bool
  - name: target_disability_cognitive
    dtype: bool
  - name: target_disability_neurological
    dtype: bool
  - name: target_disability_visually_impaired
    dtype: bool
  - name: target_disability_hearing_impaired
    dtype: bool
  - name: target_disability_unspecific
    dtype: bool
  - name: target_disability_other
    dtype: bool
  - name: target_disability
    dtype: bool
  - name: annotator_gender
    dtype: string
  - name: annotator_trans
    dtype: string
  - name: annotator_educ
    dtype: string
  - name: annotator_income
    dtype: string
  - name: annotator_ideology
    dtype: string
  - name: annotator_gender_men
    dtype: bool
  - name: annotator_gender_women
    dtype: bool
  - name: annotator_gender_non_binary
    dtype: bool
  - name: annotator_gender_prefer_not_to_say
    dtype: bool
  - name: annotator_gender_self_describe
    dtype: bool
  - name: annotator_transgender
    dtype: bool
  - name: annotator_cisgender
    dtype: bool
  - name: annotator_transgender_prefer_not_to_say
    dtype: bool
  - name: annotator_education_some_high_school
    dtype: bool
  - name: annotator_education_high_school_grad
    dtype: bool
  - name: annotator_education_some_college
    dtype: bool
  - name: annotator_education_college_grad_aa
    dtype: bool
  - name: annotator_education_college_grad_ba
    dtype: bool
  - name: annotator_education_professional_degree
    dtype: bool
  - name: annotator_education_masters
    dtype: bool
  - name: annotator_education_phd
    dtype: bool
  - name: annotator_income_<10k
    dtype: bool
  - name: annotator_income_10k-50k
    dtype: bool
  - name: annotator_income_50k-100k
    dtype: bool
  - name: annotator_income_100k-200k
    dtype: bool
  - name: annotator_income_>200k
    dtype: bool
  - name: annotator_ideology_extremeley_conservative
    dtype: bool
  - name: annotator_ideology_conservative
    dtype: bool
  - name: annotator_ideology_slightly_conservative
    dtype: bool
  - name: annotator_ideology_neutral
    dtype: bool
  - name: annotator_ideology_slightly_liberal
    dtype: bool
  - name: annotator_ideology_liberal
    dtype: bool
  - name: annotator_ideology_extremeley_liberal
    dtype: bool
  - name: annotator_ideology_no_opinion
    dtype: bool
  - name: annotator_race_asian
    dtype: bool
  - name: annotator_race_black
    dtype: bool
  - name: annotator_race_latinx
    dtype: bool
  - name: annotator_race_middle_eastern
    dtype: bool
  - name: annotator_race_native_american
    dtype: bool
  - name: annotator_race_pacific_islander
    dtype: bool
  - name: annotator_race_white
    dtype: bool
  - name: annotator_race_other
    dtype: bool
  - name: annotator_age
    dtype: float64
  - name: annotator_religion_atheist
    dtype: bool
  - name: annotator_religion_buddhist
    dtype: bool
  - name: annotator_religion_christian
    dtype: bool
  - name: annotator_religion_hindu
    dtype: bool
  - name: annotator_religion_jewish
    dtype: bool
  - name: annotator_religion_mormon
    dtype: bool
  - name: annotator_religion_muslim
    dtype: bool
  - name: annotator_religion_nothing
    dtype: bool
  - name: annotator_religion_other
    dtype: bool
  - name: annotator_sexuality_bisexual
    dtype: bool
  - name: annotator_sexuality_gay
    dtype: bool
  - name: annotator_sexuality_straight
    dtype: bool
  - name: annotator_sexuality_other
    dtype: bool
  splits:
  - name: train
    num_bytes: 52943809
    num_examples: 135556
  download_size: 19680581
  dataset_size: 52943809
---
# Dataset Card for "ucberkeley-dlab-measuring-hate-speech"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
Provider:
okaris
Summary of original information

Dataset overview

Dataset configuration

  • Config name: default
  • Data file paths:
    • Train split: data/train-*
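
The `data/train-*` entry is a glob pattern rather than a single file name. As a rough sketch of how such a pattern selects files for the train split, the snippet below approximates the matching with Python's `fnmatch`. The file names in `candidate_files` are hypothetical and invented for illustration, and the Hub's actual resolution logic may differ in detail.

```python
from fnmatch import fnmatch

# Pattern taken from the card's data_files entry for the train split.
TRAIN_PATTERN = "data/train-*"

# Hypothetical repository file names, invented for illustration only.
candidate_files = [
    "README.md",
    "data/train-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
]


def files_for_split(files, pattern):
    """Return the file paths that match the split's glob pattern."""
    return [name for name in files if fnmatch(name, pattern)]


train_files = files_for_split(candidate_files, TRAIN_PATTERN)
print(train_files)
```

In practice the `datasets` library would normally perform this resolution itself, for example through `load_dataset("okaris/ucberkeley-dlab-measuring-hate-speech", split="train")`, so the sketch is only meant to show what the pattern expresses.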

Dataset information

Feature list

  • comment_id: comment ID, data type int32
  • annotator_id: annotator ID, data type int32
  • platform: platform, data type int8
  • sentiment: sentiment, data type float64
  • respect: respect, data type float64
  • insult: insult, data type float64
  • humiliate: humiliation, data type float64
  • status: status, data type float64
  • dehumanize: dehumanization, data type float64
  • violence: violence, data type float64
  • genocide: genocide, data type float64
  • attack_defend: attack/defend, data type float64
  • hatespeech: hate speech, data type float64
  • hate_speech_score: hate speech score, data type float64
  • text: text, data type string
  • infitms: infitms, data type float64
  • outfitms: outfitms, data type float64
  • annotator_severity: annotator severity, data type float64
  • std_err: standard error, data type float64
  • annotator_infitms: annotator infitms, data type float64
  • annotator_outfitms: annotator outfitms, data type float64
  • hypothesis: hypothesis, data type float64
  • target_race_asian: target race is Asian, data type bool
  • target_race_black: target race is Black, data type bool
  • target_race_latinx: target race is Latinx, data type bool
  • target_race_middle_eastern: target race is Middle Eastern, data type bool
  • target_race_native_american: target race is Native American, data type bool
  • target_race_pacific_islander: target race is Pacific Islander, data type bool
  • target_race_white: target race is White, data type bool
  • target_race_other: target race is other, data type bool
  • target_race: target race, data type bool
  • target_religion_atheist: target religion is atheist, data type bool
  • target_religion_buddhist: target religion is Buddhist, data type bool
  • target_religion_christian: target religion is Christian, data type bool
  • target_religion_hindu: target religion is Hindu, data type bool
  • target_religion_jewish: target religion is Jewish, data type bool
  • target_religion_mormon: target religion is Mormon, data type bool
  • target_religion_muslim: target religion is Muslim, data type bool
  • target_religion_other: target religion is other, data type bool
  • target_religion: target religion, data type bool
  • target_origin_immigrant: target origin is immigrant, data type bool
  • target_origin_migrant_worker: target origin is migrant worker, data type bool
  • target_origin_specific_country: target origin is a specific country, data type bool
  • target_origin_undocumented: target origin is undocumented, data type bool
  • target_origin_other: target origin is other, data type bool
  • target_origin: target origin, data type bool
  • target_gender_men: target gender is men, data type bool
  • target_gender_non_binary: target gender is non-binary, data type bool
  • target_gender_transgender_men: target gender is transgender men, data type bool
  • target_gender_transgender_unspecified: target gender is transgender (unspecified), data type bool
  • target_gender_transgender_women: target gender is transgender women, data type bool
  • target_gender_women: target gender is women, data type bool
  • target_gender_other: target gender is other, data type bool
  • target_gender: target gender, data type bool
  • target_sexuality_bisexual: target sexuality is bisexual, data type bool
  • target_sexuality_gay: target sexuality is gay, data type bool
  • target_sexuality_lesbian: target sexuality is lesbian, data type bool
  • target_sexuality_straight: target sexuality is straight, data type bool
  • target_sexuality_other: target sexuality is other, data type bool
  • target_sexuality: target sexuality, data type bool
  • target_age_children: target age is children, data type bool
  • target_age_teenagers: target age is teenagers, data type bool
  • target_age_young_adults: target age is young adults, data type bool
  • target_age_middle_aged: target age is middle-aged, data type bool
  • target_age_seniors: target age is seniors, data type bool
  • target_age_other: target age is other, data type bool
  • target_age: target age, data type bool
  • target_disability_physical: target disability is physical, data type bool
  • target_disability_cognitive: target disability is cognitive, data type bool
  • target_disability_neurological: target disability is neurological, data type bool
  • target_disability_visually_impaired: target disability is visual impairment, data type bool
  • target_disability_hearing_impaired: target disability is hearing impairment, data type bool
  • target_disability_unspecific: target disability is unspecified, data type bool
  • target_disability_other: target disability is other, data type bool
  • target_disability: target disability, data type bool
  • annotator_gender: annotator gender, data type string
  • annotator_trans: annotator transgender status, data type string
  • annotator_educ: annotator education, data type string
  • annotator_income: annotator income, data type string
  • annotator_ideology: annotator ideology, data type string
  • annotator_gender_men: annotator gender is man, data type bool
  • annotator_gender_women: annotator gender is woman, data type bool
  • annotator_gender_non_binary: annotator gender is non-binary, data type bool
  • annotator_gender_prefer_not_to_say: annotator prefers not to state gender, data type bool
  • annotator_gender_self_describe: annotator self-describes gender, data type bool
  • annotator_transgender: annotator is transgender, data type bool
  • annotator_cisgender: annotator is cisgender, data type bool
  • annotator_transgender_prefer_not_to_say: annotator prefers not to state transgender status, data type bool
  • annotator_education_some_high_school: annotator education is some high school, data type bool
  • annotator_education_high_school_grad: annotator education is high school graduate, data type bool
  • annotator_education_some_college: annotator education is some college, data type bool
  • annotator_education_college_grad_aa: annotator education is college graduate (associate degree), data type bool
  • annotator_education_college_grad_ba: annotator education is college graduate (bachelor's degree), data type bool
  • annotator_education_professional_degree: annotator education is professional degree, data type bool
  • annotator_education_masters: annotator education is master's degree, data type bool
  • annotator_education_phd: annotator education is PhD, data type bool
  • annotator_income_<10k: annotator income below 10k, data type bool
  • annotator_income_10k-50k: annotator income 10k-50k, data type bool
  • annotator_income_50k-100k: annotator income 50k-100k, data type bool
  • annotator_income_100k-200k: annotator income 100k-200k, data type bool
  • annotator_income_>200k: annotator income above 200k, data type bool
  • annotator_ideology_extremeley_conservative: annotator ideology is extremely conservative, data type bool
  • annotator_ideology_conservative: annotator ideology is conservative, data type bool
  • annotator_ideology_slightly_conservative: annotator ideology is slightly conservative, data type bool
  • annotator_ideology_neutral: annotator ideology is neutral, data type bool
  • annotator_ideology_slightly_liberal: annotator ideology is slightly liberal, data type bool
  • annotator_ideology_liberal: annotator ideology is liberal, data type bool
  • annotator_ideology_extremeley_liberal: annotator ideology is extremely liberal, data type bool
  • annotator_ideology_no_opinion: annotator ideology is no opinion, data type bool
  • annotator_race_asian: annotator race is Asian, data type bool
  • annotator_race_black: annotator race is Black, data type bool
  • annotator_race_latinx: annotator race is Latinx, data type bool
  • annotator_race_middle_eastern: annotator race is Middle Eastern, data type bool
  • annotator_race_native_american: annotator race is Native American, data type bool
  • annotator_race_pacific_islander: annotator race is Pacific Islander, data type bool
  • annotator_race_white: annotator race is White, data type bool
  • annotator_race_other: annotator race is other, data type bool
  • annotator_age: annotator age, data type float64
  • annotator_religion_atheist: annotator religion is atheist, data type bool
  • annotator_religion_buddhist: annotator religion is Buddhist, data type bool
  • annotator_religion_christian: annotator religion is Christian, data type bool
  • annotator_religion_hindu: annotator religion is Hindu, data type bool
  • annotator_religion_jewish: annotator religion is Jewish, data type bool
  • annotator_religion_mormon: annotator religion is Mormon, data type bool
  • annotator_religion_muslim: annotator religion is Muslim, data type bool
  • annotator_religion_nothing: annotator religion is none, data type bool
  • annotator_religion_other: annotator religion is other, data type bool
  • annotator_sexuality_bisexual: annotator sexuality is bisexual, data type bool
  • annotator_sexuality_gay: annotator sexuality is gay, data type bool
  • annotator_sexuality_straight: annotator sexuality is straight, data type bool
  • annotator_sexuality_other: annotator sexuality is other, data type bool
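
The feature list pairs `comment_id` with `annotator_id` and carries many boolean `target_*` flags, which suggests (as an assumption, since the card does not say so explicitly) that each row is one annotator's rating of one comment. Under that assumption, a minimal sketch of collapsing rows to one summary per comment might look like the following. The rows are toy data with invented values, and `summarize_by_comment` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict
from statistics import mean

# Toy rows using column names from the feature list; all values are invented.
rows = [
    {"comment_id": 1, "annotator_id": 10, "hatespeech": 2.0,
     "target_race_black": True, "target_gender_women": False},
    {"comment_id": 1, "annotator_id": 11, "hatespeech": 1.0,
     "target_race_black": True, "target_gender_women": True},
    {"comment_id": 2, "annotator_id": 10, "hatespeech": 0.0,
     "target_race_black": False, "target_gender_women": False},
]


def summarize_by_comment(rows):
    """Group rows by comment_id and summarize ratings and flagged targets."""
    by_comment = defaultdict(list)
    for row in rows:
        by_comment[row["comment_id"]].append(row)

    summary = {}
    for comment_id, group in by_comment.items():
        # A target column counts if at least one annotator set it to True.
        targets = sorted({
            col
            for row in group
            for col, value in row.items()
            if col.startswith("target_") and value is True
        })
        summary[comment_id] = {
            "n_annotators": len({row["annotator_id"] for row in group}),
            "mean_hatespeech": mean(row["hatespeech"] for row in group),
            "targets": targets,
        }
    return summary


summary = summarize_by_comment(rows)
for comment_id, info in summary.items():
    print(comment_id, info)
```

Averaging `hatespeech` is only one possible aggregation; the card also lists a `hate_speech_score` column, which may already serve as a comment-level measure.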

Dataset splits

  • Train split:
    • File size: 52943809 bytes
    • Number of examples: 135556

Dataset size

  • Download size: 19680581 bytes
  • Dataset size: 52943809 bytes
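
To put these byte counts in perspective, the short calculation below derives the average size per example and the ratio of download size to dataset size. It uses only the three numbers reported above; the interpretation of the ratio as a compression effect is an assumption, not something the card states.

```python
# Figures copied from the dataset card.
NUM_BYTES = 52_943_809       # dataset_size / num_bytes of the train split
NUM_EXAMPLES = 135_556       # num_examples of the train split
DOWNLOAD_SIZE = 19_680_581   # download_size

MIB = 1024 ** 2  # bytes per mebibyte

bytes_per_example = NUM_BYTES / NUM_EXAMPLES
download_ratio = DOWNLOAD_SIZE / NUM_BYTES
dataset_mib = NUM_BYTES / MIB
download_mib = DOWNLOAD_SIZE / MIB

print(f"Average bytes per example: {bytes_per_example:.2f}")
print(f"Download / dataset size:   {download_ratio:.2%}")
print(f"Dataset size:  {dataset_mib:.2f} MiB")
print(f"Download size: {download_mib:.2f} MiB")
```

This works out to roughly 391 bytes per example, with the download being about 37% of the reported dataset size.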