james-burton/jigsaw_unintended_bias100K_all_text
Source: Hugging Face. Updated 2023-05-02; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/james-burton/jigsaw_unintended_bias100K_all_text
Resource description:
---
dataset_info:
  features:
  - name: comment_text
    dtype: string
  - name: asian
    dtype: string
  - name: atheist
    dtype: string
  - name: bisexual
    dtype: string
  - name: black
    dtype: string
  - name: buddhist
    dtype: string
  - name: christian
    dtype: string
  - name: female
    dtype: string
  - name: heterosexual
    dtype: string
  - name: hindu
    dtype: string
  - name: homosexual_gay_or_lesbian
    dtype: string
  - name: intellectual_or_learning_disability
    dtype: string
  - name: jewish
    dtype: string
  - name: latino
    dtype: string
  - name: male
    dtype: string
  - name: muslim
    dtype: string
  - name: other_disability
    dtype: string
  - name: other_gender
    dtype: string
  - name: other_race_or_ethnicity
    dtype: string
  - name: other_religion
    dtype: string
  - name: other_sexual_orientation
    dtype: string
  - name: physical_disability
    dtype: string
  - name: psychiatric_or_mental_illness
    dtype: string
  - name: transgender
    dtype: string
  - name: white
    dtype: string
  - name: funny
    dtype: string
  - name: wow
    dtype: string
  - name: sad
    dtype: string
  - name: likes
    dtype: string
  - name: disagree
    dtype: string
  - name: target
    dtype: int64
  - name: __index_level_0__
    dtype: int64
  splits:
  - name: train
    num_bytes: 43474162
    num_examples: 85000
  - name: validation
    num_bytes: 7667244
    num_examples: 15000
  - name: test
    num_bytes: 12792522
    num_examples: 25000
  download_size: 0
  dataset_size: 63933928
---
# Dataset Card for "jigsaw_unintended_bias100K_all_text"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
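
As a minimal sketch of how the metadata above might be used, the snippet below loads the dataset with the Hugging Face `datasets` library and compares each split's row count against the `num_examples` values declared in the card. It assumes the `datasets` package is installed and that the repository is still reachable on the Hub. `EXPECTED_ROWS`, `load_splits`, and `check_split_sizes` are illustrative names, not part of the dataset.

```python
REPO_ID = "james-burton/jigsaw_unintended_bias100K_all_text"

# Row counts copied from the `splits` section of the card metadata.
EXPECTED_ROWS = {"train": 85000, "validation": 15000, "test": 25000}


def load_splits(repo_id=REPO_ID):
    """Download every split and return a mapping of split name to row count.

    The import is lazy so the rest of this module works without `datasets`.
    """
    from datasets import load_dataset  # assumes `pip install datasets`

    dataset_dict = load_dataset(repo_id)
    return {name: split.num_rows for name, split in dataset_dict.items()}


def check_split_sizes(actual_rows, expected_rows=EXPECTED_ROWS):
    """Return splits whose row count differs from the card.

    The result maps split name to an (actual, expected) pair; a split that
    is missing from `actual_rows` is reported with an actual value of None.
    """
    mismatches = {}
    for name, expected in expected_rows.items():
        actual = actual_rows.get(name)
        if actual != expected:
            mismatches[name] = (actual, expected)
    return mismatches
```

Calling `check_split_sizes(load_splits())` should return an empty dict if the hosted data still matches the card; any other result points at the splits that have drifted.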
Provider: james-burton
Summary of original information
Dataset overview
Dataset name
- Name: jigsaw_unintended_bias100K_all_text
Dataset features
- Feature list:
  - comment_text: string
  - asian: string
  - atheist: string
  - bisexual: string
  - black: string
  - buddhist: string
  - christian: string
  - female: string
  - heterosexual: string
  - hindu: string
  - homosexual_gay_or_lesbian: string
  - intellectual_or_learning_disability: string
  - jewish: string
  - latino: string
  - male: string
  - muslim: string
  - other_disability: string
  - other_gender: string
  - other_race_or_ethnicity: string
  - other_religion: string
  - other_sexual_orientation: string
  - physical_disability: string
  - psychiatric_or_mental_illness: string
  - transgender: string
  - white: string
  - funny: string
  - wow: string
  - sad: string
  - likes: string
  - disagree: string
  - target: integer (int64)
  - __index_level_0__: integer (int64)
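
Every column except `target` and `__index_level_0__` is typed as a string, so any numeric use of the identity or reaction columns needs an explicit conversion. The sketch below assumes, without confirmation from the card, that those strings hold decimal numbers or are empty when no value is available. The grouping into `IDENTITY_COLUMNS` and `REACTION_COLUMNS`, the helpers, and the sample row are all hypothetical.

```python
import math

# Column names come from the feature list; the grouping is only illustrative.
IDENTITY_COLUMNS = [
    "asian", "atheist", "bisexual", "black", "buddhist", "christian",
    "female", "heterosexual", "hindu", "homosexual_gay_or_lesbian",
    "intellectual_or_learning_disability", "jewish", "latino", "male",
    "muslim", "other_disability", "other_gender", "other_race_or_ethnicity",
    "other_religion", "other_sexual_orientation", "physical_disability",
    "psychiatric_or_mental_illness", "transgender", "white",
]
REACTION_COLUMNS = ["funny", "wow", "sad", "likes", "disagree"]


def to_float_or_none(value):
    """Parse a cell as a float; return None if it is missing, non-numeric, or NaN."""
    if value is None:
        return None
    try:
        number = float(str(value).strip())
    except ValueError:
        return None
    return None if math.isnan(number) else number


def parse_row(row):
    """Return a copy of `row` with the string-typed numeric columns converted."""
    parsed = dict(row)
    for column in IDENTITY_COLUMNS + REACTION_COLUMNS:
        if column in parsed:
            parsed[column] = to_float_or_none(parsed[column])
    return parsed


# Hypothetical row, invented for this example (not taken from the dataset).
sample = {
    "comment_text": "example comment",
    "female": "0.5",
    "muslim": "",
    "likes": "3",
    "target": 0,
}
parsed = parse_row(sample)
```

Columns outside the two lists, such as `comment_text` and `target`, pass through unchanged, so the same helper can be applied to every row regardless of which columns are populated.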
Dataset splits
- Split details:
  - train: 85000 examples, 43474162 bytes
  - validation: 15000 examples, 7667244 bytes
  - test: 25000 examples, 12792522 bytes
Dataset size
- Download size: 0 bytes
- Total dataset size: 63933928 bytes
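
The size figures above can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the card: it confirms that the per-split byte counts add up to the reported total and prints each split's share of the rows. The variable names are illustrative.

```python
# Figures copied from the split and size metadata of the card.
SPLITS = {
    "train": {"num_bytes": 43474162, "num_examples": 85000},
    "validation": {"num_bytes": 7667244, "num_examples": 15000},
    "test": {"num_bytes": 12792522, "num_examples": 25000},
}
DATASET_SIZE = 63933928

total_bytes = sum(info["num_bytes"] for info in SPLITS.values())
total_examples = sum(info["num_examples"] for info in SPLITS.values())

# The reported dataset size should equal the sum of the split sizes.
assert total_bytes == DATASET_SIZE

for name, info in SPLITS.items():
    share = info["num_examples"] / total_examples
    average = info["num_bytes"] / info["num_examples"]
    print(f"{name}: {share:.0%} of rows, about {average:.0f} bytes per example")
```

The three splits total 125000 examples. The train and validation splits together hold 100000, which may be what "100K" in the dataset name refers to, although the card does not say so.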



