james-burton/jigsaw_unintended_bias100K_all_text

Name: james-burton/jigsaw_unintended_bias100K_all_text
Creator: james-burton
Published: 2023-05-02 15:59:59
License: No description available

Source: Hugging Face. Updated: 2023-05-02. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/james-burton/jigsaw_unintended_bias100K_all_text
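The link above points at a mirror of the Hugging Face Hub. A minimal sketch of loading the dataset through that mirror follows. It assumes the third-party `datasets` package is installed and relies on the `HF_ENDPOINT` environment variable that `huggingface_hub` reads at import time. The helper names `dataset_url` and `load_from_mirror` are hypothetical and not part of any library.

```python
import os

REPO_ID = "james-burton/jigsaw_unintended_bias100K_all_text"
MIRROR_ENDPOINT = "https://hf-mirror.com"


def dataset_url(repo_id: str = REPO_ID, endpoint: str = MIRROR_ENDPOINT) -> str:
    """Build the dataset page URL for a given Hub endpoint (hypothetical helper)."""
    return f"{endpoint}/datasets/{repo_id}"


def load_from_mirror(repo_id: str = REPO_ID, endpoint: str = MIRROR_ENDPOINT):
    """Load all splits through the mirror (hypothetical helper, needs network access).

    HF_ENDPOINT is read when huggingface_hub is first imported, so it is set
    before the lazy import below. setdefault keeps any value the user has
    already exported.
    """
    os.environ.setdefault("HF_ENDPOINT", endpoint)
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(repo_id)


print(dataset_url())
```

Calling `load_from_mirror()` should return a `DatasetDict` keyed by split name, so `load_from_mirror()["train"]` would give the training split. This has not been verified against the mirror.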

Resource description:

---
dataset_info:
  features:
  - name: comment_text
    dtype: string
  - name: asian
    dtype: string
  - name: atheist
    dtype: string
  - name: bisexual
    dtype: string
  - name: black
    dtype: string
  - name: buddhist
    dtype: string
  - name: christian
    dtype: string
  - name: female
    dtype: string
  - name: heterosexual
    dtype: string
  - name: hindu
    dtype: string
  - name: homosexual_gay_or_lesbian
    dtype: string
  - name: intellectual_or_learning_disability
    dtype: string
  - name: jewish
    dtype: string
  - name: latino
    dtype: string
  - name: male
    dtype: string
  - name: muslim
    dtype: string
  - name: other_disability
    dtype: string
  - name: other_gender
    dtype: string
  - name: other_race_or_ethnicity
    dtype: string
  - name: other_religion
    dtype: string
  - name: other_sexual_orientation
    dtype: string
  - name: physical_disability
    dtype: string
  - name: psychiatric_or_mental_illness
    dtype: string
  - name: transgender
    dtype: string
  - name: white
    dtype: string
  - name: funny
    dtype: string
  - name: wow
    dtype: string
  - name: sad
    dtype: string
  - name: likes
    dtype: string
  - name: disagree
    dtype: string
  - name: target
    dtype: int64
  - name: __index_level_0__
    dtype: int64
  splits:
  - name: train
    num_bytes: 43474162
    num_examples: 85000
  - name: validation
    num_bytes: 7667244
    num_examples: 15000
  - name: test
    num_bytes: 12792522
    num_examples: 25000
  download_size: 0
  dataset_size: 63933928
---

# Dataset Card for "jigsaw_unintended_bias100K_all_text"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

Provider:

james-burton

Summary of original information

Dataset overview

Dataset name

Name: jigsaw_unintended_bias100K_all_text

Dataset features

Feature list:
- comment_text: string
- asian: string
- atheist: string
- bisexual: string
- black: string
- buddhist: string
- christian: string
- female: string
- heterosexual: string
- hindu: string
- homosexual_gay_or_lesbian: string
- intellectual_or_learning_disability: string
- jewish: string
- latino: string
- male: string
- muslim: string
- other_disability: string
- other_gender: string
- other_race_or_ethnicity: string
- other_religion: string
- other_sexual_orientation: string
- physical_disability: string
- psychiatric_or_mental_illness: string
- transgender: string
- white: string
- funny: string
- wow: string
- sad: string
- likes: string
- disagree: string
- target: int64
- __index_level_0__: int64
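Every column in the list above except `target` and the index column is typed as a string, including the identity and reaction columns. The card does not say what those strings contain. The sketch below assumes they hold numeric values serialized as text (for example `"0.0"` or `"nan"`) and shows one way to turn them back into numbers. The helper names `parse_numeric_text` and `decode_row`, and the grouping into `IDENTITY_COLUMNS` and `REACTION_COLUMNS`, are illustrative choices rather than anything defined by the dataset.

```python
from typing import Any, Dict, Optional

# Column names copied from the feature list; the grouping is an assumption.
IDENTITY_COLUMNS = [
    "asian", "atheist", "bisexual", "black", "buddhist", "christian",
    "female", "heterosexual", "hindu", "homosexual_gay_or_lesbian",
    "intellectual_or_learning_disability", "jewish", "latino", "male",
    "muslim", "other_disability", "other_gender", "other_race_or_ethnicity",
    "other_religion", "other_sexual_orientation", "physical_disability",
    "psychiatric_or_mental_illness", "transgender", "white",
]
REACTION_COLUMNS = ["funny", "wow", "sad", "likes", "disagree"]


def parse_numeric_text(value: Any) -> Optional[float]:
    """Return a float for numeric text, or None for missing or non-numeric text."""
    if value is None:
        return None
    text = str(value).strip()
    if text == "" or text.lower() in {"nan", "none"}:
        return None
    try:
        return float(text)
    except ValueError:
        # Assumption violated: the cell holds free text, so treat it as missing.
        return None


def decode_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Copy a row, converting the string-typed numeric columns to floats."""
    decoded = dict(row)
    for column in IDENTITY_COLUMNS + REACTION_COLUMNS:
        if column in decoded:
            decoded[column] = parse_numeric_text(decoded[column])
    return decoded


# Made-up example row, not taken from the dataset.
example = {"comment_text": "hello", "asian": "0.0", "female": "nan", "likes": "3", "target": 0}
print(decode_row(example))
```

If the columns turn out to hold something other than numeric text, `parse_numeric_text` maps those cells to `None`, so it is worth inspecting a few raw rows before relying on this conversion.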

Dataset splits

Split details:
- train: 85000 examples, 43474162 bytes
- validation: 15000 examples, 7667244 bytes
- test: 25000 examples, 12792522 bytes

Dataset size

Download size: 0 bytes
Total dataset size: 63933928 bytes
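The split counts and byte sizes reported above can be cross-checked with a little arithmetic. The sketch below hard-codes the figures from this page, confirms that the per-split byte counts add up to the reported total, and derives each split's share of the examples. The variable names are illustrative.

```python
# Figures copied from the split details and dataset size listed on this page.
SPLITS = {
    "train": {"num_examples": 85000, "num_bytes": 43474162},
    "validation": {"num_examples": 15000, "num_bytes": 7667244},
    "test": {"num_examples": 25000, "num_bytes": 12792522},
}
REPORTED_DATASET_SIZE = 63933928

total_examples = sum(info["num_examples"] for info in SPLITS.values())
total_bytes = sum(info["num_bytes"] for info in SPLITS.values())

# The per-split byte counts should add up to the reported dataset size.
assert total_bytes == REPORTED_DATASET_SIZE

# Share of all examples held by each split.
fractions = {
    name: info["num_examples"] / total_examples for name, info in SPLITS.items()
}

print(f"total examples: {total_examples}")
for name, info in SPLITS.items():
    bytes_per_example = info["num_bytes"] / info["num_examples"]
    print(f"{name}: {fractions[name]:.0%} of examples, "
          f"about {bytes_per_example:.0f} bytes per example")
```

The three splits total 125000 examples, split 68% / 12% / 20%. The train and validation splits together hold 100000 examples, which may be what "100K" in the dataset name refers to, although the card does not say so.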
