okoc/toxigen_annotated_per
Source: Hugging Face | Updated: 2024-05-24 | Indexed: 2024-06-12
Download link:
https://hf-mirror.com/datasets/okoc/toxigen_annotated_per
Resource description:
---
dataset_info:
  features:
  - name: text
    dtype: string
  - name: target_group
    dtype: string
  - name: factual?
    dtype: string
  - name: ingroup_effect
    dtype: string
  - name: lewd
    dtype: string
  - name: framing
    dtype: string
  - name: predicted_group
    dtype: string
  - name: stereotyping
    dtype: string
  - name: intent
    dtype: float64
  - name: toxicity_ai
    dtype: float64
  - name: toxicity_human
    dtype: float64
  - name: predicted_author
    dtype: string
  - name: actual_method
    dtype: string
  - name: text_per_B
    dtype: string
  - name: text_per_C
    dtype: string
  - name: text_per_D
    dtype: string
  - name: text_per_E
    dtype: string
  splits:
  - name: test
    num_bytes: 807482
    num_examples: 940
  - name: train
    num_bytes: 7332618
    num_examples: 8960
  download_size: 3803467
  dataset_size: 8140100
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
  - split: train
    path: data/train-*
---
The dataset has 17 features: text, target group, factual status, ingroup effect, lewdness, framing, predicted group, stereotyping, intent, AI toxicity score, human toxicity score, predicted author, actual method, and four additional text columns (text_per_B through text_per_E). It is split into a training set of 8,960 examples and a test set of 940 examples. The download size is 3,803,467 bytes and the total dataset size is 8,140,100 bytes.
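The sizes quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that only uses the numbers from the card; the variable names are illustrative and not part of the dataset.

```python
# Split metadata copied from the dataset card above.
SPLITS = {
    "test": {"num_bytes": 807_482, "num_examples": 940},
    "train": {"num_bytes": 7_332_618, "num_examples": 8_960},
}
DOWNLOAD_SIZE = 3_803_467  # bytes, as listed under download_size
DATASET_SIZE = 8_140_100   # bytes, as listed under dataset_size

# The per-split byte counts should add up to the reported dataset size.
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())
total_examples = sum(s["num_examples"] for s in SPLITS.values())

# Derived quantities: share of examples held out for testing, ratio of
# download size to dataset size, and average bytes per example.
test_fraction = SPLITS["test"]["num_examples"] / total_examples
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE
avg_bytes_per_example = total_bytes / total_examples

print(f"total bytes:       {total_bytes}")
print(f"total examples:    {total_examples}")
print(f"test fraction:     {test_fraction:.3f}")
print(f"download / size:   {download_ratio:.3f}")
print(f"avg bytes/example: {avg_bytes_per_example:.1f}")
```

The two split byte counts sum exactly to the reported dataset size, so the card's figures are internally consistent.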
Provider:
okoc
Original information summary
Dataset overview
Dataset features
- text: string
- target_group: string
- factual?: string
- ingroup_effect: string
- lewd: string
- framing: string
- predicted_group: string
- stereotyping: string
- intent: float64
- toxicity_ai: float64
- toxicity_human: float64
- predicted_author: string
- actual_method: string
- text_per_B: string
- text_per_C: string
- text_per_D: string
- text_per_E: string
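The feature list above can be written down as a small schema and used to check rows before further processing. This is a sketch under assumptions: `SCHEMA`, `validate_row`, and `example_row` are hypothetical names introduced here, the mapping string to `str` and float64 to `float` is the obvious Python reading of the listed dtypes, and treating `None` as an allowed missing value is an assumption, not something the card states.

```python
# Column name -> Python type, following the dtypes in the feature list
# (string -> str, float64 -> float).
SCHEMA = {
    "text": str,
    "target_group": str,
    "factual?": str,
    "ingroup_effect": str,
    "lewd": str,
    "framing": str,
    "predicted_group": str,
    "stereotyping": str,
    "intent": float,
    "toxicity_ai": float,
    "toxicity_human": float,
    "predicted_author": str,
    "actual_method": str,
    "text_per_B": str,
    "text_per_C": str,
    "text_per_D": str,
    "text_per_E": str,
}


def validate_row(row: dict) -> list:
    """Return a list of problems found in `row`; an empty list means it fits."""
    problems = []
    for name, expected in SCHEMA.items():
        if name not in row:
            problems.append(f"missing column: {name}")
        # Assumption: None is tolerated as a missing annotation.
        elif row[name] is not None and not isinstance(row[name], expected):
            problems.append(
                f"{name}: expected {expected.__name__}, "
                f"got {type(row[name]).__name__}"
            )
    for name in row:
        if name not in SCHEMA:
            problems.append(f"unexpected column: {name}")
    return problems


# Hypothetical placeholder row (not real data from the dataset).
example_row = {name: (0.0 if typ is float else "") for name, typ in SCHEMA.items()}
print(validate_row(example_row))
```

Note that the column name `factual?` contains a question mark, so it has to be accessed with string indexing (`row["factual?"]`) rather than attribute-style access.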
Dataset splits
- test:
  - Bytes: 807482
  - Examples: 940
- train:
  - Bytes: 7332618
  - Examples: 8960
Dataset size
- Download size: 3803467 bytes
- Total dataset size: 8140100 bytes
Data file configuration
- default:
  - test: path data/test-*
  - train: path data/train-*
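Given the split configuration above, one plausible way to load the data is the Hugging Face `datasets` library. The sketch below assumes that library is installed and that the repository `okoc/toxigen_annotated_per` is reachable; `load_split`, `DATA_FILES`, and `EXPECTED_ROWS` are hypothetical helper names introduced for illustration.

```python
REPO_ID = "okoc/toxigen_annotated_per"

# Split name -> file glob, as listed in the default config.
DATA_FILES = {"test": "data/test-*", "train": "data/train-*"}

# Row counts reported on the card, used as a sanity check after loading.
EXPECTED_ROWS = {"test": 940, "train": 8960}


def load_split(split: str):
    """Load one split from the Hub and compare its row count with the card."""
    if split not in DATA_FILES:
        raise ValueError(
            f"unknown split {split!r}; expected one of {sorted(DATA_FILES)}"
        )
    # Imported lazily so the module can be read without `datasets` installed.
    from datasets import load_dataset

    ds = load_dataset(REPO_ID, split=split)
    if ds.num_rows != EXPECTED_ROWS[split]:
        raise RuntimeError(
            f"{split}: got {ds.num_rows} rows, card says {EXPECTED_ROWS[split]}"
        )
    return ds


# Example usage (requires network access):
# train = load_split("train")
# print(train.column_names)
```

The row-count check is optional; it is included because the card states exact example counts, so a mismatch would signal that the hosted data has changed since the listing was indexed.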



