justinphan3110/wildchat_over_refusal
收藏Hugging Face2024-05-09 更新2024-06-12 收录
下载链接:
https://hf-mirror.com/datasets/justinphan3110/wildchat_over_refusal
下载链接
链接失效反馈官方服务:
资源简介:
---
dataset_info:
features:
- name: conversation_hash
dtype: string
- name: model
dtype: string
- name: timestamp
dtype: timestamp[us, tz=UTC]
- name: conversation
list:
- name: content
dtype: string
- name: country
dtype: string
- name: hashed_ip
dtype: string
- name: header
struct:
- name: accept-language
dtype: string
- name: user-agent
dtype: string
- name: language
dtype: string
- name: redacted
dtype: bool
- name: role
dtype: string
- name: state
dtype: string
- name: timestamp
dtype: timestamp[us, tz=UTC]
- name: toxic
dtype: bool
- name: turn_identifier
dtype: int64
- name: turn
dtype: int64
- name: language
dtype: string
- name: openai_moderation
list:
- name: categories
struct:
- name: harassment
dtype: bool
- name: harassment/threatening
dtype: bool
- name: harassment_threatening
dtype: bool
- name: hate
dtype: bool
- name: hate/threatening
dtype: bool
- name: hate_threatening
dtype: bool
- name: self-harm
dtype: bool
- name: self-harm/instructions
dtype: bool
- name: self-harm/intent
dtype: bool
- name: self_harm
dtype: bool
- name: self_harm_instructions
dtype: bool
- name: self_harm_intent
dtype: bool
- name: sexual
dtype: bool
- name: sexual/minors
dtype: bool
- name: sexual_minors
dtype: bool
- name: violence
dtype: bool
- name: violence/graphic
dtype: bool
- name: violence_graphic
dtype: bool
- name: category_scores
struct:
- name: harassment
dtype: float64
- name: harassment/threatening
dtype: float64
- name: harassment_threatening
dtype: float64
- name: hate
dtype: float64
- name: hate/threatening
dtype: float64
- name: hate_threatening
dtype: float64
- name: self-harm
dtype: float64
- name: self-harm/instructions
dtype: float64
- name: self-harm/intent
dtype: float64
- name: self_harm
dtype: float64
- name: self_harm_instructions
dtype: float64
- name: self_harm_intent
dtype: float64
- name: sexual
dtype: float64
- name: sexual/minors
dtype: float64
- name: sexual_minors
dtype: float64
- name: violence
dtype: float64
- name: violence/graphic
dtype: float64
- name: violence_graphic
dtype: float64
- name: flagged
dtype: bool
- name: detoxify_moderation
list:
- name: identity_attack
dtype: float64
- name: insult
dtype: float64
- name: obscene
dtype: float64
- name: severe_toxicity
dtype: float64
- name: sexual_explicit
dtype: float64
- name: threat
dtype: float64
- name: toxicity
dtype: float64
- name: toxic
dtype: bool
- name: redacted
dtype: bool
- name: state
dtype: string
- name: country
dtype: string
- name: hashed_ip
dtype: string
- name: header
struct:
- name: accept-language
dtype: string
- name: user-agent
dtype: string
splits:
- name: toxic
num_bytes: 2204600.0049548703
num_examples: 248
- name: nontoxic
num_bytes: 10489629.055833658
num_examples: 1180
download_size: 13080317
dataset_size: 12694229.060788529
configs:
- config_name: default
data_files:
- split: toxic
path: data/toxic-*
- split: nontoxic
path: data/nontoxic-*
---
提供机构:
justinphan3110
原始信息汇总
数据集概述
数据集特征
主要特征
- conversation_hash: 字符串类型
- model: 字符串类型
- timestamp: 时间戳类型,单位为微秒,时区为UTC
- turn: 整数类型,64位
- language: 字符串类型
- openai_moderation: 列表类型,包含以下结构:
- categories: 结构体,包含多个布尔类型字段,如
harassment,hate,self-harm,sexual,violence等 - category_scores: 结构体,包含多个浮点类型字段,对应
categories中的每个类别 - flagged: 布尔类型
- categories: 结构体,包含多个布尔类型字段,如
- detoxify_moderation: 列表类型,包含多个浮点类型字段,如
identity_attack,insult,obscene,severe_toxicity,sexual_explicit,threat,toxicity - toxic: 布尔类型
- redacted: 布尔类型
- state: 字符串类型
- country: 字符串类型
- hashed_ip: 字符串类型
- header: 结构体,包含以下字段:
- accept-language: 字符串类型
- user-agent: 字符串类型
对话详情
- conversation: 列表类型,包含以下字段:
- content: 字符串类型
- country: 字符串类型
- hashed_ip: 字符串类型
- header: 结构体,包含以下字段:
- accept-language: 字符串类型
- user-agent: 字符串类型
- language: 字符串类型
- redacted: 布尔类型
- role: 字符串类型
- state: 字符串类型
- timestamp: 时间戳类型,单位为微秒,时区为UTC
- toxic: 布尔类型
- turn_identifier: 整数类型,64位
数据集分割
- toxic: 包含248个示例,总大小为2204600.0049548703字节
- nontoxic: 包含1180个示例,总大小为10489629.055833658字节
数据集大小
- 下载大小: 13080317字节
- 数据集总大小: 12694229.060788529字节



