kevinpro/WildChat-1M-GPT4
收藏Hugging Face2024-05-06 更新2024-06-12 收录
下载链接:
https://hf-mirror.com/datasets/kevinpro/WildChat-1M-GPT4
下载链接
链接失效反馈官方服务:
资源简介:
---
dataset_info:
features:
- name: conversation_hash
dtype: string
- name: model
dtype: string
- name: timestamp
dtype: timestamp[us, tz=UTC]
- name: conversation
list:
- name: content
dtype: string
- name: country
dtype: string
- name: hashed_ip
dtype: string
- name: header
struct:
- name: accept-language
dtype: string
- name: user-agent
dtype: string
- name: language
dtype: string
- name: redacted
dtype: bool
- name: role
dtype: string
- name: state
dtype: string
- name: timestamp
dtype: timestamp[us, tz=UTC]
- name: toxic
dtype: bool
- name: turn_identifier
dtype: int64
- name: turn
dtype: int64
- name: language
dtype: string
- name: openai_moderation
list:
- name: categories
struct:
- name: harassment
dtype: bool
- name: harassment/threatening
dtype: bool
- name: harassment_threatening
dtype: bool
- name: hate
dtype: bool
- name: hate/threatening
dtype: bool
- name: hate_threatening
dtype: bool
- name: self-harm
dtype: bool
- name: self-harm/instructions
dtype: bool
- name: self-harm/intent
dtype: bool
- name: self_harm
dtype: bool
- name: self_harm_instructions
dtype: bool
- name: self_harm_intent
dtype: bool
- name: sexual
dtype: bool
- name: sexual/minors
dtype: bool
- name: sexual_minors
dtype: bool
- name: violence
dtype: bool
- name: violence/graphic
dtype: bool
- name: violence_graphic
dtype: bool
- name: category_scores
struct:
- name: harassment
dtype: float64
- name: harassment/threatening
dtype: float64
- name: harassment_threatening
dtype: float64
- name: hate
dtype: float64
- name: hate/threatening
dtype: float64
- name: hate_threatening
dtype: float64
- name: self-harm
dtype: float64
- name: self-harm/instructions
dtype: float64
- name: self-harm/intent
dtype: float64
- name: self_harm
dtype: float64
- name: self_harm_instructions
dtype: float64
- name: self_harm_intent
dtype: float64
- name: sexual
dtype: float64
- name: sexual/minors
dtype: float64
- name: sexual_minors
dtype: float64
- name: violence
dtype: float64
- name: violence/graphic
dtype: float64
- name: violence_graphic
dtype: float64
- name: flagged
dtype: bool
- name: detoxify_moderation
list:
- name: identity_attack
dtype: float64
- name: insult
dtype: float64
- name: obscene
dtype: float64
- name: severe_toxicity
dtype: float64
- name: sexual_explicit
dtype: float64
- name: threat
dtype: float64
- name: toxicity
dtype: float64
- name: toxic
dtype: bool
- name: redacted
dtype: bool
- name: state
dtype: string
- name: country
dtype: string
- name: hashed_ip
dtype: string
- name: header
struct:
- name: accept-language
dtype: string
- name: user-agent
dtype: string
splits:
- name: train
num_bytes: 1961240610.8595295
num_examples: 220624
download_size: 1280104055
dataset_size: 1961240610.8595295
configs:
- config_name: default
data_files:
- split: train
path: data/train-*
---
This dataset is used for conversation analysis and content moderation, containing features such as conversation hash, model used, timestamp, conversation content, country, hashed IP address, header information, language, whether it is redacted, role, state, whether it contains toxic content, turn identifier, language, OpenAI moderation results, Detoxify moderation results, etc. The dataset is divided into a training set, containing 220624 samples, with a total size of 1961240610.8595295 bytes.
提供机构:
kevinpro
原始信息汇总
数据集概述
数据集特征
主要特征
- conversation_hash: 数据类型为字符串。
- model: 数据类型为字符串。
- timestamp: 数据类型为时间戳,单位为微秒,时区为UTC。
- turn: 数据类型为整数。
- language: 数据类型为字符串。
- openai_moderation: 包含多个子特征,主要为分类和分类分数,数据类型包括布尔型和浮点型。
- detoxify_moderation: 包含多个子特征,数据类型为浮点型。
- toxic: 数据类型为布尔型。
- redacted: 数据类型为布尔型。
- state: 数据类型为字符串。
- country: 数据类型为字符串。
- hashed_ip: 数据类型为字符串。
- header: 包含多个子特征,数据类型为字符串。
详细特征
- conversation: 包含多个子特征,如内容、国家、哈希IP、头部信息、语言、是否屏蔽、角色、状态、时间戳、是否有毒、轮次标识等,数据类型包括字符串、布尔型、整数和时间戳。
数据集划分
- train: 包含220624个样本,数据集大小为1961240610.8595295字节。
数据集大小
- 下载大小: 1280104055字节。
- 数据集大小: 1961240610.8595295字节。



