wentingzhao/entities
收藏Hugging Face2024-05-24 更新2024-06-12 收录
下载链接:
https://hf-mirror.com/datasets/wentingzhao/entities
下载链接
链接失效反馈官方服务:
资源简介:
---
dataset_info:
features:
- name: conversation_hash
dtype: string
- name: model
dtype: string
- name: timestamp
dtype: timestamp[us, tz=UTC]
- name: conversation
list:
- name: content
dtype: string
- name: country
dtype: string
- name: hashed_ip
dtype: string
- name: header
struct:
- name: accept-language
dtype: string
- name: user-agent
dtype: string
- name: language
dtype: string
- name: redacted
dtype: bool
- name: role
dtype: string
- name: state
dtype: string
- name: timestamp
dtype: timestamp[us, tz=UTC]
- name: toxic
dtype: bool
- name: turn_identifier
dtype: int64
- name: turn
dtype: int64
- name: language
dtype: string
- name: openai_moderation
list:
- name: categories
struct:
- name: harassment
dtype: bool
- name: harassment/threatening
dtype: bool
- name: harassment_threatening
dtype: bool
- name: hate
dtype: bool
- name: hate/threatening
dtype: bool
- name: hate_threatening
dtype: bool
- name: self-harm
dtype: bool
- name: self-harm/instructions
dtype: bool
- name: self-harm/intent
dtype: bool
- name: self_harm
dtype: bool
- name: self_harm_instructions
dtype: bool
- name: self_harm_intent
dtype: bool
- name: sexual
dtype: bool
- name: sexual/minors
dtype: bool
- name: sexual_minors
dtype: bool
- name: violence
dtype: bool
- name: violence/graphic
dtype: bool
- name: violence_graphic
dtype: bool
- name: category_scores
struct:
- name: harassment
dtype: float64
- name: harassment/threatening
dtype: float64
- name: harassment_threatening
dtype: float64
- name: hate
dtype: float64
- name: hate/threatening
dtype: float64
- name: hate_threatening
dtype: float64
- name: self-harm
dtype: float64
- name: self-harm/instructions
dtype: float64
- name: self-harm/intent
dtype: float64
- name: self_harm
dtype: float64
- name: self_harm_instructions
dtype: float64
- name: self_harm_intent
dtype: float64
- name: sexual
dtype: float64
- name: sexual/minors
dtype: float64
- name: sexual_minors
dtype: float64
- name: violence
dtype: float64
- name: violence/graphic
dtype: float64
- name: violence_graphic
dtype: float64
- name: flagged
dtype: bool
- name: detoxify_moderation
list:
- name: identity_attack
dtype: float64
- name: insult
dtype: float64
- name: obscene
dtype: float64
- name: severe_toxicity
dtype: float64
- name: sexual_explicit
dtype: float64
- name: threat
dtype: float64
- name: toxicity
dtype: float64
- name: toxic
dtype: bool
- name: redacted
dtype: bool
- name: state
dtype: string
- name: country
dtype: string
- name: hashed_ip
dtype: string
- name: header
struct:
- name: accept-language
dtype: string
- name: user-agent
dtype: string
- name: entities
sequence: string
splits:
- name: train
num_bytes: 4016255933
num_examples: 481268
download_size: 1882986167
dataset_size: 4016255933
configs:
- config_name: default
data_files:
- split: train
path: data/train-*
---
提供机构:
wentingzhao
原始信息汇总
数据集概述
数据集特征
主要特征
- conversation_hash: 字符串类型
- model: 字符串类型
- timestamp: 时间戳类型,单位为微秒,时区为UTC
- turn: 整数类型,64位
- language: 字符串类型
- openai_moderation: 列表类型,包含多个结构化数据
- categories: 结构类型,包含多个布尔类型字段,如
harassment,hate,self-harm,sexual,violence等 - category_scores: 结构类型,包含多个浮点类型字段,对应于
categories中的每个类别 - flagged: 布尔类型
- categories: 结构类型,包含多个布尔类型字段,如
- detoxify_moderation: 列表类型,包含多个浮点类型字段,如
identity_attack,insult,obscene,severe_toxicity,sexual_explicit,threat,toxicity - toxic: 布尔类型
- redacted: 布尔类型
- state: 字符串类型
- country: 字符串类型
- hashed_ip: 字符串类型
- header: 结构类型,包含
accept-language和user-agent,均为字符串类型
对话特征
- conversation: 列表类型
- content: 字符串类型
- country: 字符串类型
- hashed_ip: 字符串类型
- header: 结构类型,包含
accept-language和user-agent,均为字符串类型 - language: 字符串类型
- redacted: 布尔类型
- role: 字符串类型
- state: 字符串类型
- timestamp: 时间戳类型,单位为微秒,时区为UTC
- toxic: 布尔类型
- turn_identifier: 整数类型,64位
数据集划分
- train: 训练集
- num_bytes: 4016255933字节
- num_examples: 481268个样本
数据集大小
- download_size: 1882986167字节
- dataset_size: 4016255933字节



