five

kevinpro/WildChat-1M-GPT4

收藏
Hugging Face2024-05-06 更新2024-06-12 收录
下载链接:
https://hf-mirror.com/datasets/kevinpro/WildChat-1M-GPT4
下载链接
链接失效反馈
官方服务:
资源简介:
--- dataset_info: features: - name: conversation_hash dtype: string - name: model dtype: string - name: timestamp dtype: timestamp[us, tz=UTC] - name: conversation list: - name: content dtype: string - name: country dtype: string - name: hashed_ip dtype: string - name: header struct: - name: accept-language dtype: string - name: user-agent dtype: string - name: language dtype: string - name: redacted dtype: bool - name: role dtype: string - name: state dtype: string - name: timestamp dtype: timestamp[us, tz=UTC] - name: toxic dtype: bool - name: turn_identifier dtype: int64 - name: turn dtype: int64 - name: language dtype: string - name: openai_moderation list: - name: categories struct: - name: harassment dtype: bool - name: harassment/threatening dtype: bool - name: harassment_threatening dtype: bool - name: hate dtype: bool - name: hate/threatening dtype: bool - name: hate_threatening dtype: bool - name: self-harm dtype: bool - name: self-harm/instructions dtype: bool - name: self-harm/intent dtype: bool - name: self_harm dtype: bool - name: self_harm_instructions dtype: bool - name: self_harm_intent dtype: bool - name: sexual dtype: bool - name: sexual/minors dtype: bool - name: sexual_minors dtype: bool - name: violence dtype: bool - name: violence/graphic dtype: bool - name: violence_graphic dtype: bool - name: category_scores struct: - name: harassment dtype: float64 - name: harassment/threatening dtype: float64 - name: harassment_threatening dtype: float64 - name: hate dtype: float64 - name: hate/threatening dtype: float64 - name: hate_threatening dtype: float64 - name: self-harm dtype: float64 - name: self-harm/instructions dtype: float64 - name: self-harm/intent dtype: float64 - name: self_harm dtype: float64 - name: self_harm_instructions dtype: float64 - name: self_harm_intent dtype: float64 - name: sexual dtype: float64 - name: sexual/minors dtype: float64 - name: sexual_minors dtype: float64 - name: violence dtype: float64 - name: violence/graphic dtype: float64 - name: violence_graphic dtype: float64 - name: flagged dtype: bool - name: detoxify_moderation list: - name: identity_attack dtype: float64 - name: insult dtype: float64 - name: obscene dtype: float64 - name: severe_toxicity dtype: float64 - name: sexual_explicit dtype: float64 - name: threat dtype: float64 - name: toxicity dtype: float64 - name: toxic dtype: bool - name: redacted dtype: bool - name: state dtype: string - name: country dtype: string - name: hashed_ip dtype: string - name: header struct: - name: accept-language dtype: string - name: user-agent dtype: string splits: - name: train num_bytes: 1961240610.8595295 num_examples: 220624 download_size: 1280104055 dataset_size: 1961240610.8595295 configs: - config_name: default data_files: - split: train path: data/train-* ---

This dataset is used for conversation analysis and content moderation, containing features such as conversation hash, model used, timestamp, conversation content, country, hashed IP address, header information, language, whether it is redacted, role, state, whether it contains toxic content, turn identifier, language, OpenAI moderation results, Detoxify moderation results, etc. The dataset is divided into a training set, containing 220624 samples, with a total size of 1961240610.8595295 bytes.
提供机构:
kevinpro
原始信息汇总

数据集概述

数据集特征

主要特征

  • conversation_hash: 数据类型为字符串。
  • model: 数据类型为字符串。
  • timestamp: 数据类型为时间戳,单位为微秒,时区为UTC。
  • turn: 数据类型为整数。
  • language: 数据类型为字符串。
  • openai_moderation: 包含多个子特征,主要为分类和分类分数,数据类型包括布尔型和浮点型。
  • detoxify_moderation: 包含多个子特征,数据类型为浮点型。
  • toxic: 数据类型为布尔型。
  • redacted: 数据类型为布尔型。
  • state: 数据类型为字符串。
  • country: 数据类型为字符串。
  • hashed_ip: 数据类型为字符串。
  • header: 包含多个子特征,数据类型为字符串。

详细特征

  • conversation: 包含多个子特征,如内容、国家、哈希IP、头部信息、语言、是否屏蔽、角色、状态、时间戳、是否有毒、轮次标识等,数据类型包括字符串、布尔型、整数和时间戳。

数据集划分

  • train: 包含220624个样本,数据集大小为1961240610.8595295字节。

数据集大小

  • 下载大小: 1280104055字节。
  • 数据集大小: 1961240610.8595295字节。
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作