agie-ai/lmsys-chatbot_arena_conversations

Name: agie-ai/lmsys-chatbot_arena_conversations
Creator: agie-ai
Published: 2023-07-22 04:52:36
License: No description available

Hugging Face | Updated: 2023-07-22 | Indexed: 2024-03-04

Download link:

https://hf-mirror.com/datasets/agie-ai/lmsys-chatbot_arena_conversations

Resource description:

---
dataset_info:
  features:
  - name: question_id
    dtype: string
  - name: model_a
    dtype: string
  - name: model_b
    dtype: string
  - name: winner
    dtype: string
  - name: judge
    dtype: string
  - name: conversation_a
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
  - name: conversation_b
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
  - name: turn
    dtype: int64
  - name: anony
    dtype: bool
  - name: language
    dtype: string
  - name: tstamp
    dtype: float64
  - name: openai_moderation
    struct:
    - name: categories
      struct:
      - name: harassment
        dtype: bool
      - name: harassment/threatening
        dtype: bool
      - name: hate
        dtype: bool
      - name: hate/threatening
        dtype: bool
      - name: self-harm
        dtype: bool
      - name: self-harm/instructions
        dtype: bool
      - name: self-harm/intent
        dtype: bool
      - name: sexual
        dtype: bool
      - name: sexual/minors
        dtype: bool
      - name: violence
        dtype: bool
      - name: violence/graphic
        dtype: bool
    - name: category_scores
      struct:
      - name: harassment
        dtype: float64
      - name: harassment/threatening
        dtype: float64
      - name: hate
        dtype: float64
      - name: hate/threatening
        dtype: float64
      - name: self-harm
        dtype: float64
      - name: self-harm/instructions
        dtype: float64
      - name: self-harm/intent
        dtype: float64
      - name: sexual
        dtype: float64
      - name: sexual/minors
        dtype: float64
      - name: violence
        dtype: float64
      - name: violence/graphic
        dtype: float64
    - name: flagged
      dtype: bool
  - name: toxic_chat_tag
    struct:
    - name: roberta-large
      struct:
      - name: flagged
        dtype: bool
      - name: probability
        dtype: float64
    - name: t5-large
      struct:
      - name: flagged
        dtype: bool
      - name: score
        dtype: float64
  splits:
  - name: train
    num_bytes: 81159839
    num_examples: 33000
  download_size: 41572997
  dataset_size: 81159839
---
# Dataset Card for "lmsys-chatbot_arena_conversations"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
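The top-level columns declared in the card can be checked against a record before further processing. The following is a minimal sketch, not the publisher's tooling: `EXPECTED_COLUMNS`, `missing_columns`, and `load_train_split` are names introduced here for illustration, and the loader assumes the `datasets` library is installed and the repository is still publicly reachable under this ID.

```python
from typing import Any

# Top-level columns listed in the card's `features` section.
EXPECTED_COLUMNS = (
    "question_id", "model_a", "model_b", "winner", "judge",
    "conversation_a", "conversation_b", "turn", "anony",
    "language", "tstamp", "openai_moderation", "toxic_chat_tag",
)


def missing_columns(record: dict[str, Any]) -> list[str]:
    """Return the expected top-level columns that `record` lacks."""
    return [name for name in EXPECTED_COLUMNS if name not in record]


def load_train_split():
    """Load the `train` split (assumes network access and `datasets` installed)."""
    # Imported lazily so the column check above works without the library.
    from datasets import load_dataset

    return load_dataset("agie-ai/lmsys-chatbot_arena_conversations", split="train")
```

With network access, `missing_columns(load_train_split()[0])` should return an empty list if the hosted data still matches the card.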

Dataset information:

Features:
- question_id (question ID): string
- model_a (model A): string
- model_b (model B): string
- winner (winning side): string
- judge (judge): string
- conversation_a (conversation A): list whose elements contain:
  - content (content): string
  - role (role): string
- conversation_b (conversation B): list whose elements contain:
  - content (content): string
  - role (role): string
- turn (number of conversation turns): int64
- anony (anonymity): bool
- language (language): string
- tstamp (timestamp): float64
- openai_moderation (OpenAI moderation): struct containing:
  - categories (classification results): struct containing:
    - harassment: bool
    - harassment/threatening: bool
    - hate: bool
    - hate/threatening: bool
    - self-harm: bool
    - self-harm/instructions: bool
    - self-harm/intent: bool
    - sexual: bool
    - sexual/minors: bool
    - violence: bool
    - violence/graphic: bool
  - category_scores (classification scores): struct containing:
    - harassment: float64
    - harassment/threatening: float64
    - hate: float64
    - hate/threatening: float64
    - self-harm: float64
    - self-harm/instructions: float64
    - self-harm/intent: float64
    - sexual: float64
    - sexual/minors: float64
    - violence: float64
    - violence/graphic: float64
  - flagged (flag status): bool
- toxic_chat_tag (toxic chat tag): struct containing:
  - roberta-large: struct containing:
    - flagged (flag status): bool
    - probability (probability): float64
  - t5-large: struct containing:
    - flagged (flag status): bool
    - score (score): float64

Dataset splits:
- train (training set): 81159839 bytes, 33000 examples

Download size: 41572997
Total dataset size: 81159839
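The split statistics allow a quick back-of-the-envelope estimate of record size and of how much smaller the download is than the stored dataset. This is a small sketch using only the three figures reported in the card; the variable names are introduced here, and reading `download_size` as the size of the compressed files is an assumption.

```python
# Figures copied from the `splits` section of the dataset card.
NUM_BYTES = 81_159_839      # size of the train split as reported by the card
NUM_EXAMPLES = 33_000       # rows in the train split
DOWNLOAD_SIZE = 41_572_997  # bytes downloaded (assumed to be compressed files)

avg_bytes_per_example = NUM_BYTES / NUM_EXAMPLES
download_ratio = DOWNLOAD_SIZE / NUM_BYTES

print(f"average record size: {avg_bytes_per_example:.0f} bytes")
print(f"download / dataset size: {download_ratio:.2f}")
```

The result is roughly 2.4 KB per record, with the download coming to about half of the reported dataset size.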

Provider:

agie-ai

Summary of original information

Dataset overview

Dataset name

lmsys-chatbot_arena_conversations

Dataset features

question_id: string
model_a: string
model_b: string
winner: string
judge: string
conversation_a: list, with the following fields:
- content: string
- role: string
conversation_b: list, with the following fields:
- content: string
- role: string
turn: integer
anony: boolean
language: string
tstamp: float
openai_moderation: struct, with the following fields:
- categories: struct containing several boolean subfields, such as harassment and harassment/threatening
- category_scores: struct containing several float subfields, such as harassment and harassment/threatening
- flagged: boolean
toxic_chat_tag: struct, with the following fields:
- roberta-large: struct containing flagged (boolean) and probability (float)
- t5-large: struct containing flagged (boolean) and score (float)
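A typical use of these fields is to tally which model won each comparison while skipping conversations flagged by moderation. The sketch below runs on hypothetical records that are not taken from the dataset. It assumes, without confirmation from the card, that `winner` holds the label `model_a` or `model_b` for a decisive vote and some other label (such as a tie) otherwise. `resolve_winner` and `win_counts` are illustrative helper names.

```python
from collections import Counter

# Hypothetical records for illustration only; no value here comes from the dataset.
records = [
    {"model_a": "model-x", "model_b": "model-y", "winner": "model_a",
     "openai_moderation": {"flagged": False}},
    {"model_a": "model-y", "model_b": "model-x", "winner": "model_b",
     "openai_moderation": {"flagged": False}},
    {"model_a": "model-x", "model_b": "model-y", "winner": "tie",
     "openai_moderation": {"flagged": True}},
]


def resolve_winner(record):
    """Return the winning model's name, or the raw label if it names neither side."""
    label = record["winner"]
    # Assumption: "model_a" / "model_b" refer to the column of the same name.
    if label in ("model_a", "model_b"):
        return record[label]
    return label


def win_counts(rows, skip_flagged=True):
    """Count outcomes, optionally skipping rows flagged by openai_moderation."""
    counts = Counter()
    for row in rows:
        if skip_flagged and row["openai_moderation"]["flagged"]:
            continue
        counts[resolve_winner(row)] += 1
    return counts


print(win_counts(records))
print(win_counts(records, skip_flagged=False))
```

The same functions should work on rows of the real train split, provided the `winner` labels follow the assumed convention; inspecting the distinct values of `winner` first is a sensible check.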