llm-council/emotional_application

Name: llm-council/emotional_application
Creator: llm-council
Published: 2024-07-15 22:24:14
License: not specified

Hugging Face | Updated: 2024-07-15 | Indexed: 2024-06-15

Download link:

https://hf-mirror.com/datasets/llm-council/emotional_application

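Overview:

The LMC-EA dataset demonstrates how the collective consensus of a council of LLMs can be used to evaluate foundation models on highly subjective tasks such as emotional intelligence. It contains four subsets: 1) test_set_formulation: detailed stories describing interpersonal conflicts, generated by 20 different LLMs; 2) response_collection: conversational responses from 20 different LLMs to 100 interpersonal conflicts; 3) response_judging: pairwise-comparison ratings by LLMs of each non-reference LLM response against the reference LLM response; 4) response_judging_human: human pairwise-comparison ratings covering 9 LLMs and 120 randomly sampled dilemma-response pairs. The data was collected in April and May 2024 across multiple LLM providers and APIs, and the dataset documentation describes the human raters' backgrounds and compensation in detail.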

Provided by:

llm-council

Summary of original information

LMC-EA dataset overview

Dataset configurations

Default configuration

Features:
- emobench_id: integer
- problem: string
- relationship: string
- scenario: string
- detailed_dilemma: string
- llm_author: string
Splits:
- council: 378036 bytes, 200 samples
Download size: 228230 bytes
Dataset size: 378036 bytes

Response collection configuration

Features:
- emobench_id: integer
- problem: string
- relationship: string
- scenario: string
- detailed_dilemma: string
- response_string: string
- llm_responder: string
Splits:
- council: 12889489 bytes, 4000 samples
Download size: 3775450 bytes
Dataset size: 12889489 bytes

Response judging configuration

Features:
- emobench_id: integer
- llm_judge: string
- judging_response_string: string
- first_completion_by: string
- second_completion_by: string
- pairwise_choice: string
Splits:
- council: 85613515 bytes, 76000 samples
Download size: 27616919 bytes
Dataset size: 85613515 bytes
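
The fields in this configuration are enough to tally pairwise outcomes per model. The following is a minimal sketch, not the authors' scoring method. It assumes `pairwise_choice` holds labels such as `"A>B"` and `"B>A"`, with A being the first completion; the actual label vocabulary is not documented in this listing, so the label sets, model names, and sample rows are all hypothetical.

```python
from collections import Counter

# Hypothetical label sets: the real values of `pairwise_choice` are not
# documented in this listing. Adjust them to match the actual data.
FIRST_WINS = {"A>B"}
SECOND_WINS = {"B>A"}

def count_wins(rows):
    """Count how often each LLM is preferred across pairwise judgments."""
    wins = Counter()
    for row in rows:
        choice = row["pairwise_choice"]
        if choice in FIRST_WINS:
            wins[row["first_completion_by"]] += 1
        elif choice in SECOND_WINS:
            wins[row["second_completion_by"]] += 1
        # Any other label (for example a tie) is ignored in this sketch.
    return wins

# Made-up rows that only reuse the field names of this configuration.
rows = [
    {"emobench_id": 1, "llm_judge": "judge-x",
     "first_completion_by": "model-a", "second_completion_by": "model-ref",
     "pairwise_choice": "A>B"},
    {"emobench_id": 1, "llm_judge": "judge-x",
     "first_completion_by": "model-ref", "second_completion_by": "model-a",
     "pairwise_choice": "B>A"},
    {"emobench_id": 2, "llm_judge": "judge-y",
     "first_completion_by": "model-a", "second_completion_by": "model-ref",
     "pairwise_choice": "B>A"},
]

wins = count_wins(rows)
print(dict(wins))
```

The first two rows show the same pair judged in both presentation orders, which is how a position-swapped comparison would appear under these assumptions.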

Human response judging configuration

Features:
- emobench_id: integer
- question_id: integer
- annotator_id: string
- response: string
- first_completion_by: string
- second_completion_by: string
- eq: boolean
- e1: boolean
- e3: boolean
- e4: boolean
- e5: boolean
- u1: boolean
- u2: boolean
- u3: boolean
- u4: boolean
- action: boolean
- clarity: boolean
- concise: boolean
- qualitative: string
- winner: string
- consistency: string
- reject: boolean
- pairwise_choice: string
Splits:
- train: 300246 bytes, 1343 samples
Download size: 41869 bytes
Dataset size: 300246 bytes
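
Because the human ratings carry a boolean `reject` field, one plausible first step is to drop flagged rows before counting choices. The sketch below assumes that `reject` set to true marks a rating to be discarded; this listing does not state the field's meaning, and the labels and rows are made up for illustration.

```python
from collections import Counter

def kept_choices(rows):
    """Tally `pairwise_choice` over rows whose `reject` flag is False.

    Assumption: reject == True means the rating should be excluded.
    """
    return Counter(row["pairwise_choice"] for row in rows if not row["reject"])

# Made-up ratings; the label strings are hypothetical.
ratings = [
    {"annotator_id": "ann-1", "pairwise_choice": "A>B", "reject": False},
    {"annotator_id": "ann-2", "pairwise_choice": "B>A", "reject": False},
    {"annotator_id": "ann-3", "pairwise_choice": "A>B", "reject": True},
]

tally = kept_choices(ratings)
print(dict(tally))
```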

Test set formulation configuration

Features:
- emobench_id: integer
- problem: string
- relationship: string
- scenario: string
- detailed_dilemma: string
- llm_author: string
Splits:
- council: 378036 bytes, 200 samples
Download size: 228230 bytes
Dataset size: 378036 bytes
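
As a rough guide to working with these configurations, the sketch below loads one of them with the Hugging Face `datasets` library. The repository ID comes from the listing title. It is an assumption that the configuration names match the four subset identifiers and that the split names match the split listings; the helper `load_subset` is hypothetical.

```python
REPO_ID = "llm-council/emotional_application"

# Assumed mapping from configuration name to split name, read off the
# split listings; the configuration names themselves are an assumption.
CONFIG_SPLITS = {
    "test_set_formulation": "council",
    "response_collection": "council",
    "response_judging": "council",
    "response_judging_human": "train",
}

def load_subset(config_name):
    """Load one configuration from the Hugging Face Hub (needs network access)."""
    # Imported lazily so the mapping above is usable without `datasets` installed.
    from datasets import load_dataset
    return load_dataset(REPO_ID, config_name, split=CONFIG_SPLITS[config_name])
```

Calling `load_subset("response_collection")` would then return the `council` split of that configuration, provided the names above match the hosted dataset.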

Dataset composition

test_set_formulation: 200 interpersonal conflicts.
response_collection: 100 interpersonal conflicts x 20 LLMs = 2000 responses.
response_judging: 100 interpersonal conflicts x 19 non-reference LLM responses x 20 LLM judges x 2 position swaps = 76000 judging responses.
response_judging_human: 1343 human ratings.
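
The products in this composition list can be checked with a few lines of arithmetic. The variable names below are illustrative. Note that the split metadata for the response collection configuration lists 4000 samples, which differs from the 2000 computed here; this sketch only reproduces the stated formula and does not settle which figure is correct.

```python
# Factors taken from the composition list.
conflicts = 100           # interpersonal conflicts used for responses
llms = 20                 # responding LLMs
non_reference_llms = 19   # LLMs compared against the reference LLM
judges = 20               # LLM judges
position_swaps = 2        # each pair is judged in both orders

responses = conflicts * llms
judging_responses = conflicts * non_reference_llms * judges * position_swaps

print(responses)          # 2000
print(judging_responses)  # 76000
```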

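Data collection process

LLM outputs were obtained through a variety of providers and APIs.
Conversational responses were collected at the default temperature; response judging used temperature 0.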

Collection period

The dataset was collected in April and May 2024.
