Salesforce/ContextualJudgeBench

Name: Salesforce/ContextualJudgeBench
Creator: Salesforce
Published: 2025-03-21 00:50:22
License: Not specified

Source: Hugging Face. Updated 2025-03-21; indexed 2025-04-08.

Download link:

https://hf-mirror.com/datasets/Salesforce/ContextualJudgeBench

Summary:

ContextualJudgeBench is a pairwise benchmark of 2,000 samples for evaluating LLM-as-judge models in two contextual settings: contextual QA and summarization. Each sample contains a problem ID, the original user question, the context used to answer that question, the better (chosen) response, the worse (rejected) response, and the source dataset from which the sample was derived.
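The pairwise setup described above can be sketched as follows. This is a minimal illustration, not the benchmark's official evaluation code. The `PairwiseSample` field names are hypothetical stand-ins for the fields listed above, so check the dataset card for the real column names. The judge interface is also an assumption. Scoring a sample as correct only when the judge prefers the chosen response in both presentation orders is one common way to control for position bias, and it is assumed here rather than taken from this page.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class PairwiseSample:
    # Hypothetical field names mirroring the description above,
    # not necessarily the dataset's real column names.
    problem_id: str
    question: str
    context: str
    chosen: str    # the better response
    rejected: str  # the worse response
    source: str    # dataset the sample was derived from


# Assumed judge interface: given (question, context, response_a, response_b),
# return "A" or "B" for the preferred response.
Judge = Callable[[str, str, str, str], str]


def pairwise_accuracy(samples: Iterable[PairwiseSample], judge: Judge) -> float:
    """Fraction of samples where the judge prefers the chosen response
    in both presentation orders (chosen shown first, then second)."""
    total = correct = 0
    for s in samples:
        total += 1
        first = judge(s.question, s.context, s.chosen, s.rejected)   # chosen is "A"
        second = judge(s.question, s.context, s.rejected, s.chosen)  # chosen is "B"
        if first == "A" and second == "B":
            correct += 1
    return correct / total if total else 0.0


# Toy judges and samples, invented purely for illustration.
def always_a(question: str, context: str, a: str, b: str) -> str:
    return "A"  # pure position bias


def longer_is_better(question: str, context: str, a: str, b: str) -> str:
    return "A" if len(a) >= len(b) else "B"  # pure length bias


samples = [
    PairwiseSample("p1", "Who wrote the report?", "Ada wrote the report.",
                   "Ada wrote the report.", "Bob.", "toy"),
    PairwiseSample("p2", "Summarize the passage.", "A short passage.",
                   "Short.", "A much longer but unfaithful summary.", "toy"),
]

print(pairwise_accuracy(samples, always_a))          # → 0.0
print(pairwise_accuracy(samples, longer_is_better))  # → 0.5
```

The toy data is made up and does not come from the dataset. A judge that always answers "A" scores 0.0 under this metric because its verdict flips when the order is swapped. The length-biased judge is only rewarded on the sample where the better response also happens to be longer.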

Provided by:

Salesforce