Salesforce/ContextualJudgeBench
Source: Hugging Face | Updated: 2025-03-21 | Indexed: 2025-04-08
Download link:
https://hf-mirror.com/datasets/Salesforce/ContextualJudgeBench
Description:
ContextualJudgeBench is a pairwise benchmark with 2,000 samples for evaluating LLM-as-judge models in two contextual settings: Contextual QA and summarization. Each sample includes a problem ID, the original user input question, the context used to answer the user question, the better (chosen) response, the worse (rejected) response, and the source dataset from which the sample is derived.
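As a rough illustration of how a pairwise benchmark of this shape might be scored, the sketch below models one sample with the six fields listed in the description and computes how often a judge picks the chosen response. The field names, the `overlap_judge` heuristic, the position-alternation scheme, and the toy samples are all assumptions made for illustration; they are not the dataset's actual column names or its official evaluation protocol.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PairwiseSample:
    # Hypothetical field names mirroring the description, not the real columns.
    problem_id: str
    question: str
    context: str
    chosen: str    # the better response
    rejected: str  # the worse response
    source: str    # dataset the sample was derived from


# A judge sees (question, context, response_a, response_b) and returns "A" or "B".
Judge = Callable[[str, str, str, str], str]


def overlap_judge(question: str, context: str, response_a: str, response_b: str) -> str:
    """Toy stand-in for an LLM judge: prefer the response sharing more words with the context."""
    ctx = set(context.lower().split())
    score_a = len(ctx & set(response_a.lower().split()))
    score_b = len(ctx & set(response_b.lower().split()))
    return "A" if score_a >= score_b else "B"


def pairwise_accuracy(samples: List[PairwiseSample], judge: Judge) -> float:
    """Fraction of samples where the judge selects the chosen response.

    The chosen response alternates between position A and B by index so that
    a judge which always answers "A" cannot score above chance (an assumption
    of this sketch, not a documented protocol).
    """
    if not samples:
        return 0.0
    correct = 0
    for i, s in enumerate(samples):
        chosen_first = i % 2 == 0
        a, b = (s.chosen, s.rejected) if chosen_first else (s.rejected, s.chosen)
        verdict = judge(s.question, s.context, a, b)
        if verdict == ("A" if chosen_first else "B"):
            correct += 1
    return correct / len(samples)


# Invented toy samples, one per setting (contextual QA, summarization).
samples = [
    PairwiseSample(
        "q1", "Where is the Eiffel Tower?",
        "The Eiffel Tower is in Paris, France.",
        "It is in Paris, France.",
        "It is located somewhere near Rome.",
        "toy-qa",
    ),
    PairwiseSample(
        "q2", "Summarize the note.",
        "The meeting moved to Friday at noon.",
        "The meeting moved to Friday.",
        "Nothing changed on the schedule.",
        "toy-summarization",
    ),
]

accuracy = pairwise_accuracy(samples, overlap_judge)
print(f"pairwise accuracy: {accuracy:.2f}")
```

In practice the `judge` callable would wrap a call to an LLM-as-judge model; the word-overlap heuristic here only keeps the sketch runnable offline.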
Provided by:
Salesforce



