MultiverseComputingCAI/llm-refusal-evaluation
收藏Hugging Face2025-12-23 更新2026-01-03 收录
下载链接:
https://hf-mirror.com/datasets/MultiverseComputingCAI/llm-refusal-evaluation
下载链接
链接失效反馈官方服务:
资源简介:
该数据集名为LLM拒绝评估基准,旨在评估大型语言模型(LLMs)在不同类别提示下的拒绝行为。数据集分为三个主要组别:安全基准、中国敏感话题和健全性检查数据集。每个组别包含特定的数据集,具有独特的特征和目的,例如评估模型对有害或越狱式提示的响应、中国敏感话题以及非敏感提示以确保模型不会过度拒绝。README还提供了每个子数据集的来源和方法,全面概述了数据集的组成和预期用途。
The dataset, named LLM Refusal Evaluation Benchmark, is designed to evaluate the refusal behavior of large language models (LLMs) across various categories of prompts. The dataset is organized into three main groups: Safety Benchmarks, Chinese Sensitive Topics, and Sanity Check Datasets. Each group contains specific datasets with unique characteristics and purposes, such as evaluating model responses to harmful or jailbreak-style prompts, Chinese sensitive topics, and non-sensitive prompts to ensure models do not over-refuse. The README also includes sources and methodologies for each sub-dataset, providing a comprehensive overview of the datasets composition and intended use.
提供机构:
MultiverseComputingCAI



