SAGE-Eval

Name: SAGE-Eval
Creator: YuehHanChen
License: No description available

Indexed from arXiv on 2025-09-30

Download link:

https://huggingface.co/datasets/YuehHanChen/SAGE-Eval


Description:

SAGE-Eval is a dataset designed to test whether large language models (LLMs) properly apply established safety knowledge to users' simple queries. It contains 104 manually collected safety knowledge entries, sourced from reputable organizations, which were systematically augmented across 7 common domains to generate 10,428 test scenarios. The goal is to evaluate how reliably models handle significant safety risks, that is, how well they systematically generalize safety knowledge.

Provider:

YuehHanChen
