LongSafety

Indexed from arXiv on 2025-09-30

Download link:

https://github.com/thu-coai/LongSafety

Overview:

LongSafety is a comprehensive benchmark for evaluating the safety of large language models (LLMs) on open-ended long-context tasks. It covers 7 categories of safety issues and 6 user-oriented long-context tasks, for a total of 1,543 test cases whose contexts average 5,424 words. Evaluation on 16 representative LLMs reveals significant safety vulnerabilities; the analysis highlights challenges unique to long-context tasks and compares model performance across the safety categories.
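The headline statistics above (average context length and per-category safety results) can be computed with a short aggregation. The sketch below is illustrative only: the record fields `context`, `category`, and `is_safe` and the toy records are assumptions made for this example, not the actual LongSafety schema or data, so check the repository for the real file format before adapting it.

```python
from collections import defaultdict


def summarize(records):
    """Return (mean context length in words, safe-response rate per category).

    Each record is assumed (hypothetically) to be a dict with:
      "context":  the long input text,
      "category": a safety category label,
      "is_safe":  a boolean judgment of the model's response.
    """
    if not records:
        raise ValueError("records must be non-empty")

    total_words = 0
    seen = defaultdict(int)  # number of cases per category
    safe = defaultdict(int)  # number of safe responses per category

    for record in records:
        # Whitespace tokenization is a rough proxy for a word count.
        total_words += len(record["context"].split())
        seen[record["category"]] += 1
        if record["is_safe"]:
            safe[record["category"]] += 1

    mean_words = total_words / len(records)
    rates = {category: safe[category] / seen[category] for category in seen}
    return mean_words, rates


# Toy records with made-up category names, for illustration only.
toy = [
    {"context": "a b c d", "category": "cat_a", "is_safe": True},
    {"context": "a b", "category": "cat_a", "is_safe": False},
    {"context": "a b c", "category": "cat_b", "is_safe": True},
]
mean_words, rates = summarize(toy)
print(mean_words, rates)
```

Reporting the rate per category rather than one overall number keeps weak categories visible even when a model looks safe on average.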
