Cant
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/cistineup/CantCounter
Overview:
This dataset is a corpus of cant (coded jargon) within specific domains, designed to evaluate how large language models (LLMs) behave when subjected to malicious exploitation. It supports comprehensive analysis of model performance and linguistic variation within each domain, and it is large in scale: the initial dataset matched 692 cant terms with 53 entities to generate 69,892 scenarios, which were further expanded to 1,677,408 scenarios (a 24-fold increase). The tasks are question answering in several formats, including abstract questions, true-false questions, and multiple-choice questions.
Provided by:
Cistineup



