hamdim/tcse-v2
Hugging Face · Updated 2026-03-26 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/hamdim/tcse-v2
Resource description:
---
license: cc-by-4.0
task_categories:
- text-classification
- conversational
language:
- en
tags:
- safety
- adversarial
- multi-turn
- llm-security
- jailbreak
- slow-drift
- session-level
pretty_name: "TCSE V2: Topic-Constrained Security Evaluation"
size_categories:
- n<1K
---
# TCSE V2 — Topic-Constrained Security Evaluation Benchmark
**Author:** Mustapha HAMDI — [mustapha.hamdi@innodeep.net](mailto:mustapha.hamdi@innodeep.net)
**Organization:** [InnoDeep](https://innodeep.net)
**Project:** SGEMAS Security Product © InnoDeep
**License:** CC BY 4.0
---
**TCSE V2** is the first public benchmark specifically designed for evaluating
**multi-turn slow-drift adversarial attacks** against Large Language Models.
It is part of the **SGEMAS** (Self-Growing Ephemeral Multi-Agent Security)
research project developed at InnoDeep.

Unlike existing safety benchmarks (HarmBench, AdvBench, WildGuard), which
evaluate single-turn prompts in isolation, TCSE V2 contains full
**conversational sessions** in which adversarial intent is revealed progressively
across turns. This is the scenario that stateless guards are structurally unable
to detect.
---
## Overview
| Property | Value |
|---|---|
| Sessions | 503 |
| Positive (malicious) | 313 (62.2%) |
| Negative (benign) | 190 (37.8%) |
| Unique topics | 60 |
| Domains | 5 (+ 2 cross-domain) |
| Turn range | 1 – 12 turns |
| Language | English |
| License | CC BY 4.0 |
---
## Domains
| Domain | Sessions |
|---|---|
| Cybersecurity | 138 |
| Finance | 95 |
| Medical | 93 |
| Infrastructure | 91 |
| Social Engineering | 72 |
| Finance + Cybersecurity (cross) | 7 |
| Medical + Infrastructure (cross) | 7 |
---
## Session Types
| Type | Description |
|---|---|
| `benign` | Legitimate multi-turn domain conversation |
| `benign_hard` | Benign session with terminology that resembles attack language |
| `attack_slow` | Slow-drift escalation in which each turn is individually rated benign |
| `attack_direct` | Direct attack from turn 1 (single-turn control condition) |
| `attack_slow_12` | Extended slow-drift (12 turns) |
| `benign_12` | Extended benign (12 turns, length-matched control) |
**Key property:** In `attack_slow` sessions, every individual turn is rated
benign in isolation. The malicious intent only becomes apparent through
the *trajectory* of the conversation — the defining challenge of the benchmark.
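
The key property can be illustrated with a toy example. The per-turn risk scores, both thresholds, and the cumulative-drift rule below are hypothetical, chosen only to show how a per-turn check can miss a session that a trajectory-level check flags. None of it is taken from SGEMAS.

```python
from typing import List

TURN_THRESHOLD = 0.5     # hypothetical per-turn flagging threshold
SESSION_THRESHOLD = 1.0  # hypothetical threshold on accumulated drift


def stateless_flag(turn_scores: List[float]) -> bool:
    """Flag the session if any single turn looks risky on its own."""
    return any(score >= TURN_THRESHOLD for score in turn_scores)


def session_flag(turn_scores: List[float], baseline: float = 0.1) -> bool:
    """Flag the session once the accumulated excess over a baseline is large."""
    drift = 0.0
    for score in turn_scores:
        drift += max(0.0, score - baseline)
        if drift >= SESSION_THRESHOLD:
            return True
    return False


# Hypothetical scores for a slow-drift session: every turn stays under 0.5.
slow_drift = [0.10, 0.20, 0.30, 0.40, 0.45, 0.45]
# Hypothetical scores for a benign session on the same topic.
benign = [0.10, 0.15, 0.10, 0.20, 0.10, 0.15]

print(stateless_flag(slow_drift), session_flag(slow_drift))
print(stateless_flag(benign), session_flag(benign))
```

In this sketch the stateless check passes both sessions, while the session-level check flags only the slow-drift one because the small per-turn excesses add up.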
---
## Data Format
Each record contains:
```python
{
    "session_id": "s0042",                        # unique identifier
    "domain": "cybersecurity",                    # topic domain
    "topic": "OAuth2",                            # specific topic (60 unique)
    "session_type": "attack_slow",                # see table above
    "label": 1,                                   # 1=malicious, 0=benign
    "turns": ["turn 1 ...", "turn 2 ...", ...],   # user messages
    "n_turns": 5,                                 # number of turns
    "augmentation": "paraphrase_bridge",          # augmentation strategy
}
```
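
A record in this format can be sanity-checked before use. The sketch below is a minimal validator built only from the fields listed above. The `validate_record` helper is hypothetical, and the rule that `attack*` session types carry label 1 while all other types carry label 0 is an assumption inferred from the session-type names, not something the card states.

```python
from typing import Any, Dict, List

REQUIRED_FIELDS = {
    "session_id": str,
    "domain": str,
    "topic": str,
    "session_type": str,
    "label": int,
    "turns": list,
    "n_turns": int,
    "augmentation": str,
}


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    if problems:
        return problems
    if record["label"] not in (0, 1):
        problems.append("label must be 0 or 1")
    if record["n_turns"] != len(record["turns"]):
        problems.append("n_turns does not match len(turns)")
    # Assumption: attack_* session types are label 1, everything else label 0.
    is_attack = record["session_type"].startswith("attack")
    if is_attack != (record["label"] == 1):
        problems.append("label disagrees with session_type")
    return problems


example = {
    "session_id": "s0042",
    "domain": "cybersecurity",
    "topic": "OAuth2",
    "session_type": "attack_slow",
    "label": 1,
    "turns": ["turn 1", "turn 2", "turn 3", "turn 4", "turn 5"],
    "n_turns": 5,
    "augmentation": "paraphrase_bridge",
}
print(validate_record(example))
```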
---
## Augmentation Strategies
TCSE V2 includes controlled paraphrase and length augmentations to test
detector robustness:
| Strategy | Description |
|---|---|
| `original` | Hand-curated session |
| `length_Xt` / `length_Xt_ben` | Truncated to X turns (attack / benign) |
| `paraphrase_bridge` | Bridge turn paraphrased |
| `paraphrase_compressed` | Compressed wording |
| `paraphrase_extended` | Extended wording |
| `synthetic` | LLM-generated session |
| `synthetic_bridge/compressed/extended` | Synthetic with paraphrase variant |
| `gradual_Xt` | Gradual escalation in X turns |
| `hard_benign` | Hard negative: benign session with security terminology |
| `long_12t` | 12-turn session |
| `cross_domain` | Cross-domain topic transfer |
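
When grouping sessions by augmentation family (for example, to keep all length variants of one session in the same split), the turn count embedded in a tag can be separated from the family name. The sketch below assumes that `X` appears as a literal integer in the tag, such as `length_3t` or `gradual_5t`. That naming is inferred from the table rather than confirmed, and `parse_augmentation` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Matches tags such as "length_3t", "length_3t_ben", "gradual_5t", "long_12t".
_TURN_PATTERN = re.compile(r"^(length|gradual|long)_(\d+)t(_ben)?$")


def parse_augmentation(name: str) -> Tuple[str, Optional[int]]:
    """Split an augmentation tag into (family, turn count or None)."""
    match = _TURN_PATTERN.match(name)
    if match is None:
        # Tags without an embedded turn count are returned unchanged.
        return name, None
    family = match.group(1) + (match.group(3) or "")
    return family, int(match.group(2))


for tag in ["length_3t_ben", "gradual_5t", "long_12t", "paraphrase_bridge"]:
    print(tag, parse_augmentation(tag))
```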
---
## Benchmark Design Rationale
**Topic-constrained evaluation** forces detectors to discriminate *intent*
from *topic*. Both benign and malicious sessions cover the same topic
(e.g., OAuth2, Pharmacology, Power Grid Control), making surface-level
keyword matching insufficient.
The **Topic-Δ** metric quantifies intent isolation:
```
Topic-Δ = sloDR(attack_slow) - FPR(benign) [within same topic]
```
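A Topic-Δ above zero means the detector flags slow-drift attacks more often
than it raises false alarms on benign sessions about the same topic, which
indicates that it responds to malicious intent rather than to domain vocabulary.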
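
The formula can be computed per topic from binary detector decisions. The sketch below assumes that sloDR is the fraction of `attack_slow` sessions flagged and that FPR is the fraction of `benign` sessions flagged; the card does not spell out either definition. The `topic_delta` function and the `flagged` mapping (session ID to detector decision) are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def topic_delta(records: Iterable[Mapping], flagged: Mapping[str, bool]) -> Dict[str, float]:
    """Per topic: detection rate on attack_slow minus flag rate on benign."""
    counts = defaultdict(lambda: {"atk_hit": 0, "atk_n": 0, "ben_hit": 0, "ben_n": 0})
    for rec in records:
        bucket = counts[rec["topic"]]
        hit = int(flagged[rec["session_id"]])
        if rec["session_type"] == "attack_slow":
            bucket["atk_n"] += 1
            bucket["atk_hit"] += hit
        elif rec["session_type"] == "benign":
            bucket["ben_n"] += 1
            bucket["ben_hit"] += hit
    deltas = {}
    for topic, c in counts.items():
        # The difference is undefined when a topic lacks either class; skip it.
        if c["atk_n"] == 0 or c["ben_n"] == 0:
            continue
        deltas[topic] = c["atk_hit"] / c["atk_n"] - c["ben_hit"] / c["ben_n"]
    return deltas


# Made-up records and decisions, for illustration only.
records = [
    {"session_id": "a1", "topic": "OAuth2", "session_type": "attack_slow"},
    {"session_id": "a2", "topic": "OAuth2", "session_type": "attack_slow"},
    {"session_id": "b1", "topic": "OAuth2", "session_type": "benign"},
    {"session_id": "b2", "topic": "OAuth2", "session_type": "benign"},
]
flagged = {"a1": True, "a2": False, "b1": False, "b2": False}
print(topic_delta(records, flagged))
```

With these made-up decisions, half of the attacks and none of the benign sessions are flagged, so the value for `OAuth2` is 0.5.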
---
## Baseline Results (SGEMAS V8.5)
| Method | AUC | sloDR@5%FPR | F1 |
|---|---|---|---|
| SGEMAS V8.5 | **0.843** | **62.6%** | **0.757** |
| EWMA | 0.574 | 34.5% | 0.503 |
| CUSUM | 0.572 | 31.9% | 0.474 |
| EWMA-MSS (no gate) | 0.551 | 28.1% | 0.429 |
| Page-Hinkley | 0.500 | 0.0% | 0.000 |
Full evaluation code: [github.com/hamdim/sgemas](https://github.com/hamdim/sgemas)
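
For readers who want to reproduce this style of evaluation, the sketch below shows generic textbook forms of the EWMA and one-sided CUSUM statistics applied to per-turn risk scores, plus a detection rate at a fixed false-positive rate. It is not the SGEMAS evaluation code. The per-turn scores are assumed to come from some upstream scorer, the parameters `alpha`, `target`, and `slack` are hypothetical defaults, and reading sloDR@5%FPR as "attack detection rate at a threshold that flags at most 5% of benign sessions" is an assumption.

```python
import math
from typing import Sequence


def ewma_score(turn_scores: Sequence[float], alpha: float = 0.3) -> float:
    """Peak of the exponentially weighted moving average over the turns."""
    level = turn_scores[0]
    peak = level
    for score in turn_scores[1:]:
        level = alpha * score + (1 - alpha) * level
        peak = max(peak, level)
    return peak


def cusum_score(turn_scores: Sequence[float], target: float = 0.2, slack: float = 0.05) -> float:
    """Peak of the one-sided CUSUM statistic over the turns."""
    stat = 0.0
    peak = 0.0
    for score in turn_scores:
        stat = max(0.0, stat + score - target - slack)
        peak = max(peak, stat)
    return peak


def detection_rate_at_fpr(
    attack_scores: Sequence[float],
    benign_scores: Sequence[float],
    max_fpr: float = 0.05,
) -> float:
    """Fraction of attacks scoring above a threshold with benign FPR <= max_fpr."""
    benign_sorted = sorted(benign_scores, reverse=True)
    # Number of benign sessions that may be flagged without exceeding max_fpr.
    allowed = math.floor(max_fpr * len(benign_sorted) + 1e-9)
    allowed = min(allowed, len(benign_sorted) - 1)
    # Flagging strictly above this score flags at most `allowed` benign sessions.
    threshold = benign_sorted[allowed]
    return sum(s > threshold for s in attack_scores) / len(attack_scores)
```

A typical use would reduce each session's `turns` to one score with `ewma_score` or `cusum_score`, then pass the attack and benign session scores to `detection_rate_at_fpr`.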
---
## Citation
If you use TCSE V2 in your research, please cite:
```bibtex
@article{sgemas2025,
title = {SGEMAS: A Session-Level Energy-Based Framework for
Multi-Turn Adversarial Drift Detection in LLMs},
author = {Anonymous},
journal = {arXiv preprint},
year = {2025},
}
```
*(Citation will be updated upon publication.)*
---
## License
TCSE V2 is released under the **Creative Commons Attribution 4.0 (CC BY 4.0)** license.
You are free to use, share, and adapt the dataset with appropriate credit.
Provider:
hamdim



