mats-10-sprint-cs-jb/loracle-ood-agentic

Name: mats-10-sprint-cs-jb/loracle-ood-agentic
Creator: mats-10-sprint-cs-jb
Published: 2026-04-23 21:37:47
License: No description available

Source: Hugging Face. Updated 2026-04-23; indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/mats-10-sprint-cs-jb/loracle-ood-agentic


Description:

The Loracle Audit Findings dataset is a structured record of agentic audit probes on 551 Qwen3-14B models (LoRA adapters and full fine-tunes). It was produced during the MATS-10 Loracles sprint as part of the weight-reading "loracle" project. The dataset is divided into four configs: targets (one row per audited model, with advertised behavior, training lineage, etc.), findings (one row per confirmed hypothesis, with pattern tags, full example completions, etc.), probes (one row per individual probe call, with prompt, target completion, etc.), and briefs (the Markdown auditor-agent briefs used to drive the agent runs). The dataset also includes a calibrated S scale for measuring behavioral divergence and a controlled vocabulary for pattern tags. The description also covers the dataset's provenance and the current handling of sensitive content, which is unredacted.
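The config layout described above could be accessed roughly as follows. This is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the config names on the Hub match the four names given in the description exactly; the helper `load_config` is hypothetical and not part of the dataset. No column names are assumed, since the description does not list them.

```python
from typing import Optional

# Repository ID as shown on this page.
REPO_ID = "mats-10-sprint-cs-jb/loracle-ood-agentic"

# Config names taken from the description; that they match the
# names on the Hub exactly is an assumption.
CONFIGS = ("targets", "findings", "probes", "briefs")


def load_config(name: str, split: Optional[str] = None):
    """Load one config of the dataset (hypothetical helper).

    With split=None, `load_dataset` returns a DatasetDict holding
    every split the config defines.
    """
    if name not in CONFIGS:
        raise ValueError(f"unknown config {name!r}; expected one of {CONFIGS}")
    # Imported lazily so the constants above can be used without
    # the `datasets` package installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name, split=split)
```

Calling `load_config("findings")` would download that config and return its splits; printing the result shows the actual column names, which is the safest way to discover the schema. Note that the dataset is described as unredacted, so completions may contain sensitive content.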

Provider:

mats-10-sprint-cs-jb
