AudioJailbreak

Name: AudioJailbreak
Creator: maas
Published: 2026-05-07 21:35:18
License: No description available

ModelScope community; updated 2026-05-07; indexed 2025-05-17

Download link:

https://modelscope.cn/datasets/MBZUAI/AudioJailbreak

Resource description:

# Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models

[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/) [![Dataset](https://img.shields.io/badge/🤗%20Dataset-Hugging%20Face-yellow)](https://huggingface.co/datasets/NEUQ-LIS-LAB/AudioJailbreak)

AudioJailbreak is a benchmark framework for evaluating the security of audio language models (Audio LLMs). It tests how well a model defends against malicious requests delivered through a range of audio perturbation techniques.

**Note**: This project aims to improve the security of audio language models. Researchers should use this tool responsibly.

## 📋 Table of Contents

- [Project Overview](#project-overview)
- [Installation Guide](#installation-guide)
- [Dataset](#dataset)
- [Code Structure](#code-structure)
- [Usage](#usage)
- [Citation](#citation)
- [License](#license)

## 📝 Project Overview

AudioJailbreak provides an evaluation framework for testing the robustness of audio language models against adversarial attacks. The method adds carefully designed perturbations to audio inputs in order to probe a model's safety mechanisms. Key features include:

- **Diverse test cases**: Covering multiple categories of harmful speech samples
- **Automated evaluation pipeline**: End-to-end automation from audio processing to result analysis
- **Bayesian optimization**: Intelligent search for optimal perturbation parameters
- **Multi-model compatibility**: Support for evaluating mainstream audio language models

## 🔧 Installation Guide

1. Clone the repository:

   ```bash
   git clone https://github.com/PbRQianJiang/AudioJailbreak.git
   cd AudioJailbreak
   ```

2. Create and activate the environment:

   ```bash
   conda env create -f environment.yaml
   conda activate Audiojailbreak
   ```

3. Download the dataset from Hugging Face: https://huggingface.co/datasets/NEUQ-LIS-LAB/AudioJailbreak

## 💾 Dataset

**Important Notice**: This repository contains code only. All audio data and the preprocessed/inference result JSONL files are hosted on [Hugging Face](https://huggingface.co/datasets/NEUQ-LIS-LAB/AudioJailbreak).

The dataset includes:

- Original speech samples (`audio/`)
- Input JSONL files (`convert/question`)
- Model responses and **APT audio** (`inference/response`)
- Evaluation results (`eval/xx`), where `xx` is the model name
- Original texts (`text/`)

## 📁 Code Structure

```
(GitHub structure)
AudioJailbreak/
├── audio/        # Audio processing tools (actual audio files on Hugging Face)
├── convert/      # Data conversion & formatting (actual JSONL files on Hugging Face)
├── eval/         # Evaluation scripts & metrics
├── inference/    # Model inference & adversarial testing
├── script/       # Speech conversion scripts
├── experiment/   # Threshold experiment configurations
└── text/         # Text processing utilities
```

### Core Components

- **new_add_noise.py**: Implements multiple audio perturbation methods
- **APT_BO_diva.py**: Bayesian optimization-based adversarial sample generation
- **xx.py**: Model inference interfaces (`xx` stands for the model name)
- **eval.py**: Evaluation metric calculations

## 🚀 Usage

### 1. Data Preparation

Download the data from https://huggingface.co/datasets/NEUQ-LIS-LAB/AudioJailbreak

### 2. Basic Audio Jailbreaking

```bash
# Run inference for one model (replace xx with the model name)
python infer/xx.py
```

### 3. APT-based Bayesian Audio Jailbreaking

```bash
# Generate adversarial samples with Bayesian optimization
python infer/APT_BO_diva.py
```

### 4. Model Evaluation

```bash
# Single model evaluation
python eval_utils/eval.py

# Batch evaluation
bash eval_utils/auto.sh
```

### 5. Result Analysis

```bash
# Generate evaluation report
python eval/all_average.py
```

## 📊 Evaluation Metrics

The benchmark uses several metrics to assess model security:

- **dict_dsr**: Model refusal rate (`dict_dsr = 1 - asr`, where `asr` is the attack success rate)
- **Semantic Similarity**: Semantic similarity of refusal responses
- **GPT Score**: Includes the following sub-metrics:
  - Harmfulness: Degree of harmful content
  - Policy Violation: Policy violation rate
  - Relevance: Relevance of the response to the query

## 📄 Citation

If you use AudioJailbreak in your research, please cite our paper:

## 📜 License

This project is licensed under the [Apache License 2.0](LICENSE).

---

**Note**: This project aims to improve the security of audio language models. Researchers must use this tool responsibly.
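As an illustration of the `dict_dsr` metric listed under Evaluation Metrics, the following is a minimal sketch of a dictionary-based refusal rate and its complement, the attack success rate. It is not taken from the project's `eval.py`: the marker list `REFUSAL_MARKERS` and the function names `is_refusal`, `dict_dsr`, and `asr` are hypothetical, and substring matching on refusal phrases is only an assumption about how a dictionary-based check might work.

```python
from typing import Iterable

# Hypothetical refusal phrases; the benchmark's actual dictionary is not
# given in this README.
REFUSAL_MARKERS = ("i'm sorry", "i am sorry", "i cannot", "i can't", "i won't")


def is_refusal(response: str) -> bool:
    """Return True if the response contains any refusal marker."""
    text = response.lower().replace("\u2019", "'")  # normalise curly apostrophes
    return any(marker in text for marker in REFUSAL_MARKERS)


def dict_dsr(responses: Iterable[str]) -> float:
    """Fraction of responses that are refusals (the model refusal rate)."""
    items = list(responses)
    if not items:
        raise ValueError("need at least one response")
    return sum(is_refusal(r) for r in items) / len(items)


def asr(responses: Iterable[str]) -> float:
    """Attack success rate, using the README's relation dict_dsr = 1 - asr."""
    return 1.0 - dict_dsr(responses)


if __name__ == "__main__":
    sample = [
        "I'm sorry, but I can't help with that.",
        "Sure, here is how to do it.",
        "I cannot assist with this request.",
        "Step 1: gather the materials.",
    ]
    print(f"dict_dsr={dict_dsr(sample):.2f}, asr={asr(sample):.2f}")
```

A keyword check like this is cheap but coarse, which is presumably why the benchmark pairs it with semantic similarity and GPT-based scores.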

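The Core Components list names `new_add_noise.py` as implementing several audio perturbation methods, but the README does not show any of them. As one common example of such a perturbation, the sketch below adds Gaussian white noise to a waveform at a chosen signal-to-noise ratio. The functions `add_white_noise` and `measured_snr_db`, the `snr_db` and `seed` parameters, and the choice of white noise itself are assumptions made for illustration, not the project's actual code.

```python
import math
import random
from typing import List, Sequence


def add_white_noise(samples: Sequence[float], snr_db: float, seed: int = 0) -> List[float]:
    """Return the samples with Gaussian noise added at the requested SNR (dB)."""
    if not samples:
        raise ValueError("samples must be non-empty")
    signal_power = sum(s * s for s in samples) / len(samples)
    if signal_power == 0.0:
        raise ValueError("cannot set an SNR for an all-zero signal")
    # SNR(dB) = 10 * log10(P_signal / P_noise), solved for P_noise.
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in samples]
    raw_power = sum(n * n for n in noise) / len(noise)
    # Rescale so the empirical noise power matches the target.
    scale = math.sqrt(target_noise_power / raw_power)
    return [s + scale * n for s, n in zip(samples, noise)]


def measured_snr_db(clean: Sequence[float], noisy: Sequence[float]) -> float:
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(s * s for s in clean) / len(clean)
    noise_power = sum((n - s) ** 2 for s, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


if __name__ == "__main__":
    # One second of a 440 Hz tone at a 16 kHz sample rate.
    tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
    noisy = add_white_noise(tone, snr_db=10.0)
    print(f"measured SNR: {measured_snr_db(tone, noisy):.2f} dB")
```

In a search over perturbation parameters, a value such as `snr_db` would be one natural parameter to tune, though the README does not say which parameters the project actually optimizes.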

Provider:

maas

Created:

2025-05-11
