LanceaKing/asvspoof2019

Name: LanceaKing/asvspoof2019
Creator: LanceaKing
Published: 2022-11-11 08:41:54
License: No description provided

Source: Hugging Face. Updated 2022-11-11. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/LanceaKing/asvspoof2019

Resource description:

---
annotations_creators:
- other
language_creators:
- other
language:
- en
license:
- odc-by
multilinguality:
- monolingual
size_categories:
- 100K<n<1M
source_datasets:
- extended|vctk
task_categories:
- audio-classification
task_ids: []
pretty_name: asvspoof2019
tags:
- voice-anti-spoofing
---

# Dataset Card for asvspoof2019

## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)

## Dataset Description

- **Homepage:** https://datashare.ed.ac.uk/handle/10283/3336
- **Repository:** [Needs More Information]
- **Paper:** https://arxiv.org/abs/1911.01601
- **Leaderboard:** [Needs More Information]
- **Point of Contact:** [Needs More Information]

### Dataset Summary

This is a database used for the Third Automatic Speaker Verification Spoofing and Countermeasures Challenge, ASVspoof 2019 for short (http://www.asvspoof.org), organized by Junichi Yamagishi, Massimiliano Todisco, Md Sahidullah, Héctor Delgado, Xin Wang, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Ville Vestman, and Andreas Nautsch in 2019.

### Supported Tasks and Leaderboards

[Needs More Information]

### Languages

English

## Dataset Structure

### Data Instances

```
{'speaker_id': 'LA_0091',
 'audio_file_name': 'LA_T_8529430',
 'audio': {'path': 'D:/Users/80304531/.cache/huggingface/datasets/downloads/extracted/8cabb6d5c283b0ed94b2219a8d459fea8e972ce098ef14d8e5a97b181f850502/LA/ASVspoof2019_LA_train/flac/LA_T_8529430.flac',
  'array': array([-0.00201416, -0.00234985, -0.0022583 , ...,  0.01309204,
          0.01339722,  0.01461792], dtype=float32),
  'sampling_rate': 16000},
 'system_id': 'A01',
 'key': 1}
```

### Data Fields

Logical access (LA):

- `speaker_id`: `LA_****`, a 4-digit speaker ID
- `audio_file_name`: name of the audio file
- `audio`: A dictionary containing the path to the downloaded audio file, the decoded audio array, and the sampling rate. Note that when the audio column is accessed, as in `dataset[0]["audio"]`, the audio file is automatically decoded and resampled to `dataset.features["audio"].sampling_rate`. Decoding and resampling a large number of audio files can take a significant amount of time, so it is important to query the sample index before the `"audio"` column: `dataset[0]["audio"]` should **always** be preferred over `dataset["audio"][0]`.
- `system_id`: ID of the speech spoofing system (A01 - A19); for bonafide speech, `system_id` is left blank ('-')
- `key`: 'bonafide' for genuine speech, or 'spoof' for spoofed speech

Physical access (PA):

- `speaker_id`: `PA_****`, a 4-digit speaker ID
- `audio_file_name`: name of the audio file
- `audio`: A dictionary containing the path to the downloaded audio file, the decoded audio array, and the sampling rate. Note that when the audio column is accessed, as in `dataset[0]["audio"]`, the audio file is automatically decoded and resampled to `dataset.features["audio"].sampling_rate`. Decoding and resampling a large number of audio files can take a significant amount of time, so it is important to query the sample index before the `"audio"` column: `dataset[0]["audio"]` should **always** be preferred over `dataset["audio"][0]`.
- `environment_id`: a triplet (S,R,D_s), each element of which takes one letter in the set {a,b,c} as its categorical value, defined as

  |                                  | a      | b       | c        |
  | -------------------------------- | ------ | ------- | -------- |
  | S: Room size (square meters)     | 2-5    | 5-10    | 10-20    |
  | R: T60 (ms)                      | 50-200 | 200-600 | 600-1000 |
  | D_s: Talker-to-ASV distance (cm) | 10-50  | 50-100  | 100-150  |

- `attack_id`: a duple (D_a,Q), each element of which takes one letter in the set {A,B,C} as its categorical value, defined as

  |                                       | A       | B      | C     |
  | ------------------------------------- | ------- | ------ | ----- |
  | D_a: Attacker-to-talker distance (cm) | 10-50   | 50-100 | > 100 |
  | Q: Replay device quality              | perfect | high   | low   |

  For bonafide speech, `attack_id` is left blank ('-').
- `key`: 'bonafide' for genuine speech, or 'spoof' for spoofed speech

### Data Splits

|          | Training set | Development set | Evaluation set |
| -------- | ------------ | --------------- | -------------- |
| Bonafide | 2580         | 2548            | 7355           |
| Spoof    | 22800        | 22296           | 63882          |
| Total    | 25380        | 24844           | 71237          |

## Dataset Creation

### Curation Rationale

[Needs More Information]

### Source Data

#### Initial Data Collection and Normalization

[Needs More Information]

#### Who are the source language producers?

[Needs More Information]

### Annotations

#### Annotation process

[Needs More Information]

#### Who are the annotators?

[Needs More Information]

### Personal and Sensitive Information

[Needs More Information]

## Considerations for Using the Data

### Social Impact of Dataset

[Needs More Information]

### Discussion of Biases

[Needs More Information]

### Other Known Limitations

[Needs More Information]

## Additional Information

### Dataset Curators

[Needs More Information]

### Licensing Information

This ASVspoof 2019 dataset is made available under the Open Data Commons Attribution License: http://opendatacommons.org/licenses/by/1.0/

### Citation Information

```
@InProceedings{Todisco2019,
  Title     = {{ASV}spoof 2019: {F}uture {H}orizons in {S}poofed and {F}ake {A}udio {D}etection},
  Author    = {Todisco, Massimiliano and Wang, Xin and Sahidullah, Md and Delgado, H{\'e}ctor and Nautsch, Andreas and Yamagishi, Junichi and Evans, Nicholas and Kinnunen, Tomi and Lee, Kong Aik},
  booktitle = {Proc. of Interspeech 2019},
  Year      = {2019}
}
```


Provider:

LanceaKing

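Original Information Summary

Dataset Overview

Dataset Summary

This dataset was used for the Third Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2019), organized in 2019 by Junichi Yamagishi and colleagues.

Supported Tasks and Leaderboards

[Needs More Information]

Languages

English

Dataset Structure

Data Instances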

{
  "speaker_id": "LA_0091",
  "audio_file_name": "LA_T_8529430",
  "audio": {
    "path": "D:/Users/80304531/.cache/huggingface/datasets/downloads/extracted/8cabb6d5c283b0ed94b2219a8d459fea8e972ce098ef14d8e5a97b181f850502/LA/ASVspoof2019_LA_train/flac/LA_T_8529430.flac",
    "array": array([-0.00201416, -0.00234985, -0.0022583 , ..., 0.01309204, 0.01339722, 0.01461792], dtype=float32),
    "sampling_rate": 16000
  },
  "system_id": "A01",
  "key": 1
}

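Data Fields

Logical access (LA):

speaker_id: LA_****, a 4-digit speaker ID
audio_file_name: name of the audio file
audio: a dictionary containing the path to the downloaded audio file, the decoded audio array, and the sampling rate
system_id: ID of the speech spoofing system (A01 - A19); left blank (-) for bonafide speech
key: bonafide for genuine speech, spoof for spoofed speech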
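
The LA field conventions above can be turned into a small sanity check on metadata. The sketch below is only illustrative: the helper names `label_from_system_id` and `check_la_record` are hypothetical and are not part of the dataset's loading script. It derives the label from `system_id` alone, because the card does not spell out how the integer `key` seen in the example instance maps to `bonafide` and `spoof`.

```python
import re

# Field formats taken from the LA field descriptions above.
SPEAKER_RE = re.compile(r"^LA_\d{4}$")             # LA_ followed by 4 digits
SYSTEM_IDS = {f"A{i:02d}" for i in range(1, 20)}   # A01 .. A19


def label_from_system_id(system_id):
    """Return 'bonafide' when system_id is blank ('-'), otherwise 'spoof'."""
    if system_id == "-":
        return "bonafide"
    if system_id in SYSTEM_IDS:
        return "spoof"
    raise ValueError(f"unexpected system_id: {system_id!r}")


def check_la_record(record):
    """Return a list of format problems found in one LA metadata record."""
    problems = []
    if not SPEAKER_RE.match(record["speaker_id"]):
        problems.append("speaker_id does not match LA_####")
    try:
        label_from_system_id(record["system_id"])
    except ValueError as exc:
        problems.append(str(exc))
    return problems


# Metadata from the example instance (audio omitted for brevity).
example = {
    "speaker_id": "LA_0091",
    "audio_file_name": "LA_T_8529430",
    "system_id": "A01",
    "key": 1,
}
print(check_la_record(example), label_from_system_id(example["system_id"]))
```

For the example instance this reports no problems and derives the label `spoof`, which is consistent with a non-blank `system_id`.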

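Physical access (PA):

speaker_id: PA_****, a 4-digit speaker ID
audio_file_name: name of the audio file
audio: a dictionary containing the path to the downloaded audio file, the decoded audio array, and the sampling rate
environment_id: a triplet (S,R,D_s); each element takes one letter from the set {a,b,c} as its categorical value
attack_id: a duple (D_a,Q); each element takes one letter from the set {A,B,C} as its categorical value
key: bonafide for genuine speech, spoof for spoofed speech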
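
The PA identifiers can be expanded into the value ranges listed in the dataset card's tables. The following is a minimal sketch, assuming that `environment_id` arrives as a three-letter string such as `"abc"` and `attack_id` as a two-letter string such as `"BC"` (or `"-"` for bonafide speech). The card does not specify the exact encoding, and the function names here are hypothetical.

```python
# Category ranges copied from the dataset card's tables.
ENV_RANGES = {
    "S": {"a": "2-5", "b": "5-10", "c": "10-20"},            # room size, m^2
    "R": {"a": "50-200", "b": "200-600", "c": "600-1000"},   # T60, ms
    "D_s": {"a": "10-50", "b": "50-100", "c": "100-150"},    # talker-to-ASV, cm
}
ATTACK_RANGES = {
    "D_a": {"A": "10-50", "B": "50-100", "C": "> 100"},      # attacker-to-talker, cm
    "Q": {"A": "perfect", "B": "high", "C": "low"},          # replay device quality
}


def decode_environment_id(env_id):
    """Map a three-letter environment_id (assumed format) to its ranges."""
    if len(env_id) != 3:
        raise ValueError(f"expected three letters, got {env_id!r}")
    return {name: ENV_RANGES[name][letter]
            for name, letter in zip(("S", "R", "D_s"), env_id)}


def decode_attack_id(attack_id):
    """Map a two-letter attack_id (assumed format) to its ranges.

    Returns None for bonafide speech, where attack_id is blank ('-').
    """
    if attack_id == "-":
        return None
    if len(attack_id) != 2:
        raise ValueError(f"expected two letters, got {attack_id!r}")
    return {name: ATTACK_RANGES[name][letter]
            for name, letter in zip(("D_a", "Q"), attack_id)}


print(decode_environment_id("abc"))
print(decode_attack_id("BC"))
```

Keeping the ranges as strings avoids inventing numeric bounds for the open-ended `> 100` category.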

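Data Splits

|          | Training set | Development set | Evaluation set |
| -------- | ------------ | --------------- | -------------- |
| Bonafide | 2580         | 2548            | 7355           |
| Spoof    | 22800        | 22296           | 63882          |
| Total    | 25380        | 24844           | 71237          |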
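
The split counts imply a strong class imbalance: roughly one utterance in ten is bonafide in every split. The sketch below recomputes the totals and the bonafide fraction from the table. It also derives inverse-frequency class weights, which are one common way to compensate for imbalance during training; the weighting is an illustration and is not something the dataset card prescribes.

```python
# Counts copied from the Data Splits table.
SPLITS = {
    "train": {"bonafide": 2580, "spoof": 22800},
    "dev": {"bonafide": 2548, "spoof": 22296},
    "eval": {"bonafide": 7355, "spoof": 63882},
}


def split_summary(counts):
    """Return the total, bonafide fraction, and inverse-frequency weights."""
    total = counts["bonafide"] + counts["spoof"]
    return {
        "total": total,
        "bonafide_fraction": counts["bonafide"] / total,
        # weight = total / (number of classes * class count)
        "class_weights": {label: total / (2 * n) for label, n in counts.items()},
    }


for name, counts in SPLITS.items():
    summary = split_summary(counts)
    print(f"{name}: total={summary['total']}, "
          f"bonafide={summary['bonafide_fraction']:.1%}")
```

With these weights, each class contributes the same total weight within a split, so the rarer bonafide class is not swamped by spoofed examples.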

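Dataset Creation

Curation Rationale

[Needs More Information]

Source Data

Initial Data Collection and Normalization

[Needs More Information]

Who are the source language producers?

[Needs More Information]

Annotations

Annotation process

[Needs More Information]

Who are the annotators?

[Needs More Information]

Personal and Sensitive Information

[Needs More Information]

Considerations for Using the Data

Social Impact of Dataset

[Needs More Information]

Discussion of Biases

[Needs More Information]

Other Known Limitations

[Needs More Information]

Additional Information

Dataset Curators

[Needs More Information]

Licensing Information

This ASVspoof 2019 dataset is made available under the Open Data Commons Attribution License: http://opendatacommons.org/licenses/by/1.0/

Citation Information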

@InProceedings{Todisco2019,
  Title     = {{ASV}spoof 2019: {F}uture {H}orizons in {S}poofed and {F}ake {A}udio {D}etection},
  Author    = {Todisco, Massimiliano and Wang, Xin and Sahidullah, Md and Delgado, H{\'e}ctor and Nautsch, Andreas and Yamagishi, Junichi and Evans, Nicholas and Kinnunen, Tomi and Lee, Kong Aik},
  booktitle = {Proc. of Interspeech 2019},
  Year      = {2019}
}

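Aggregated Summary

Dataset Introduction

Construction

The dataset was built for the Third Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof 2019) as a collaboration among several research institutions. It covers two scenarios, logical access (LA) and physical access (PA), each with its own metadata. LA records carry a speaker ID and a spoofing system ID, while PA records add an environment ID and an attack ID that describe the recording environment and the attack. The dataset is intended as a common test bed for evaluating and improving voice anti-spoofing techniques.

Features

ASVspoof 2019 stands out for its multi-dimensional metadata and its simulated acoustic environments. It contains bonafide speech alongside many kinds of spoofed speech, with attacks ranging from low to high quality. Detailed environment and attack parameters let researchers reproduce and analyze specific spoofing scenarios.

Usage

Researchers can use the audio files and metadata in each data instance to develop and evaluate audio classification and anti-spoofing methods. The dataset provides training, development, and evaluation sets for model training, tuning, and final evaluation respectively. When reading audio, index the sample first and the audio column second (`dataset[0]["audio"]` rather than `dataset["audio"][0]`), so that only the requested file is decoded. The dataset is released under the Open Data Commons Attribution License, and its terms must be followed.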
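
The advice about access order can be illustrated with a toy model of lazy decoding. The class below is a hypothetical stand-in written for this example; it is not the Hugging Face `datasets` implementation and does not read any audio. It only counts how many files would be decoded under each access pattern.

```python
class ToyAudioDataset:
    """Toy stand-in that decodes audio lazily and counts decode calls."""

    def __init__(self, file_names):
        self._file_names = list(file_names)
        self.decode_calls = 0

    def _decode(self, name):
        # Placeholder for real FLAC decoding and resampling.
        self.decode_calls += 1
        return {"path": name, "array": [0.0], "sampling_rate": 16000}

    def __len__(self):
        return len(self._file_names)

    def __getitem__(self, key):
        if isinstance(key, int):
            # Row access: decode only the requested example.
            return {"audio": self._decode(self._file_names[key])}
        if key == "audio":
            # Column access: every file is decoded before indexing.
            return [self._decode(name) for name in self._file_names]
        raise KeyError(key)


row_first = ToyAudioDataset(f"LA_T_{i:07d}.flac" for i in range(1000))
_ = row_first[0]["audio"]

column_first = ToyAudioDataset(f"LA_T_{i:07d}.flac" for i in range(1000))
_ = column_first["audio"][0]

print(row_first.decode_calls, column_first.decode_calls)  # prints: 1 1000
```

In this toy model the row-first pattern decodes one file while the column-first pattern decodes all 1000, which is the cost the dataset card warns about.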

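Background and Challenges

Background Overview

The ASVspoof 2019 dataset was created in 2019 by researchers including Junichi Yamagishi and Massimiliano Todisco to support the Third Automatic Speaker Verification Spoofing and Countermeasures Challenge. Its core research problem is voice anti-spoofing: telling bonafide speech apart from synthesized or otherwise forged speech. By providing a large body of labeled audio, it serves as a resource for research on speech technology and security.

Current Challenges

Building the ASVspoof 2019 dataset involved several challenges. First, it had to cover a range of speech synthesis and forgery techniques so that models trained on it generalize. Second, labels had to separate bonafide speech from each kind of spoofed speech accurately and consistently. Finally, the dataset's size and diversity make storage and processing costly; decoding and resampling a large number of audio files in particular can take considerable time and compute.

Common Scenarios

Typical Use Cases

In voice anti-spoofing research, the LanceaKing/asvspoof2019 dataset is widely used to train and evaluate how well automatic speaker verification systems resist spoofing. It contains bonafide speech together with many spoofed samples, which lets researchers develop and test new countermeasures. By analyzing these samples, researchers can learn to distinguish bonafide speech from synthesized, replayed, or converted speech and so make systems more secure.

Practical Applications

In practice, the LanceaKing/asvspoof2019 dataset is used to improve and validate the security of voice-based authentication systems. In settings such as financial transactions, access control, and remote identity verification, models trained on it can help detect and defend against voice spoofing attacks, which supports system reliability and user safety. The dataset has also encouraged wider use of speech technology in security-sensitive domains.

Derived and Related Work

Many studies build on the LanceaKing/asvspoof2019 dataset, including improved deep learning models, new feature extraction methods, and multimodal fusion techniques. For example, some work introduces more complex neural network architectures to raise spoofing detection accuracy, while other work explores multimodal methods that combine audio and text information. These efforts broaden research on voice anti-spoofing and open up further practical applications.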

The content above was collected and summarized by 遇见数据集.
