realnetworks-kontxt/fleurs-hs-vits

Name: realnetworks-kontxt/fleurs-hs-vits
Creator: realnetworks-kontxt
Published: 2024-12-19 09:57:21
License: No description available

Hugging Face | Updated: 2024-12-19 | Indexed: 2024-06-11

Download link:

https://hf-mirror.com/datasets/realnetworks-kontxt/fleurs-hs-vits

Resource description:

---
license: cc-by-4.0
task_categories:
- audio-classification
language:
- de
- en
- es
- fr
- it
- nl
- pl
- sv
tags:
- speech
- speech-classifiation
- text-to-speech
- spoofing
- multilingualism
pretty_name: FLEURS-HS VITS
size_categories:
- 10K<n<100K
---

# FLEURS-HS VITS

An extension of the [FLEURS](https://huggingface.co/datasets/google/fleurs) dataset for synthetic speech detection using text-to-speech, featured in the paper **Synthetic speech detection with Wav2Vec 2.0 in various language settings**.

This dataset is one of three used in the paper. The other two are:

- [FLEURS-HS](https://huggingface.co/datasets/realnetworks-kontxt/fleurs-hs) - the default train, dev and test sets, published separately due to different licensing
- [ARCTIC-HS](https://huggingface.co/datasets/realnetworks-kontxt/arctic-hs) - an extension of the [CMU_ARCTIC](http://festvox.org/cmu_arctic/) and [L2-ARCTIC](https://psi.engr.tamu.edu/l2-arctic-corpus/) sets, built in a similar manner

## Dataset Details

### Dataset Description

The dataset features 8 languages originally seen in FLEURS:

- German
- English
- Spanish
- French
- Italian
- Dutch
- Polish
- Swedish

The `synthetic` samples are generated using:

- [Google Cloud Text-To-Speech](https://cloud.google.com/text-to-speech)
- [Azure Text-To-Speech](https://azure.microsoft.com/en-us/products/ai-services/text-to-speech)
- [Amazon Polly](https://aws.amazon.com/polly/)

Only the test VITS samples are provided. For every VITS voice (in practice, a specific set of model weights), one sample per transcript is provided.
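Every VITS voice contributes one sample per transcript, so a language's test split can be regrouped by voice. The sketch below makes a few assumptions:

- `audio["path"]` has the form `<voice>/<transcript_id>.wav`, as in this card's sample record (`ljspeech-vits/1660.wav`).
- `voice_and_id` and `group_by_voice` are hypothetical helpers, not part of the dataset's loader.
- `example-vits` is a made-up voice name.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def voice_and_id(path: str) -> tuple[str, str]:
    """Split 'ljspeech-vits/1660.wav' into ('ljspeech-vits', '1660').

    Assumption: the parent directory names the VITS voice and the file
    stem is the transcript ID.
    """
    p = PurePosixPath(path)
    return p.parent.name, p.stem


def group_by_voice(samples) -> dict[str, set[str]]:
    """Map each VITS voice to the set of transcript IDs it covers."""
    groups: dict[str, set[str]] = defaultdict(set)
    for sample in samples:
        voice, transcript_id = voice_and_id(sample["audio"]["path"])
        groups[voice].add(transcript_id)
    return dict(groups)


# Hypothetical records shaped like the card's sample, with audio arrays omitted.
samples = [
    {"audio": {"path": "ljspeech-vits/1660.wav"}, "label": "synthetic"},
    {"audio": {"path": "ljspeech-vits/1661.wav"}, "label": "synthetic"},
    {"audio": {"path": "example-vits/1660.wav"}, "label": "synthetic"},
    {"audio": {"path": "example-vits/1661.wav"}, "label": "synthetic"},
]

groups = group_by_voice(samples)
# With one sample per transcript per voice, every voice covers the same IDs.
print(len({frozenset(ids) for ids in groups.values()}) == 1)  # True
```

On the real dataset, you can cast the column with `datasets.Audio(decode=False)` via `Dataset.cast_column`. This should let you read paths without decoding audio. However, the exact `path` values produced by the custom loader may differ from the card's example.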
- **Curated by:** [KONTXT by RealNetworks](https://realnetworks.com/kontxt)
- **Funded by:** [RealNetworks](https://realnetworks.com/)
- **Language(s) (NLP):** English, German, Spanish, French, Italian, Dutch, Polish, Swedish
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) for the code, [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) for the dataset (though the VITS samples are under various licenses depending on their source)

### Dataset Sources

The original FLEURS dataset was downloaded from [HuggingFace](https://huggingface.co/datasets/google/fleurs).

- **FLEURS Repository:** [HuggingFace](https://huggingface.co/datasets/google/fleurs)
- **FLEURS Paper:** [arXiv](https://arxiv.org/abs/2205.12446)
- **Paper:** Synthetic speech detection with Wav2Vec 2.0 in various language settings

## Uses

This dataset is best used as a difficult test set. Each sample contains an `Audio` feature and a label, which is always `synthetic`; this dataset does not include any human samples.

### Direct Use

The following snippet of code demonstrates loading the test split for English:

```python
from datasets import load_dataset

fleurs_hs = load_dataset(
    "realnetworks-kontxt/fleurs-hs-vits",
    "en_us",
    split="test",
    trust_remote_code=True,
)
```

To load a different language, replace `en_us` with one of the following:

- `de_de` for German
- `es_419` for Spanish
- `fr_fr` for French
- `it_it` for Italian
- `nl_nl` for Dutch
- `pl_pl` for Polish
- `sv_se` for Swedish

This dataset only has a `test` split.

The `trust_remote_code=True` parameter is necessary because this dataset uses a custom loader. To see which code is being run, check out the [loading script](./fleurs-hs-vits.py).

## Dataset Structure

The dataset data is contained in the [data directory](https://huggingface.co/datasets/realnetworks-kontxt/fleurs-hs-vits/tree/main/data).

There is one directory per language.
Within that directory, there is a directory named `splits`; it contains one file per split:

- `test.tar.gz`

That `.tar.gz` file contains one or more directories, named after the VITS model used, e.g. `thorsten-vits`.

Each of these directories contains `.wav` files, and each `.wav` file is named after the ID of its transcript. Keep in mind that these directories can't be merged, as they share file names. An identical file name implies a speaker-voice pair, e.g. `human/123.wav` and `thorsten-vits/123.wav`.

Finally, the language directory also contains 3 metadata files, which are not used in the loaded dataset but might be useful to researchers:

- `recording-metadata.csv` - contains the transcript ID, file name, split and gender of the original FLEURS samples
- `recording-transcripts.csv` - contains the transcripts of the original FLEURS samples
- `voice-metadata.csv` - contains the grouping of the TTS voices used, alongside the splits they were used for

### Sample

A sample contains an Audio feature `audio` and a string `label`.

```
{
  'audio': {
    'path': 'ljspeech-vits/1660.wav',
    'array': array([0.00119019, 0.00109863, 0.00106812, ..., 0., 0., 0.]),
    'sampling_rate': 16000
  },
  'label': 'synthetic'
}
```

## Citation

The dataset is featured alongside our paper, **Synthetic speech detection with Wav2Vec 2.0 in various language settings**, which will be published at the IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW). We'll provide links once it's available online.
**BibTeX:** If you use this work, please cite us by including the following BibTeX reference:

```
@inproceedings{dropuljic-ssdww2v2ivls,
  author={Dropuljić, Branimir and Šuflaj, Miljenko and Jertec, Andrej and Obadić, Leo},
  booktitle={{IEEE} International Conference on Acoustics, Speech, and Signal Processing, {ICASSP} 2024 - Workshops, Seoul, Republic of Korea, April 14-19, 2024},
  title={Synthetic Speech Detection with Wav2vec 2.0 in Various Language Settings},
  year={2024},
  month={04},
  pages={585-589},
  publisher={{IEEE}},
  volume={},
  number={},
  keywords={Synthetic speech detection;text-to-speech;wav2vec 2.0;spoofing attack;multilingualism},
  url={https://doi.org/10.1109/ICASSPW62465.2024.10627750},
  doi={10.1109/ICASSPW62465.2024.10627750}
}
```

## Dataset Card Authors

- [Miljenko Šuflaj](https://huggingface.co/suflaj)

## Dataset Card Contact

- [Miljenko Šuflaj](mailto:msuflaj@realnetworks.com)

Provided by:

realnetworks-kontxt

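Summary of the original information

FLEURS-HS VITS dataset overview

Basic information

Name: FLEURS-HS VITS
Task category: audio classification
Languages: German, English, Spanish, French, Italian, Dutch, Polish, Swedish
Tags: speech, speech classification, text-to-speech, spoofing, multilingualism
Dataset size: 10K<n<100K
License: CC BY 4.0 for the dataset, Apache 2.0 for the code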

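Dataset description

Content: an extension of the FLEURS dataset for synthetic speech detection, containing synthetic speech samples in 8 languages.
Synthetic speech generation tools: Google Cloud Text-To-Speech, Azure Text-To-Speech, Amazon Polly
Composition: test samples only; every VITS voice model contributes one sample per transcript.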
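The card notes that every label is `synthetic`. On its own, this set therefore cannot support metrics that need both classes, such as equal error rate. What it measures directly is how often a detector flags these samples as synthetic. Below is a minimal per-voice sketch, with these assumptions:

- Detector predictions are the strings `"synthetic"` or `"human"`.
- `detection_rate_by_voice` is a hypothetical helper.
- The voices and predictions below are made up.

```python
from collections import Counter


def detection_rate_by_voice(voices, predictions) -> dict[str, float]:
    """Fraction of each voice's samples that a detector labels 'synthetic'.

    Every sample in this dataset is synthetic, so this equals the detector's
    true positive rate (recall on the synthetic class) for each VITS voice.
    """
    if len(voices) != len(predictions):
        raise ValueError("voices and predictions must have the same length")
    totals: Counter = Counter()
    hits: Counter = Counter()
    for voice, prediction in zip(voices, predictions):
        totals[voice] += 1
        hits[voice] += prediction == "synthetic"  # bool counts as 0 or 1
    return {voice: hits[voice] / totals[voice] for voice in totals}


# Hypothetical detector output; voice names could come from audio["path"].
voices = ["ljspeech-vits", "ljspeech-vits", "example-vits", "example-vits"]
predictions = ["synthetic", "human", "synthetic", "synthetic"]
print(detection_rate_by_voice(voices, predictions))
# {'ljspeech-vits': 0.5, 'example-vits': 1.0}
```

Reporting the rate per voice, rather than one pooled number, shows which VITS models are hardest for a given detector.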

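Dataset structure

Directory layout: one directory per language, containing a splits directory that holds test.tar.gz; this archive contains one or more directories named after VITS models, each holding .wav files.
Metadata files: recording-metadata.csv, recording-transcripts.csv and voice-metadata.csv, which describe the original FLEURS recordings, their transcripts, and the TTS voices used.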
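The archive layout described in the card can be inspected with Python's standard `tarfile` module, without going through the custom loader. In that layout, `test.tar.gz` holds one directory per VITS model, and each `.wav` file is named after a transcript ID. This is a minimal sketch, with these assumptions:

- Members are named `<vits-model>/<transcript_id>.wav`.
- `index_archive` and `write_demo_archive` are hypothetical helpers.
- The demo archive only imitates the real one.

```python
import io
import tarfile
import tempfile
from collections import defaultdict
from pathlib import Path, PurePosixPath


def index_archive(archive_path: str) -> dict[str, list[str]]:
    """Map each VITS model directory inside a test.tar.gz to its transcript IDs.

    Assumption: members are named '<vits-model>/<transcript_id>.wav'; only the
    last two path components are used, so extra leading directories are ignored.
    """
    index: dict[str, list[str]] = defaultdict(list)
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            path = PurePosixPath(member.name)
            if member.isfile() and path.suffix == ".wav":
                index[path.parent.name].append(path.stem)
    # IDs are sorted as strings, which is enough for a stable listing.
    return {voice: sorted(ids) for voice, ids in index.items()}


def write_demo_archive(archive_path: str) -> None:
    """Hypothetical stand-in for test.tar.gz: empty .wav members, two voices."""
    names = [
        "ljspeech-vits/1660.wav",
        "ljspeech-vits/1661.wav",
        "example-vits/1660.wav",
        "example-vits/1661.wav",
    ]
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in names:
            info = tarfile.TarInfo(name)  # regular file, size defaults to 0
            tar.addfile(info, io.BytesIO(b""))


with tempfile.TemporaryDirectory() as tmp:
    archive = str(Path(tmp) / "test.tar.gz")
    write_demo_archive(archive)
    index = index_archive(archive)

# File names repeat across voices, so (voice, transcript_id) is the unique key.
keys = {(voice, tid) for voice, ids in index.items() for tid in ids}
print(index)
print(len(keys))  # 4
```

Keying by `(voice, transcript_id)` reflects the card's warning that the per-voice directories cannot be merged, because they share file names.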

Usage

Loading: use the datasets.load_dataset function to load the test set of a specific language.
Splits: there is only a test split.
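Loading every language means calling `load_dataset` once per config. This is a minimal sketch, with these assumptions:

- The configs are the eight config names listed in the card's Direct Use section.
- Each call uses the same arguments as the card's English example.
- `CONFIGS` and `load_all_languages` are hypothetical names.
- The optional `loader` parameter exists only so the function can be tried without a download.

```python
# Config names as listed in the card's "Direct Use" section.
CONFIGS = {
    "de_de": "German",
    "en_us": "English",
    "es_419": "Spanish",
    "fr_fr": "French",
    "it_it": "Italian",
    "nl_nl": "Dutch",
    "pl_pl": "Polish",
    "sv_se": "Swedish",
}


def load_all_languages(loader=None):
    """Load the `test` split of every language config into a dict.

    `loader` defaults to `datasets.load_dataset`; it is a parameter only so
    the function can be exercised with a stand-in instead of a download.
    """
    if loader is None:
        from datasets import load_dataset as loader  # needs `pip install datasets`
    return {
        config: loader(
            "realnetworks-kontxt/fleurs-hs-vits",
            config,
            split="test",  # the only split this dataset provides
            trust_remote_code=True,  # required by the custom loading script
        )
        for config in CONFIGS
    }
```

Calling `load_all_languages()` with no argument downloads all eight test sets. `{config: len(ds) for config, ds in test_sets.items()}` then gives per-language sample counts.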

Citation

Related paper: Synthetic speech detection with Wav2Vec 2.0 in various language settings
Venue: IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)
BibTeX: see the BibTeX reference in the dataset card above

Dataset contact

Author: Miljenko Šuflaj
Contact: msuflaj@realnetworks.com
