
realnetworks-kontxt/fleurs-hs-vits

Source: Hugging Face. Updated 2024-12-19. Indexed 2024-06-11.
Download link:
https://hf-mirror.com/datasets/realnetworks-kontxt/fleurs-hs-vits
Resource description:
---
license: cc-by-4.0
task_categories:
- audio-classification
language:
- de
- en
- es
- fr
- it
- nl
- pl
- sv
tags:
- speech
- speech-classifiation
- text-to-speech
- spoofing
- multilingualism
pretty_name: FLEURS-HS VITS
size_categories:
- 10K<n<100K
---

# FLEURS-HS VITS

An extension of the [FLEURS](https://huggingface.co/datasets/google/fleurs) dataset for synthetic speech detection using text-to-speech, featured in the paper **Synthetic speech detection with Wav2Vec 2.0 in various language settings**.

This dataset is 1 of 3 used in the paper, the others being:

- [FLEURS-HS](https://huggingface.co/datasets/realnetworks-kontxt/fleurs-hs)
  - the default train, dev and test sets
  - separated due to different licensing
- [ARCTIC-HS](https://huggingface.co/datasets/realnetworks-kontxt/arctic-hs)
  - extension of the [CMU_ARCTIC](http://festvox.org/cmu_arctic/) and [L2-ARCTIC](https://psi.engr.tamu.edu/l2-arctic-corpus/) sets in a similar manner

## Dataset Details

### Dataset Description

The dataset features 8 languages originally seen in FLEURS:

- German
- English
- Spanish
- French
- Italian
- Dutch
- Polish
- Swedish

The `synthetic` samples are generated using:

- [Google Cloud Text-To-Speech](https://cloud.google.com/text-to-speech)
- [Azure Text-To-Speech](https://azure.microsoft.com/en-us/products/ai-services/text-to-speech)
- [Amazon Polly](https://aws.amazon.com/polly/)

Only the test VITS samples are provided. For every VITS voice, which is in practice specific model weights, one sample per transcript is provided.

- **Curated by:** [KONTXT by RealNetworks](https://realnetworks.com/kontxt)
- **Funded by:** [RealNetworks](https://realnetworks.com/)
- **Language(s) (NLP):** English, German, Spanish, French, Italian, Dutch, Polish, Swedish
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) for the code, [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) for the dataset (but various licenses depending on the source for VITS samples)

### Dataset Sources

The original FLEURS dataset was downloaded from [HuggingFace](https://huggingface.co/datasets/google/fleurs).

- **FLEURS Repository:** [HuggingFace](https://huggingface.co/datasets/google/fleurs)
- **FLEURS Paper:** [arXiv](https://arxiv.org/abs/2205.12446)
- **Paper:** Synthetic speech detection with Wav2Vec 2.0 in various language settings

## Uses

This dataset is best used as a difficult test set. Each sample contains an `Audio` feature and a label, which is always `synthetic`; this dataset does not include any human samples.

### Direct Use

The following snippet of code demonstrates loading the test split for English:

```python
from datasets import load_dataset

# "en_us" selects the English configuration; "test" is the only available split.
fleurs_hs = load_dataset(
    "realnetworks-kontxt/fleurs-hs-vits",
    "en_us",
    split="test",
    trust_remote_code=True,
)
```

To load a different language, change `en_us` into one of the following:

- `de_de` for German
- `es_419` for Spanish
- `fr_fr` for French
- `it_it` for Italian
- `nl_nl` for Dutch
- `pl_pl` for Polish
- `sv_se` for Swedish

This dataset only has a `test` split.

The `trust_remote_code=True` parameter is necessary because this dataset uses a custom loader. To see which code is being run, check out the [loading script](./fleurs-hs-vits.py).

## Dataset Structure

The dataset data is contained in the [data directory](https://huggingface.co/datasets/realnetworks-kontxt/fleurs-hs-vits/tree/main/data). There is 1 directory per language.
Within that directory, there is a directory named `splits`; it contains 1 file per split:

- `test.tar.gz`

That `.tar.gz` file contains 1 or more directories, named after the VITS model being used, e.g. `thorsten-vits`.

Each of these directories contains `.wav` files. Each `.wav` file is named after the ID of its transcript. Keep in mind that these directories can't be merged, as they share their file names. An identical file name implies a speaker-voice pair, e.g. `human/123.wav` and `thorsten-vits/123.wav`.

Finally, the language directory also contains 3 metadata files, which are not used in the loaded dataset, but might be useful to researchers:

- `recording-metadata.csv`
  - contains the transcript ID, file name, split and gender of the original FLEURS samples
- `recording-transcripts.csv`
  - contains the transcripts of the original FLEURS samples
- `voice-metadata.csv`
  - contains the grouping of the TTS voices used, alongside the splits they were used for

### Sample

A sample contains an Audio feature `audio` and a string `label`.

```
{
  'audio': {
    'path': 'ljspeech-vits/1660.wav',
    'array': array([0.00119019, 0.00109863, 0.00106812, ..., 0., 0., 0.]),
    'sampling_rate': 16000
  },
  'label': 'synthetic'
}
```

## Citation

The dataset is featured alongside our paper, **Synthetic speech detection with Wav2Vec 2.0 in various language settings**, which will be published at the IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW). We'll provide links once it's available online.

**BibTeX:**

If you use this work, please cite us by including the following BibTeX reference:

```
@inproceedings{dropuljic-ssdww2v2ivls,
  author={Dropuljić, Branimir and Šuflaj, Miljenko and Jertec, Andrej and Obadić, Leo},
  booktitle={{IEEE} International Conference on Acoustics, Speech, and Signal Processing, {ICASSP} 2024 - Workshops, Seoul, Republic of Korea, April 14-19, 2024},
  title={Synthetic Speech Detection with Wav2vec 2.0 in Various Language Settings},
  year={2024},
  month={04},
  pages={585-589},
  publisher={{IEEE}},
  volume={},
  number={},
  keywords={Synthetic speech detection;text-to-speech;wav2vec 2.0;spoofing attack;multilingualism},
  url={https://doi.org/10.1109/ICASSPW62465.2024.10627750},
  doi={10.1109/ICASSPW62465.2024.10627750}
}
```

## Dataset Card Authors

- [Miljenko Šuflaj](https://huggingface.co/suflaj)

## Dataset Card Contact

- [Miljenko Šuflaj](mailto:msuflaj@realnetworks.com)
Provided by:
realnetworks-kontxt
Summary of original information

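FLEURS-HS VITS dataset overview

Basic information

  • Name: FLEURS-HS VITS
  • Task category: audio classification
  • Languages: German, English, Spanish, French, Italian, Dutch, Polish, Swedish
  • Tags: speech, speech classification, text-to-speech, spoofing, multilingualism
  • Dataset size: 10K<n<100K
  • License: CC BY 4.0 for the dataset, Apache 2.0 for the code

Dataset description

  • Content: an extension of the FLEURS dataset for synthetic speech detection, containing synthetic speech samples in 8 languages.
  • Synthetic speech generation tools: Google Cloud Text-To-Speech, Azure Text-To-Speech, Amazon Polly
  • Composition: test samples only, with one sample per transcript for each VITS voice.

Dataset structure

  • Directory layout: one directory per language, each containing a splits directory that holds test.tar.gz; the archive contains one or more directories named after VITS models, and each of those directories holds .wav files.
  • Metadata files: recording-metadata.csv, recording-transcripts.csv and voice-metadata.csv, which describe the original samples and the voices used.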
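
The archive layout described above can be inspected without extracting anything. The following is a minimal sketch, assuming a `test.tar.gz` has already been downloaded locally and that each `.wav` file sits directly inside its voice directory; `group_wavs_by_voice` is a hypothetical helper name, not part of the dataset's own loading script.

```python
import tarfile
from collections import defaultdict
from pathlib import PurePosixPath


def group_wavs_by_voice(archive_path):
    """Map each voice directory name to the sorted transcript IDs found under it.

    Assumption: every .wav member sits directly inside a directory named
    after its VITS voice, e.g. "thorsten-vits/123.wav".
    """
    voices = defaultdict(list)
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            path = PurePosixPath(member.name)
            if member.isfile() and path.suffix == ".wav":
                # Parent directory = voice name, file stem = transcript ID.
                voices[path.parent.name].append(path.stem)
    return {voice: sorted(ids) for voice, ids in voices.items()}
```

Because the same transcript ID appears under every voice directory, keeping the voice name as the dictionary key avoids the file name collisions the dataset card warns about.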

Usage

  • Loading example: use the datasets.load_dataset function to load the test set for a specific language.
  • Dataset splits: there is only a test split.
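
The loading pattern summarized above can be wrapped in a small helper that maps a language name to its configuration. This is a minimal sketch: the configuration names come from the dataset card, while `CONFIGS`, `load_test_split` and the `loader` parameter are hypothetical names introduced here for illustration. The `loader` parameter exists only so the helper can be exercised without network access. Depending on the installed `datasets` version, script-based loading with `trust_remote_code=True` may not be available.

```python
# Language name -> configuration name, as listed in the dataset card.
CONFIGS = {
    "German": "de_de",
    "English": "en_us",
    "Spanish": "es_419",
    "French": "fr_fr",
    "Italian": "it_it",
    "Dutch": "nl_nl",
    "Polish": "pl_pl",
    "Swedish": "sv_se",
}


def load_test_split(language, loader=None):
    """Load the test split (the only split) for one language.

    `loader` is a hypothetical hook; by default it is datasets.load_dataset.
    """
    if loader is None:
        # Imported lazily so the mapping can be used without `datasets` installed.
        from datasets import load_dataset as loader
    return loader(
        "realnetworks-kontxt/fleurs-hs-vits",
        CONFIGS[language],
        split="test",
        trust_remote_code=True,  # the dataset uses a custom loading script
    )
```

Calling `load_test_split("Polish")` would then request the `pl_pl` configuration; an unknown language name raises `KeyError` rather than silently falling back to English.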

Citation

  • Related paper: Synthetic speech detection with Wav2Vec 2.0 in various language settings
  • Venue: IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)
  • BibTeX: provided in the Citation section of the dataset card

Dataset contact

  • Author: Miljenko Šuflaj
  • Contact: msuflaj@realnetworks.com