xincan/Llama-VITS_data

Name: xincan/Llama-VITS_data
Creator: xincan
Published: 2024-05-10 10:20:47
License: No description available

Hugging Face | Updated 2024-05-10 | Indexed 2024-05-25

Download link:

https://hf-mirror.com/datasets/xincan/Llama-VITS_data


Resource description:

---
license: mit
dataset_info:
  features:
  - name: version
    dtype: string
  - name: data
    list:
    - name: a
      dtype: int64
    - name: b
      dtype: float64
    - name: c
      dtype: string
    - name: d
      dtype: bool
  splits:
  - name: train
    num_bytes: 58
    num_examples: 1
  download_size: 2749
  dataset_size: 58
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
task_categories:
- text-to-speech
language:
- en
---

# Dataset Card for Llama-VITS_data

This dataset repository contains data related to our work "Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness", including:

- Filtered dataset `EmoV_DB_bea_sem`
- Filelists with semantic embeddings
- Model checkpoints
- Human evaluation templates

## Dataset Details

- **Paper:** Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness
- **Curated by:** Xincan Feng, Akifumi Yoshimoto
- **Funded by:** CyberAgent Inc
- **Repository:** https://github.com/xincanfeng/vitsGPT
- **Demo:** https://xincanfeng.github.io/Llama-VITS_demo/

## Dataset Creation

We filtered the `EmoV_DB_bea_sem` dataset from `EmoV_DB` (Adigwe et al., 2018), a database of emotional speech containing data for male and female actors in English and French. EmoV_DB covers 5 emotion classes: amused, angry, disgusted, neutral, and sleepy.

To factor out the effect of different speakers, we restricted the original EmoV_DB dataset to the speech of a specific female English speaker, bea. We then used Llama2 to predict the emotion label of each transcript, chosen from the above 5 emotion classes, and selected the audio samples whose emotion label matches the predicted one. The filtered dataset contains 22.8 minutes of recordings for training.

We named the filtered dataset `EmoV_DB_bea_sem` and investigated how the semantic embeddings from Llama2 affect naturalness and expressiveness on it. Please refer to our paper for more information.

## Citation

If our work is useful to you, please cite our paper: "Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness".

```bibtex
@misc{feng2024llamavits,
  title={Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness},
  author={Xincan Feng and Akifumi Yoshimoto},
  year={2024},
  eprint={2404.06714},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
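The filtering step described under Dataset Creation can be illustrated with a minimal sketch. It is not the authors' code: the `Utterance` record, the `filter_utterances` helper, the file paths, and the keyword-based `toy_predictor` (a stand-in for a real Llama2 call) are all hypothetical. It also assumes that "same predicted emotion" means the predicted label equals the emotion label recorded for the audio sample.

```python
from dataclasses import dataclass
from typing import Callable, List

# The five emotion classes named in the dataset card.
EMOTIONS = ("amused", "angry", "disgusted", "neutral", "sleepy")


@dataclass
class Utterance:
    """Hypothetical record for one audio sample."""
    speaker: str     # speaker identifier, e.g. "bea"
    emotion: str     # emotion label attached to the recording
    transcript: str  # text that was spoken
    audio_path: str  # hypothetical file path


def filter_utterances(
    utterances: List[Utterance],
    predict_emotion: Callable[[str], str],
    speaker: str = "bea",
) -> List[Utterance]:
    """Keep one speaker's samples whose label agrees with the prediction."""
    kept = []
    for utt in utterances:
        if utt.speaker != speaker:
            continue  # factor out the effect of other speakers
        predicted = predict_emotion(utt.transcript)
        if predicted in EMOTIONS and predicted == utt.emotion:
            kept.append(utt)
    return kept


def toy_predictor(transcript: str) -> str:
    """Stand-in for an LLM call; a real pipeline would prompt Llama2 here."""
    return "amused" if "laugh" in transcript.lower() else "neutral"


# Made-up sample data for illustration only.
samples = [
    Utterance("bea", "amused", "She could not help but laugh.", "bea/amused_1.wav"),
    Utterance("bea", "angry", "The weather is mild today.", "bea/angry_1.wav"),
    Utterance("other_speaker", "neutral", "The weather is mild today.", "other/neutral_1.wav"),
    Utterance("bea", "neutral", "The weather is mild today.", "bea/neutral_1.wav"),
]

kept = filter_utterances(samples, toy_predictor)
```

Passing the predictor as a callable keeps the filter independent of how the language model is queried, so the toy function can be swapped for a real model call without touching the filtering logic.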

Provider:

xincan

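Original information summary

Dataset overview

Basic dataset information

License: MIT
Dataset size:
- Download size: 2749 bytes
- Dataset size: 58 bytes

Dataset features

version: string
data (list):
- a: integer (int64)
- b: float (float64)
- c: string
- d: boolean (bool)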
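As a rough illustration of the feature schema listed above, the following sketch checks whether a plain Python record has the declared shape. The `matches_schema` helper and the example values are hypothetical; only the field names and types come from the dataset metadata.

```python
# Field names and types for items in the "data" list, per the dataset metadata.
EXPECTED_DATA_FIELDS = {"a": int, "b": float, "c": str, "d": bool}


def matches_schema(record: dict) -> bool:
    """Return True if the record has a string version and well-typed data items."""
    if not isinstance(record.get("version"), str):
        return False
    data = record.get("data")
    if not isinstance(data, list):
        return False
    for item in data:
        if not isinstance(item, dict) or set(item) != set(EXPECTED_DATA_FIELDS):
            return False
        for name, expected in EXPECTED_DATA_FIELDS.items():
            value = item[name]
            # bool is a subclass of int in Python, so reject it for the int64 field.
            if expected is int and isinstance(value, bool):
                return False
            if not isinstance(value, expected):
                return False
    return True


# Made-up values; the actual row stored in the repository is not shown here.
example = {"version": "0.0.1", "data": [{"a": 1, "b": 2.5, "c": "text", "d": True}]}
is_valid = matches_schema(example)
```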

Dataset splits

Training set (train):
- Bytes: 58
- Examples: 1

Task categories

Text-to-speech (text-to-speech)

Language

English (en)
