
xincan/Llama-VITS_data

Source: Hugging Face | Updated: 2024-05-10 | Indexed: 2024-05-25
Download link:
https://hf-mirror.com/datasets/xincan/Llama-VITS_data
Resource description:
---
license: mit
dataset_info:
  features:
  - name: version
    dtype: string
  - name: data
    list:
    - name: a
      dtype: int64
    - name: b
      dtype: float64
    - name: c
      dtype: string
    - name: d
      dtype: bool
  splits:
  - name: train
    num_bytes: 58
    num_examples: 1
  download_size: 2749
  dataset_size: 58
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
task_categories:
- text-to-speech
language:
- en
---

# Dataset Card for Llama-VITS_data

This dataset repository contains data related to our work "Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness", including:

- Filtered dataset `EmoV_DB_bea_sem`
- Filelists with semantic embeddings
- Model checkpoints
- Human evaluation templates

## Dataset Details

- **Paper:** Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness
- **Curated by:** Xincan Feng, Akifumi Yoshimoto
- **Funded by:** CyberAgent Inc
- **Repository:** https://github.com/xincanfeng/vitsGPT
- **Demo:** https://xincanfeng.github.io/Llama-VITS_demo/

## Dataset Creation

We filtered the `EmoV_DB_bea_sem` dataset from `EmoV_DB` (Adigwe et al., 2018), a database of emotional speech containing data for male and female actors in English and French. EmoV_DB covers 5 emotion classes: amused, angry, disgusted, neutral, and sleepy.

To factor out the effect of different speakers, we first restricted the original EmoV_DB dataset to the speech of a specific female English speaker, bea. We then used Llama2 to predict an emotion label for each transcript, chosen from the above 5 emotion classes, and selected the audio samples whose emotion matches the predicted emotion. The filtered dataset contains 22.8 minutes of recordings for training.

We named the filtered dataset `EmoV_DB_bea_sem` and investigated how the semantic embeddings from Llama2 behave in terms of naturalness and expressiveness on it. Please refer to our paper for more information.

## Citation

If our work is useful to you, please cite our paper: "Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness".
```bibtex
@misc{feng2024llamavits,
  title={Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness},
  author={Xincan Feng and Akifumi Yoshimoto},
  year={2024},
  eprint={2404.06714},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
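
The filtering step described under Dataset Creation (keep one speaker, predict an emotion from each transcript, retain samples whose recorded emotion matches the prediction) might be sketched roughly as follows. This is only an illustration under stated assumptions: the `Utterance` record layout, the function names, and the keyword-based `keyword_predictor` are hypothetical stand-ins, and the predictor in particular is not the authors' Llama2 setup.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# The five EmoV_DB emotion classes named in the dataset card.
EMOTIONS = ("amused", "angry", "disgusted", "neutral", "sleepy")


@dataclass(frozen=True)
class Utterance:
    """One EmoV_DB sample (hypothetical record layout)."""
    speaker: str      # speaker identifier, e.g. "bea"
    emotion: str      # emotion the audio was recorded with
    transcript: str   # text that was read aloud
    audio_path: str   # location of the audio file


def filter_semantic_matches(
    utterances: Iterable[Utterance],
    predict_emotion: Callable[[str], str],
    speaker: str = "bea",
) -> List[Utterance]:
    """Keep one speaker's samples whose recorded emotion equals the
    emotion predicted from the transcript alone."""
    kept = []
    for utt in utterances:
        if utt.speaker != speaker:
            continue  # factor out speaker variation
        predicted = predict_emotion(utt.transcript)
        if predicted not in EMOTIONS:
            continue  # ignore predictions outside the five classes
        if predicted == utt.emotion:
            kept.append(utt)
    return kept


def keyword_predictor(transcript: str) -> str:
    """Toy stand-in for a language-model classifier (assumption only)."""
    text = transcript.lower()
    if "funny" in text:
        return "amused"
    if "furious" in text:
        return "angry"
    return "neutral"
```

Passing the predictor in as a callable keeps the filtering logic independent of the model, so a real language-model classifier could replace the toy function without changing `filter_semantic_matches`.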
Provider:
xincan
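Summary of original information

Dataset overview

Basic dataset information

  • License: MIT
  • Dataset size:
    • Download size: 2,749 bytes
    • Dataset size: 58 bytes

Dataset features

  • version: string
  • data:
    • a: integer (int64)
    • b: floating point (float64)
    • c: string
    • d: boolean (bool)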
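
As a rough illustration of the feature schema listed above, the following sketch checks whether a plain Python record has a string `version` and a `data` list whose items carry the fields `a`, `b`, `c`, and `d` with matching types. The helper names and the sample values in the usage note are hypothetical; the sketch uses only the standard library and does not reflect how Hugging Face validates features internally.

```python
from typing import Any, Dict

# Expected Python types for each field of an item in `data`,
# mirroring the declared dtypes (int64, float64, string, bool).
DATA_ITEM_TYPES = {"a": int, "b": float, "c": str, "d": bool}


def is_valid_item(item: Dict[str, Any]) -> bool:
    """Return True if the item has exactly the declared fields and types."""
    if set(item) != set(DATA_ITEM_TYPES):
        return False
    for name, expected in DATA_ITEM_TYPES.items():
        value = item[name]
        # bool is a subclass of int in Python, so reject it for int fields.
        if expected is int and isinstance(value, bool):
            return False
        if not isinstance(value, expected):
            return False
    return True


def is_valid_record(record: Dict[str, Any]) -> bool:
    """Return True if the record matches the `version` + `data` layout."""
    if set(record) != {"version", "data"}:
        return False
    if not isinstance(record["version"], str):
        return False
    data = record["data"]
    if not isinstance(data, list):
        return False
    return all(isinstance(item, dict) and is_valid_item(item) for item in data)
```

For example, `is_valid_record({"version": "1.0", "data": [{"a": 1, "b": 2.5, "c": "x", "d": True}]})` would be accepted, while a record whose `a` field holds `True` would be rejected; both sample records are made up for illustration.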

Dataset splits

  • train:
    • Bytes: 58
    • Examples: 1

Task categories

  • text-to-speech

Languages

  • English (en)