Name: FDlalala/test
Creator: FDlalala
Published: 2025-12-11 09:39:26
License: No description provided

Download link:

https://hf-mirror.com/datasets/FDlalala/test

Resource description:

--- license: cc-by-nc-4.0 task_categories: - audio-classification - automatic-speech-recognition language: - en tags: - conversational - emotions - dialogues - conversations pretty_name: Deep Dialogue (XTTS-v2) size_categories: - 100K<n<1M dataset_info: - config_name: all features: - name: id dtype: string - name: domain dtype: string - name: timestamp dtype: string - name: model1 dtype: string - name: model2 dtype: string - name: configuration dtype: string - name: conversation sequence: - name: speaker dtype: string - name: text dtype: string - name: emotion dtype: string - name: full_audio dtype: audio - name: segments sequence: audio - name: segment_metadata sequence: - name: segment_id dtype: string - name: filename dtype: string - name: speaker dtype: string - name: model dtype: string - name: actor dtype: string - name: original_text dtype: string - name: substituted_text dtype: string - name: cleaned_text dtype: string - name: dialogue_emotion dtype: string - name: ravdess_emotion dtype: string - name: ref_audio dtype: string - name: ref_transcript dtype: string - name: start_time dtype: float32 - name: end_time dtype: float32 - name: duration dtype: float32 - name: following_silence dtype: float32 - name: generation_time dtype: float32 - name: realtime_factor dtype: float32 splits: - name: train num_bytes: 208162364 num_examples: 39055 download_size: 0 dataset_size: 208162364 - config_name: default features: - name: conversation_id dtype: string - name: model_dir dtype: string - name: domain dtype: string - name: timestamp dtype: string - name: model1 dtype: string - name: model2 dtype: string - name: turn_index dtype: int64 - name: speaker dtype: string - name: text dtype: string - name: emotion dtype: string - name: full_audio_path dtype: string - name: segment_audio_path dtype: string - name: audio_segment_id dtype: float64 - name: audio_model dtype: string - name: audio_actor dtype: string - name: audio_original_text dtype: string - name: 
audio_substituted_text dtype: string - name: audio_cleaned_text dtype: string - name: audio_dialogue_emotion dtype: string - name: audio_ravdess_emotion dtype: string - name: audio_ref_audio dtype: string - name: audio_ref_transcript dtype: string - name: audio_start_time dtype: float64 - name: audio_end_time dtype: float64 - name: audio_duration dtype: float64 - name: audio_following_silence dtype: float64 - name: audio_generation_time dtype: float64 - name: audio_realtime_factor dtype: float64 splits: - name: train num_bytes: 261914837 num_examples: 243295 download_size: 80172060 dataset_size: 261914837 - config_name: dialogues_cohere7B features: - name: id dtype: string - name: domain dtype: string - name: timestamp dtype: string - name: model1 dtype: string - name: model2 dtype: string - name: configuration dtype: string - name: conversation sequence: - name: speaker dtype: string - name: text dtype: string - name: emotion dtype: string - name: full_audio dtype: audio - name: segments sequence: audio - name: segment_metadata sequence: - name: segment_id dtype: string - name: filename dtype: string - name: speaker dtype: string - name: model dtype: string - name: actor dtype: string - name: original_text dtype: string - name: substituted_text dtype: string - name: cleaned_text dtype: string - name: dialogue_emotion dtype: string - name: ravdess_emotion dtype: string - name: ref_audio dtype: string - name: ref_transcript dtype: string - name: start_time dtype: float32 - name: end_time dtype: float32 - name: duration dtype: float32 - name: following_silence dtype: float32 - name: generation_time dtype: float32 - name: realtime_factor dtype: float32 splits: - name: train num_bytes: 2197335 num_examples: 673 download_size: 0 dataset_size: 2197335 - config_name: dialogues_cohere7B_gemma3-4B features: - name: id dtype: string - name: domain dtype: string - name: timestamp dtype: string - name: model1 dtype: string - name: model2 dtype: string - name: configuration 
dtype: string - name: conversation sequence: - name: speaker dtype: string - name: text dtype: string - name: emotion dtype: string - name: full_audio dtype: audio - name: segments sequence: audio - name: segment_metadata sequence: - name: segment_id dtype: string - name: filename dtype: string - name: speaker dtype: string - name: model dtype: string - name: actor dtype: string - name: original_text dtype: string - name: substituted_text dtype: string - name: cleaned_text dtype: string - name: dialogue_emotion dtype: string - name: ravdess_emotion dtype: string - name: ref_audio dtype: string - name: ref_transcript dtype: string - name: start_time dtype: float32 - name: end_time dtype: float32 - name: duration dtype: float32 - name: following_silence dtype: float32 - name: generation_time dtype: float32 - name: realtime_factor dtype: float32 splits: - name: train num_bytes: 4659388 num_examples: 1144 download_size: 0 dataset_size: 4659388 - config_name: dialogues_gemma3-27B features: - name: id dtype: string - name: domain dtype: string - name: timestamp dtype: string - name: model1 dtype: string - name: model2 dtype: string - name: configuration dtype: string - name: conversation sequence: - name: speaker dtype: string - name: text dtype: string - name: emotion dtype: string - name: full_audio dtype: audio - name: segments sequence: audio - name: segment_metadata sequence: - name: segment_id dtype: string - name: filename dtype: string - name: speaker dtype: string - name: model dtype: string - name: actor dtype: string - name: original_text dtype: string - name: substituted_text dtype: string - name: cleaned_text dtype: string - name: dialogue_emotion dtype: string - name: ravdess_emotion dtype: string - name: ref_audio dtype: string - name: ref_transcript dtype: string - name: start_time dtype: float32 - name: end_time dtype: float32 - name: duration dtype: float32 - name: following_silence dtype: float32 - name: generation_time dtype: float32 - name: 
realtime_factor dtype: float32 splits: - name: train num_bytes: 17243981 num_examples: 3218 download_size: 0 dataset_size: 17243981 - config_name: dialogues_gemma3-4B features: - name: id dtype: string - name: domain dtype: string - name: timestamp dtype: string - name: model1 dtype: string - name: model2 dtype: string - name: configuration dtype: string - name: conversation sequence: - name: speaker dtype: string - name: text dtype: string - name: emotion dtype: string - name: full_audio dtype: audio - name: segments sequence: audio - name: segment_metadata sequence: - name: segment_id dtype: string - name: filename dtype: string - name: speaker dtype: string - name: model dtype: string - name: actor dtype: string - name: original_text dtype: string - name: substituted_text dtype: string - name: cleaned_text dtype: string - name: dialogue_emotion dtype: string - name: ravdess_emotion dtype: string - name: ref_audio dtype: string - name: ref_transcript dtype: string - name: start_time dtype: float32 - name: end_time dtype: float32 - name: duration dtype: float32 - name: following_silence dtype: float32 - name: generation_time dtype: float32 - name: realtime_factor dtype: float32 splits: - name: train num_bytes: 5194294 num_examples: 1206 download_size: 0 dataset_size: 5194294 - config_name: dialogues_llama3-70B features: - name: id dtype: string - name: domain dtype: string - name: timestamp dtype: string - name: model1 dtype: string - name: model2 dtype: string - name: configuration dtype: string - name: conversation sequence: - name: speaker dtype: string - name: text dtype: string - name: emotion dtype: string - name: full_audio dtype: audio - name: segments sequence: audio - name: segment_metadata sequence: - name: segment_id dtype: string - name: filename dtype: string - name: speaker dtype: string - name: model dtype: string - name: actor dtype: string - name: original_text dtype: string - name: substituted_text dtype: string - name: cleaned_text dtype: 
string - name: dialogue_emotion dtype: string - name: ravdess_emotion dtype: string - name: ref_audio dtype: string - name: ref_transcript dtype: string - name: start_time dtype: float32 - name: end_time dtype: float32 - name: duration dtype: float32 - name: following_silence dtype: float32 - name: generation_time dtype: float32 - name: realtime_factor dtype: float32 splits: - name: train num_bytes: 20470475 num_examples: 3636 download_size: 0 dataset_size: 20470475 - config_name: dialogues_llama3-70B_qwen2.5-72B features: - name: id dtype: string - name: domain dtype: string - name: timestamp dtype: string - name: model1 dtype: string - name: model2 dtype: string - name: configuration dtype: string - name: conversation sequence: - name: speaker dtype: string - name: text dtype: string - name: emotion dtype: string - name: full_audio dtype: audio - name: segments sequence: audio - name: segment_metadata sequence: - name: segment_id dtype: string - name: filename dtype: string - name: speaker dtype: string - name: model dtype: string - name: actor dtype: string - name: original_text dtype: string - name: substituted_text dtype: string - name: cleaned_text dtype: string - name: dialogue_emotion dtype: string - name: ravdess_emotion dtype: string - name: ref_audio dtype: string - name: ref_transcript dtype: string - name: start_time dtype: float32 - name: end_time dtype: float32 - name: duration dtype: float32 - name: following_silence dtype: float32 - name: generation_time dtype: float32 - name: realtime_factor dtype: float32 splits: - name: train num_bytes: 21600583 num_examples: 3791 download_size: 0 dataset_size: 21600583 - config_name: dialogues_llama3-8B features: - name: id dtype: string - name: domain dtype: string - name: timestamp dtype: string - name: model1 dtype: string - name: model2 dtype: string - name: configuration dtype: string - name: conversation sequence: - name: speaker dtype: string - name: text dtype: string - name: emotion dtype: string - 
name: full_audio dtype: audio - name: segments sequence: audio - name: segment_metadata sequence: - name: segment_id dtype: string - name: filename dtype: string - name: speaker dtype: string - name: model dtype: string - name: actor dtype: string - name: original_text dtype: string - name: substituted_text dtype: string - name: cleaned_text dtype: string - name: dialogue_emotion dtype: string - name: ravdess_emotion dtype: string - name: ref_audio dtype: string - name: ref_transcript dtype: string - name: start_time dtype: float32 - name: end_time dtype: float32 - name: duration dtype: float32 - name: following_silence dtype: float32 - name: generation_time dtype: float32 - name: realtime_factor dtype: float32 splits: - name: train num_bytes: 7223668 num_examples: 1582 download_size: 0 dataset_size: 7223668 - config_name: dialogues_llama3-8B_cohere7B features: - name: id dtype: string - name: domain dtype: string - name: timestamp dtype: string - name: model1 dtype: string - name: model2 dtype: string - name: configuration dtype: string - name: conversation sequence: - name: speaker dtype: string - name: text dtype: string - name: emotion dtype: string - name: full_audio dtype: audio - name: segments sequence: audio - name: segment_metadata sequence: - name: segment_id dtype: string - name: filename dtype: string - name: speaker dtype: string - name: model dtype: string - name: actor dtype: string - name: original_text dtype: string - name: substituted_text dtype: string - name: cleaned_text dtype: string - name: dialogue_emotion dtype: string - name: ravdess_emotion dtype: string - name: ref_audio dtype: string - name: ref_transcript dtype: string - name: start_time dtype: float32 - name: end_time dtype: float32 - name: duration dtype: float32 - name: following_silence dtype: float32 - name: generation_time dtype: float32 - name: realtime_factor dtype: float32 splits: - name: train num_bytes: 7466351 num_examples: 1644 download_size: 0 dataset_size: 7466351 - 
config_name: dialogues_llama3-8B_gemma3-4B features: - name: id dtype: string - name: domain dtype: string - name: timestamp dtype: string - name: model1 dtype: string - name: model2 dtype: string - name: configuration dtype: string - name: conversation sequence: - name: speaker dtype: string - name: text dtype: string - name: emotion dtype: string - name: full_audio dtype: audio - name: segments sequence: audio - name: segment_metadata sequence: - name: segment_id dtype: string - name: filename dtype: string - name: speaker dtype: string - name: model dtype: string - name: actor dtype: string - name: original_text dtype: string - name: substituted_text dtype: string - name: cleaned_text dtype: string - name: dialogue_emotion dtype: string - name: ravdess_emotion dtype: string - name: ref_audio dtype: string - name: ref_transcript dtype: string - name: start_time dtype: float32 - name: end_time dtype: float32 - name: duration dtype: float32 - name: following_silence dtype: float32 - name: generation_time dtype: float32 - name: realtime_factor dtype: float32 splits: - name: train num_bytes: 9629300 num_examples: 1986 download_size: 0 dataset_size: 9629300 - config_name: dialogues_phi4-14B features: - name: id dtype: string - name: domain dtype: string - name: timestamp dtype: string - name: model1 dtype: string - name: model2 dtype: string - name: configuration dtype: string - name: conversation sequence: - name: speaker dtype: string - name: text dtype: string - name: emotion dtype: string - name: full_audio dtype: audio - name: segments sequence: audio - name: segment_metadata sequence: - name: segment_id dtype: string - name: filename dtype: string - name: speaker dtype: string - name: model dtype: string - name: actor dtype: string - name: original_text dtype: string - name: substituted_text dtype: string - name: cleaned_text dtype: string - name: dialogue_emotion dtype: string - name: ravdess_emotion dtype: string - name: ref_audio dtype: string - name: 
ref_transcript dtype: string - name: start_time dtype: float32 - name: end_time dtype: float32 - name: duration dtype: float32 - name: following_silence dtype: float32 - name: generation_time dtype: float32 - name: realtime_factor dtype: float32 splits: - name: train num_bytes: 7642499 num_examples: 1622 download_size: 0 dataset_size: 7642499 - config_name: dialogues_phi4-14B_gemma3-27B features: - name: id dtype: string - name: domain dtype: string - name: timestamp dtype: string - name: model1 dtype: string - name: model2 dtype: string - name: configuration dtype: string - name: conversation sequence: - name: speaker dtype: string - name: text dtype: string - name: emotion dtype: string - name: full_audio dtype: audio - name: segments sequence: audio - name: segment_metadata sequence: - name: segment_id dtype: string - name: filename dtype: string - name: speaker dtype: string - name: model dtype: string - name: actor dtype: string - name: original_text dtype: string - name: substituted_text dtype: string - name: cleaned_text dtype: string - name: dialogue_emotion dtype: string - name: ravdess_emotion dtype: string - name: ref_audio dtype: string - name: ref_transcript dtype: string - name: start_time dtype: float32 - name: end_time dtype: float32 - name: duration dtype: float32 - name: following_silence dtype: float32 - name: generation_time dtype: float32 - name: realtime_factor dtype: float32 splits: - name: train num_bytes: 19799165 num_examples: 3455 download_size: 0 dataset_size: 19799165 - config_name: dialogues_phi4-mini features: - name: id dtype: string - name: domain dtype: string - name: timestamp dtype: string - name: model1 dtype: string - name: model2 dtype: string - name: configuration dtype: string - name: conversation sequence: - name: speaker dtype: string - name: text dtype: string - name: emotion dtype: string - name: full_audio dtype: audio - name: segments sequence: audio - name: segment_metadata sequence: - name: segment_id dtype: string - 
name: filename dtype: string - name: speaker dtype: string - name: model dtype: string - name: actor dtype: string - name: original_text dtype: string - name: substituted_text dtype: string - name: cleaned_text dtype: string - name: dialogue_emotion dtype: string - name: ravdess_emotion dtype: string - name: ref_audio dtype: string - name: ref_transcript dtype: string - name: start_time dtype: float32 - name: end_time dtype: float32 - name: duration dtype: float32 - name: following_silence dtype: float32 - name: generation_time dtype: float32 - name: realtime_factor dtype: float32 splits: - name: train num_bytes: 3764587 num_examples: 1014 download_size: 0 dataset_size: 3764587 - config_name: dialogues_qwen2.5-32B features: - name: id dtype: string - name: domain dtype: string - name: timestamp dtype: string - name: model1 dtype: string - name: model2 dtype: string - name: configuration dtype: string - name: conversation sequence: - name: speaker dtype: string - name: text dtype: string - name: emotion dtype: string - name: full_audio dtype: audio - name: segments sequence: audio - name: segment_metadata sequence: - name: segment_id dtype: string - name: filename dtype: string - name: speaker dtype: string - name: model dtype: string - name: actor dtype: string - name: original_text dtype: string - name: substituted_text dtype: string - name: cleaned_text dtype: string - name: dialogue_emotion dtype: string - name: ravdess_emotion dtype: string - name: ref_audio dtype: string - name: ref_transcript dtype: string - name: start_time dtype: float32 - name: end_time dtype: float32 - name: duration dtype: float32 - name: following_silence dtype: float32 - name: generation_time dtype: float32 - name: realtime_factor dtype: float32 splits: - name: train num_bytes: 18944118 num_examples: 3294 download_size: 0 dataset_size: 18944118 - config_name: dialogues_qwen2.5-32B_gemma3-27B features: - name: id dtype: string - name: domain dtype: string - name: timestamp dtype: string 
- name: model1 dtype: string - name: model2 dtype: string - name: configuration dtype: string - name: conversation sequence: - name: speaker dtype: string - name: text dtype: string - name: emotion dtype: string - name: full_audio dtype: audio - name: segments sequence: audio - name: segment_metadata sequence: - name: segment_id dtype: string - name: filename dtype: string - name: speaker dtype: string - name: model dtype: string - name: actor dtype: string - name: original_text dtype: string - name: substituted_text dtype: string - name: cleaned_text dtype: string - name: dialogue_emotion dtype: string - name: ravdess_emotion dtype: string - name: ref_audio dtype: string - name: ref_transcript dtype: string - name: start_time dtype: float32 - name: end_time dtype: float32 - name: duration dtype: float32 - name: following_silence dtype: float32 - name: generation_time dtype: float32 - name: realtime_factor dtype: float32 splits: - name: train num_bytes: 18619665 num_examples: 3493 download_size: 0 dataset_size: 18619665 - config_name: dialogues_qwen2.5-32B_phi4-14B features: - name: id dtype: string - name: domain dtype: string - name: timestamp dtype: string - name: model1 dtype: string - name: model2 dtype: string - name: configuration dtype: string - name: conversation sequence: - name: speaker dtype: string - name: text dtype: string - name: emotion dtype: string - name: full_audio dtype: audio - name: segments sequence: audio - name: segment_metadata sequence: - name: segment_id dtype: string - name: filename dtype: string - name: speaker dtype: string - name: model dtype: string - name: actor dtype: string - name: original_text dtype: string - name: substituted_text dtype: string - name: cleaned_text dtype: string - name: dialogue_emotion dtype: string - name: ravdess_emotion dtype: string - name: ref_audio dtype: string - name: ref_transcript dtype: string - name: start_time dtype: float32 - name: end_time dtype: float32 - name: duration dtype: float32 - 
name: following_silence dtype: float32 - name: generation_time dtype: float32 - name: realtime_factor dtype: float32 splits: - name: train num_bytes: 21506478 num_examples: 3529 download_size: 0 dataset_size: 21506478 - config_name: dialogues_qwen2.5-72B features: - name: id dtype: string - name: domain dtype: string - name: timestamp dtype: string - name: model1 dtype: string - name: model2 dtype: string - name: configuration dtype: string - name: conversation sequence: - name: speaker dtype: string - name: text dtype: string - name: emotion dtype: string - name: full_audio dtype: audio - name: segments sequence: audio - name: segment_metadata sequence: - name: segment_id dtype: string - name: filename dtype: string - name: speaker dtype: string - name: model dtype: string - name: actor dtype: string - name: original_text dtype: string - name: substituted_text dtype: string - name: cleaned_text dtype: string - name: dialogue_emotion dtype: string - name: ravdess_emotion dtype: string - name: ref_audio dtype: string - name: ref_transcript dtype: string - name: start_time dtype: float32 - name: end_time dtype: float32 - name: duration dtype: float32 - name: following_silence dtype: float32 - name: generation_time dtype: float32 - name: realtime_factor dtype: float32 splits: - name: train num_bytes: 22200484 num_examples: 3768 download_size: 0 dataset_size: 22200484 configs: - config_name: default data_files: - split: train path: data/train-* ---

# DeepDialogue-xtts

**DeepDialogue-xtts** is a large-scale multimodal dataset containing 40,150 high-quality multi-turn dialogues spanning 41 domains and incorporating 20 distinct emotions with coherent emotional progressions. This repository contains the XTTS-v2 variant of the dataset, where speech is generated using [XTTS-v2](https://huggingface.co/coqui/XTTS-v2) with explicit emotional conditioning.

[![paper](https://img.shields.io/badge/Paper-arXiv-green)](https://arxiv.org/abs/2505.19978)

## 🚨 Important

This dataset is large (~180GB) due to the inclusion of high-quality audio files. When cloning the repository, ensure you have sufficient disk space and a stable internet connection.

## 💬 Dataset Overview

DeepDialogue pairs 9 different language models (4B-72B parameters) to generate dialogues with emotionally coherent trajectories. Each conversation includes:

- Multi-turn dialogues (3-10 turns) between two AI agents
- Domain-specific content across 41 topics
- Emotional annotations for each utterance (20 distinct emotions)
- High-quality synthesized speech with explicit emotion conditioning
- Paired audio-text data suitable for speech and dialogue research

### Emotional Speech Generation

The XTTS-v2 variant uses reference audio samples from the [RAVDESS dataset](https://zenodo.org/records/1188976) to explicitly condition the speech synthesis on specific emotions, creating natural-sounding emotional expressions in the spoken dialogues.

## 📦 Installation

The dataset contains large audio files and uses Git LFS. To properly clone the repository:

```bash
# Install Git LFS if you haven't already
git lfs install

# Clone the repository (be prepared for a large download, ~180GB)
git clone https://huggingface.co/datasets/SALT-Research/DeepDialogue-xtts
cd DeepDialogue-xtts
```

You can also access specific files through the Hugging Face web interface if you don't need the entire dataset.

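If only part of the dataset is needed, a scripted partial download may be more convenient than the web interface. The sketch below is not part of the original card: it assumes the `huggingface_hub` package is installed and that the repository follows the `data/dialogues_[model_combination]/` layout described in this README. The helper names `patterns_for` and `download_subset` are hypothetical. `snapshot_download` filters files with fnmatch-style patterns, in which `*` also matches path separators, so a single pattern covers nested audio folders.

```python
from typing import List


def patterns_for(model_dir: str, with_audio: bool = False) -> List[str]:
    """Build fnmatch-style patterns selecting one model-combination folder.

    Assumes the layout data/<model_dir>/<dialogue_id>.json plus
    data/<model_dir>/<dialogue_id>/... for audio, as described in the README.
    """
    if with_audio:
        # '*' also matches '/', so this covers JSON, metadata.tsv and all WAVs.
        return [f"data/{model_dir}/*"]
    # JSON dialogue files only (no audio); the trailing '/' after model_dir
    # keeps e.g. dialogues_llama3-8B from matching dialogues_llama3-8B_cohere7B.
    return [f"data/{model_dir}/*.json"]


def download_subset(model_dir: str, with_audio: bool = False,
                    local_dir: str = "DeepDialogue-xtts") -> str:
    """Download one model-combination folder and return the local path."""
    # Imported lazily so the pattern helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="SALT-Research/DeepDialogue-xtts",
        repo_type="dataset",
        allow_patterns=patterns_for(model_dir, with_audio),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    # Shows the patterns only; call download_subset(...) to fetch files.
    print(patterns_for("dialogues_llama3-8B"))
```

Calling `download_subset("dialogues_llama3-8B")` would fetch only the dialogue JSON files for that model; passing `with_audio=True` also pulls the audio, which can still be many gigabytes.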
## 🗂️ Dataset Structure

The dataset is organized as follows:

```
data/
├── dialogues_[model_combination]/           # Folders grouped by model pairs
│   ├── [dialogue_id].json                   # JSON representation of the dialogue
│   └── [dialogue_id]/                       # Folder containing audio files
│       ├── [dialogue_id]_full.wav           # Complete dialogue audio
│       ├── metadata.tsv                     # Metadata for the audio segments
│       └── segments/                        # Individual utterance audio files
│           └── [segment_id]_[speaker]_[emotion].wav
└── train-00000-of-00001.parquet             # Metadata for all dialogues
```

### Model Combinations

The dataset includes dialogues from the following model combinations:

```
dialogues_cohere7B
dialogues_cohere7B_gemma3-4B
dialogues_gemma3-27B
dialogues_gemma3-4B
dialogues_llama3-70B
dialogues_llama3-70B_qwen2.5-72B
dialogues_llama3-8B
dialogues_llama3-8B_cohere7B
dialogues_llama3-8B_gemma3-4B
dialogues_phi4-14B
dialogues_phi4-14B_gemma3-27B
dialogues_phi4-mini
dialogues_qwen2.5-32B
dialogues_qwen2.5-32B_gemma3-27B
dialogues_qwen2.5-32B_phi4-14B
dialogues_qwen2.5-72B
```

### Domains

The dataset covers 41 distinct domains:

```python
topics = [
    "art", "books", "cars", "celebrities", "coding", "cooking", "education", "events",
    "fashion", "finance", "fitness", "food", "gaming", "gardening", "health", "history",
    "hobbies", "holidays", "home", "languages", "makeup", "movies", "music", "nature",
    "news", "pets", "philosophy", "photography", "podcasts", "politics", "relationships",
    "science", "shopping", "social_media", "spirituality", "sports", "technology",
    "traditions", "travel", "weather", "work"
]
```

### Metadata Structure

The metadata in `metadata.tsv` files includes:

| Field | Description |
|-------|-------------|
| segment_id | Unique identifier for the audio segment |
| filename | Filename of the audio segment |
| speaker | Speaker identifier (LLM1 or LLM2) |
| model | LLM model that generated this turn |
| actor | Voice actor ID from RAVDESS |
| original_text | Original text generated by the LLM |
| substituted_text | Text after any processing |
| cleaned_text | Text after cleaning for TTS |
| dialogue_emotion | Emotion label from dialogue generation |
| ravdess_emotion | Mapped emotion from RAVDESS |
| ref_audio | Reference audio file used for emotion conditioning |
| ref_transcript | Transcript of the reference audio |
| start_time | Start time in the full audio (seconds) |
| end_time | End time in the full audio (seconds) |
| duration | Duration of the segment (seconds) |
| following_silence | Silence after the segment (seconds) |
| generation_time | Time taken to generate the audio (seconds) |
| realtime_factor | Ratio of audio duration to generation time |

## 📊 Usage Examples

### 1. Load the Full Conversations for a Single Model

```python
import os
import json
import pandas as pd
from glob import glob

def load_conversations_for_model(model_dir):
    """Load all conversations for a specific model combination."""
    model_path = os.path.join("data", model_dir)

    if not os.path.exists(model_path):
        print(f"Model directory {model_dir} not found.")
        return None

    conversations = []

    # Get all JSON files in the model directory
    json_files = glob(os.path.join(model_path, "*.json"))

    for json_file in json_files:
        with open(json_file, 'r') as f:
            conversation = json.load(f)

        # Add model information
        conversation['model_dir'] = model_dir
        conversations.append(conversation)

    print(f"Loaded {len(conversations)} conversations from {model_dir}")
    return conversations

# Example usage:
conversations = load_conversations_for_model("dialogues_llama3-70B")
```

### 2. Load the Full Conversations for a Single Topic/Domain

```python
import os
import json
import pandas as pd
from glob import glob

def load_conversations_for_domain(domain, base_path="data"):
    """Load all conversations for a specific domain."""
    # First, we'll use the parquet file to find conversations in this domain
    parquet_path = os.path.join(base_path, "train-00000-of-00001.parquet")

    if os.path.exists(parquet_path):
        # Use parquet file for efficient filtering
        df = pd.read_parquet(parquet_path)
        domain_convs = df[df['domain'] == domain]['conversation_id'].unique()

        print(f"Found {len(domain_convs)} conversations in domain '{domain}'")

        # Load each conversation JSON
        conversations = []
        for conv_id in domain_convs:
            # Find the model directory for this conversation
            model_dir = df[df['conversation_id'] == conv_id]['model_dir'].iloc[0]
            json_path = os.path.join(base_path, model_dir, f"{conv_id}.json")

            if os.path.exists(json_path):
                with open(json_path, 'r') as f:
                    conversation = json.load(f)
                conversations.append(conversation)

        return conversations
    else:
        # Fallback: search through all model directories
        print("Parquet file not found, searching through all model directories...")
        all_model_dirs = [d for d in os.listdir(base_path) if d.startswith("dialogues_")]

        conversations = []
        for model_dir in all_model_dirs:
            model_path = os.path.join(base_path, model_dir)
            json_files = glob(os.path.join(model_path, "*.json"))

            for json_file in json_files:
                with open(json_file, 'r') as f:
                    conv = json.load(f)
                    if conv.get('domain') == domain:
                        # Add model directory information
                        conv['model_dir'] = model_dir
                        conversations.append(conv)

        print(f"Found {len(conversations)} conversations in domain '{domain}'")
        return conversations

# Example usage:
music_conversations = load_conversations_for_domain("music")
```

### 3. Load All Full Conversations

```python
import os
import json
from glob import glob

def load_all_conversations(base_path="data"):
    """Load all conversations from all model directories."""
    # Get all model directories
    model_dirs = [d for d in os.listdir(base_path) if d.startswith("dialogues_")]

    all_conversations = []
    for model_dir in model_dirs:
        model_path = os.path.join(base_path, model_dir)
        json_files = glob(os.path.join(model_path, "*.json"))

        for json_file in json_files:
            with open(json_file, 'r') as f:
                conversation = json.load(f)

            # Add model information
            conversation['model_dir'] = model_dir
            all_conversations.append(conversation)

    print(f"Loaded {len(all_conversations)} conversations from all model directories")
    return all_conversations

# Example usage:
all_conversations = load_all_conversations()
```

### 4. Load the Segments of a Full Conversation

```python
import os
import pandas as pd
from IPython.display import Audio
import matplotlib.pyplot as plt
import librosa
import librosa.display
import numpy as np

def load_conversation_segments(conversation_id, model_dir, base_path="data"):
    """Load all segments of a specific conversation with metadata."""
    # Path to the conversation directory
    conv_dir = os.path.join(base_path, model_dir, conversation_id)

    if not os.path.exists(conv_dir):
        print(f"Conversation directory not found: {conv_dir}")
        return None

    # Load metadata
    metadata_path = os.path.join(conv_dir, "metadata.tsv")
    if os.path.exists(metadata_path):
        metadata = pd.read_csv(metadata_path, sep='\t')
    else:
        print(f"Metadata file not found: {metadata_path}")
        return None

    # Path to segments directory
    segments_dir = os.path.join(conv_dir, "segments")

    # Full audio path
    full_audio_path = os.path.join(conv_dir, f"{conversation_id}_full.wav")

    result = {
        'conversation_id': conversation_id,
        'model_dir': model_dir,
        'metadata': metadata,
        'segments_dir': segments_dir,
        'full_audio_path': full_audio_path
    }

    return result

def play_segment(segment_info, index):
    """Play a specific segment from a conversation."""
    if segment_info is None:
        return

    metadata = segment_info['metadata']
    if index >= len(metadata):
        print(f"Segment index {index} out of range. Max index: {len(metadata)-1}")
        return

    filename = metadata.iloc[index]['filename']
    segment_path = os.path.join(segment_info['segments_dir'], filename)

    if os.path.exists(segment_path):
        print(f"Playing segment {index+1}/{len(metadata)}: {filename}")
        print(f"Text: \"{metadata.iloc[index]['cleaned_text']}\"")
        print(f"Emotion: {metadata.iloc[index]['dialogue_emotion']}")
        return Audio(segment_path)
    else:
        print(f"Segment file not found: {segment_path}")

def visualize_segment_waveform(segment_info, index):
    """Visualize the waveform of a specific segment."""
    if segment_info is None:
        return

    metadata = segment_info['metadata']
    if index >= len(metadata):
        print(f"Segment index {index} out of range. Max index: {len(metadata)-1}")
        return

    filename = metadata.iloc[index]['filename']
    segment_path = os.path.join(segment_info['segments_dir'], filename)

    if os.path.exists(segment_path):
        # Load the audio file
        y, sr = librosa.load(segment_path)

        # Create a figure and plot the waveform
        plt.figure(figsize=(12, 4))
        librosa.display.waveshow(y, sr=sr)
        plt.title(f"Waveform: {filename} | Emotion: {metadata.iloc[index]['dialogue_emotion']}")
        plt.xlabel("Time (s)")
        plt.ylabel("Amplitude")
        plt.tight_layout()
        plt.show()
    else:
        print(f"Segment file not found: {segment_path}")

# Example usage:
segment_info = load_conversation_segments("music_85_9", "dialogues_llama3-70B")

# Play a specific segment (e.g., the first one)
if segment_info:
    play_segment(segment_info, 0)
    visualize_segment_waveform(segment_info, 0)

    # Print all segments in this conversation
    print("\nAll segments in conversation:")
    for i, row in segment_info['metadata'].iterrows():
        print(f"{i+1}. Speaker: {row['speaker']} | Emotion: {row['dialogue_emotion']} | Text: \"{row['cleaned_text']}\"")
```

## 🔄 Related Resources

- [DeepDialogue-orpheus](https://huggingface.co/datasets/SALT-Research/DeepDialogue-orpheus): The companion dataset using Orpheus TTS instead of XTTS for speech synthesis
- [Project Website](https://salt-research.github.io/DeepDialogue): Additional information and resources

### 🔗 Links

- **TTS model**: [coqui/XTTS-v2](https://huggingface.co/coqui/XTTS-v2)
- **Emotion source**: [RAVDESS Dataset on Zenodo](https://zenodo.org/records/1188976)

## 📜 Citation

If you use this dataset in your research, please cite our [paper](https://arxiv.org/abs/2505.19978):

```
@misc{koudounas2025deepdialoguemultiturnemotionallyrichspoken,
      title={DeepDialogue: A Multi-Turn Emotionally-Rich Spoken Dialogue Dataset},
      author={Alkis Koudounas and Moreno La Quatra and Elena Baralis},
      year={2025},
      eprint={2505.19978},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.19978},
}
```

## 📃 License

This dataset is licensed under the [CC BY-NC-SA 4.0 International License](https://creativecommons.org/licenses/by-nc-sa/4.0/).

Use cases: