talkpl-ai/talkplay-db-v1
收藏Hugging Face2025-02-23 更新2025-04-19 收录
下载链接:
https://hf-mirror.com/datasets/talkpl-ai/talkplay-db-v1
下载链接
链接失效反馈官方服务:
资源简介:
TalkPlay数据集是一个大规模的音乐对话数据集,基于百万播放列表数据集(Million Playlist Dataset,MPD)构建而成。MPD是2018年推出的包含一百万个Spotify播放列表的数据集,是公开可用的最大播放列表数据集之一。它提供了全面的轨道元数据和播放列表共现信息。本数据集利用预训练的标题生成、转录和音乐信息检索模型扩展模态,并通过大型语言模型(LLMs)将播放列表数据转换为对话数据。数据集使用GEMINI-1.5-FLASH-002生成,创建了用户与AI助手之间的自然语言对话。数据集遵循对话和音乐推荐的一致性、覆盖多种模态(音频、歌词、元数据、语义注释)、模拟实际用户行为的跳过/拒绝功能,以及适用于训练的JSON格式结构。
The TalkPlay dataset is a large-scale music conversation dataset created using the Million Playlist Dataset (MPD) as its foundation. The MPD, introduced in 2018, contains one million Spotify playlists and remains one of the largest publicly available playlist datasets. It provides comprehensive track metadata and playlist co-occurrence information. We leverage pretrained captioning, transcription, and MIR models to expand modalities, and transform playlist data into conversational data through LLMs. The dataset was generated using GEMINI-1.5-FLASH-002, creating natural language conversations between users and an AI assistant, with a focus on coherence in dialogue and music recommendations, coverage of multiple modalities, realistic user simulation, and structured JSON format for training compatibility.
提供机构:
talkpl-ai



