M3PDB
ModelScope · Updated 2025-09-04 · Indexed 2025-05-17
Download link:
https://modelscope.cn/datasets/M3PDBdataset/M3PDB
Resource description:
# M<sup>3</sup>PDB
<!-- PROJECT SHIELDS -->
<!-- PROJECT LOGO -->
<br />
<p align="center">
<a href="https://github.com/hizening/M3PDB">
<img src="images/logo.png" alt="Logo" width="180" height="180">
</a>
<h3 align="center">M<sup>3</sup>PDB</h3>
<p align="center">
A Multi-Modal, Multi-Label, Multilingual Prompt Database
<br />
<a href="https://github.com/hizening/M3PDB"><strong>Explore the documentation of this project
»</strong></a>
<br />
<br />
<a href="https://jiangyu1205.github.io/subjective">View Demo (Demo and Subjective Test)</a>
·
<a href="https://github.com/hizening/M3PDB/issues">Report Bug</a>
·
<a href="https://github.com/hizening/M3PDB/issues">Make a Suggestion</a>
</p>
</p>
This README.md is intended for developers.
<div style="color: pink; font-weight: bold; font-size: 1.0em;">
1. Please note that M<sup>3</sup>PDB does not own the copyright to the audio files; the copyright remains with the original owners of the videos or audio. Users may use the M<sup>3</sup>PDB dataset only for non-commercial purposes under the CC BY-NC 4.0 license.<br />
2. Because the server's upload channel was unavailable, larger files, including audio and image files, have been uploaded to <a href="https://www.modelscope.cn/datasets/M3PDBdataset/M3PDB">ModelScope</a>; you can download them from that site.
</div>
### What's New :fire:
- [2025.06] Updated the [code](https://github.com/hizening/M3PDB), [demo](https://jiangyu1205.github.io/token2emo/), and [dataset](https://huggingface.co/datasets/M3PDB/M3PDB) for M<sup>3</sup>PDB.
### Table of Contents
- [Getting Started Guide](#getting-started-guide)
    - [Development Configuration Requirements](#development-configuration-requirements)
    - [Installation Steps](#installation-steps)
- [File Directory Description](#file-directory-description)
- [Dataset Construction](#dataset-construction)
    - [Multimodal Data Preprocessing](#multimodal-data-preprocessing)
    - [Annotation System](#annotation-system)
    - [Unseen Language Annotation](#unseen-language-annotation)
- [Dataset Usage](#dataset-usage)
    - [Multi-model Prompt Registration](#multi-model-prompt-registration)
    - [Latency Aware Online Selection](#latency-aware-online-selection)
- [How to Contribute to the Open Source Project](#how-to-contribute-to-the-open-source-project)
- [Version Control](#version-control)
- [Contact](#contact)
- [License](#license)
- [Acknowledgements](#acknowledgements)
### Getting Started Guide
###### **Development Configuration Requirements**
Because the models used in this project have substantially different environment requirements, each model runs in its own environment, and the models collaborate through API calls. The environment setup for each model is documented in its own folder.
###### **Installation Steps**
1. Get an OpenAI API key at [https://platform.openai.com/api-keys](https://platform.openai.com/api-keys)
2. Clone the repo
```sh
git clone https://github.com/hizening/M3PDB.git
```
3. Each subsystem requires its own environment; configure it by following the `readme.md` in that subsystem's folder.
### File Directory Description
```
M3PDB
├── annotation_system/
│   ├── Qwen2-Audio/
│   ├── SenseVoice/
│   ├── emotion2vec/
│   ├── llmware/
│   └── readme.md
├── latency_aware_online_system/
│   ├── latency_aware_online_selection.py
│   └── readme.md
├── multi-model_prompt_registration/
│   ├── facetts/
│   ├── f2s.py
│   ├── s2s.py
│   ├── t2s.py
│   └── readme.md
├── multimodal_data_preprocessing/
│   ├── 3D-Speaker/
│   ├── speech/
│   ├── video/
│   └── readme.md
└── unseen_language_annotation/
    ├── lang_prob_confirm/
    ├── selection/
    └── readme.md
```
### Dataset Construction
###### **Multimodal Data Preprocessing**
<p align="center">
<img src="images/appendixA.2white.png" alt="Multimodal data preprocessing pipeline" style="width: auto; height: auto;">
</p>
1. Run the following command to separate audio and video:
```sh
python multimodal_data_preprocessing/video/split_media.py
```
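As a rough illustration of what audio-video separation involves, the sketch below assembles ffmpeg command lines that write an audio-only WAV and a video-only stream copy. It is not taken from `split_media.py`: the helper name `build_split_commands`, the output file names, and the 16 kHz mono 16-bit PCM audio settings are assumptions.

```python
import shlex
from pathlib import Path


def build_split_commands(src: str, out_dir: str) -> list[list[str]]:
    """Build ffmpeg commands that split one media file into audio and video.

    Hypothetical helper: output names and audio settings are assumptions.
    """
    stem = Path(src).stem
    out = Path(out_dir)
    audio_cmd = [
        "ffmpeg", "-y", "-i", src,
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM
        "-ar", "16000",          # assumed 16 kHz sample rate
        "-ac", "1",              # downmix to mono
        str(out / f"{stem}.wav"),
    ]
    video_cmd = [
        "ffmpeg", "-y", "-i", src,
        "-an",                   # drop the audio stream
        "-c:v", "copy",          # keep the video stream without re-encoding
        str(out / f"{stem}_video.mp4"),
    ]
    return [audio_cmd, video_cmd]


if __name__ == "__main__":
    for cmd in build_split_commands("clip.mp4", "out"):
        print(shlex.join(cmd))  # run each with subprocess.run(cmd, check=True)
```

Only building the argument lists keeps the sketch testable without ffmpeg installed; passing a list (not a shell string) to `subprocess.run` also avoids quoting problems with file names that contain spaces.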
2. Run the following command to standardize the speech format:
```sh
python multimodal_data_preprocessing/speech/format_standardization.py
```
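A quick way to confirm that standardization produced the intended format is to inspect WAV headers with Python's standard `wave` module. This is a minimal sketch, assuming a target of 16 kHz, mono, 16-bit PCM; those target values and the helper names are hypothetical, not read from `format_standardization.py`.

```python
import os
import tempfile
import wave

# Assumed target format; format_standardization.py may use different values.
TARGET_RATE = 16000   # Hz
TARGET_CHANNELS = 1   # mono
TARGET_SAMPWIDTH = 2  # bytes per sample (16-bit PCM)


def format_issues(path: str) -> list[str]:
    """List every way a WAV file differs from the target format (empty if none)."""
    issues = []
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != TARGET_RATE:
            issues.append(f"sample rate {wf.getframerate()} != {TARGET_RATE}")
        if wf.getnchannels() != TARGET_CHANNELS:
            issues.append(f"channels {wf.getnchannels()} != {TARGET_CHANNELS}")
        if wf.getsampwidth() != TARGET_SAMPWIDTH:
            issues.append(f"sample width {wf.getsampwidth()} != {TARGET_SAMPWIDTH}")
    return issues


def write_silent_wav(path: str, rate: int, channels: int, n_frames: int) -> None:
    """Write a short 16-bit silent WAV, used here only to exercise format_issues."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * channels * n_frames)


if __name__ == "__main__":
    probe = os.path.join(tempfile.mkdtemp(), "probe.wav")
    write_silent_wav(probe, rate=44100, channels=2, n_frames=4410)
    print(format_issues(probe))
    # ['sample rate 44100 != 16000', 'channels 2 != 1']
```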
3. Run the following command to standardize the video format:
```sh
python multimodal_data_preprocessing/video/format_standardization.py
```
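Video standardization can be checked in a similar way by parsing the JSON that `ffprobe -v error -print_format json -show_streams input.mp4` prints. The sketch below assumes a target of 25 fps H.264; both targets and the `video_issues` helper are hypothetical, and the real script may normalize other properties (resolution, container) as well.

```python
import json
from fractions import Fraction

# Assumed targets; the actual video standardization settings may differ.
TARGET_FPS = Fraction(25)
TARGET_CODEC = "h264"


def video_issues(ffprobe_json: str) -> list[str]:
    """Check the first video stream in ffprobe's JSON output against the targets."""
    streams = json.loads(ffprobe_json).get("streams", [])
    video = next((s for s in streams if s.get("codec_type") == "video"), None)
    if video is None:
        return ["no video stream"]
    issues = []
    # r_frame_rate is a rational string such as "25/1" or "30000/1001".
    num, den = (int(x) for x in video.get("r_frame_rate", "0/0").split("/"))
    if den == 0:
        issues.append("unknown frame rate")
    elif Fraction(num, den) != TARGET_FPS:
        issues.append(f"frame rate {num / den:.3f} != {float(TARGET_FPS):.3f}")
    codec = video.get("codec_name")
    if codec != TARGET_CODEC:
        issues.append(f"codec {codec} != {TARGET_CODEC}")
    return issues
```

Comparing frame rates as exact fractions avoids treating 30000/1001 (about 29.97 fps) as equal to 30 fps through floating-point rounding.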
4. Run the following command to perform speech enhancement:
```sh
python multimodal_data_preprocessing/speech/speech_enhancement.py
```
5. Run the following command to enhance video quality (video super-resolution):
```sh
python multimodal_data_preprocessing/video/VideoSuperResolution/Train/eval.py
```
6. Run the following commands to perform multimodal speaker diarization and voice activity detection (VAD):
```sh
cd multimodal_data_preprocessing/3D-Speaker/egs/3dspeaker/speaker-diarization/
bash run_audio.sh
bash run_video.sh
```
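3D-Speaker performs diarization and VAD with its own models. The energy-threshold VAD below is a simpler, swapped-in technique that only illustrates what a VAD stage produces: time-stamped speech segments. The frame length, RMS threshold, and pause-merging gap are assumptions chosen for illustration.

```python
import math


def energy_vad(samples, sample_rate, frame_ms=30, threshold=0.02, min_gap_ms=300):
    """Return (start_s, end_s) speech segments from mono float samples in [-1, 1].

    A frame counts as speech when its RMS energy reaches `threshold`; segments
    separated by pauses shorter than `min_gap_ms` are merged.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    segments, start = [], None
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        t = i / sample_rate
        if rms >= threshold and start is None:
            start = t                     # speech begins
        elif rms < threshold and start is not None:
            segments.append([start, t])   # speech ends
            start = None
    if start is not None:                 # speech runs to the end of the signal
        segments.append([start, len(samples) / sample_rate])

    merged = []
    for seg in segments:
        if merged and seg[0] - merged[-1][1] < min_gap_ms / 1000:
            merged[-1][1] = seg[1]        # bridge a short pause
        else:
            merged.append(seg)
    return [tuple(seg) for seg in merged]
```

A fixed energy threshold is sensitive to background noise, which is one reason pipelines like this one run speech enhancement before diarization and VAD.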
......
For more details, see `multimodal_data_preprocessing/readme.md`.
###### **Annotation System**
<p align="center">
<img src="images/fig_RAGwhite.png" alt="Annotation system overview" style="width: auto; height: auto;">
</p>
For more details, see `annotation_system/readme.md`.
###### **Unseen Language Annotation**
<p align="center">
<img src="images/unseenlanguage.png" alt="Unseen language annotation pipeline" style="width: auto; height: auto;">
</p>
1. Run the following command to synthesize speech:
```sh
python unseen_language_annotation/lang_prob_confirm/tts/tts.py
```
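2. Run the following command to evaluate the quality of the synthesized speech with DNSMOS (replace `C:\temp\SampleClips` with the directory containing your synthesized clips):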
```sh
python dnsmos_local.py -t C:\temp\SampleClips -o sample.csv
```
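The DNSMOS scores can then be used to keep only well-rated synthetic clips. This minimal sketch assumes the CSV follows the default `dnsmos_local.py` output, which includes `filename` and `OVRL` (overall quality) columns; the 3.0 threshold and the `select_clips` helper are illustrative assumptions, not values from M<sup>3</sup>PDB.

```python
import csv
import io

MIN_OVRL = 3.0  # assumed quality threshold on the DNSMOS overall (OVRL) score


def select_clips(csv_text: str, min_ovrl: float = MIN_OVRL) -> list[str]:
    """Return filenames whose OVRL score is at least `min_ovrl`, best first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    kept = [row for row in rows if float(row["OVRL"]) >= min_ovrl]
    kept.sort(key=lambda row: float(row["OVRL"]), reverse=True)
    return [row["filename"] for row in kept]
```

Usage: read the file produced by `-o sample.csv` with `open("sample.csv", newline="").read()` and pass the text to `select_clips`.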
......
For more details, see `unseen_language_annotation/readme.md`.
### Dataset Usage
###### **Multi-model Prompt Registration**
<p align="center">
<img src="images/translate_prompt_selection—3white.png" alt="Multi-model prompt registration" style="height: 600px; width: auto;">
</p>
1. Run the following command to match and register speech similar to the registered speech (s2s):
```sh
python multi-model_prompt_registration/s2s.py
```
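As a rough illustration of similarity-based matching, the sketch below ranks registered prompts by cosine similarity between speaker embeddings. It assumes each prompt already has a fixed-length embedding (for example from a speaker-verification model); the `bank` dictionary layout, the `top_k_similar` helper, and the `min_score` cutoff are hypothetical and not taken from `s2s.py`.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k_similar(query, bank, k=3, min_score=0.0):
    """Rank prompt IDs in `bank` (id -> embedding) by similarity to `query`."""
    scored = [(pid, cosine(query, emb)) for pid, emb in bank.items()]
    scored = [item for item in scored if item[1] >= min_score]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

A linear scan is fine for small banks; for large prompt databases an approximate nearest-neighbor index would usually replace it.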
2. Run the following command to generate face-based reference speech from the registered face:
```sh
python multi-model_prompt_registration/facetts/inference.py
```
3. Run the following command to match and register speech similar to the registered face (f2s):
```sh
python multi-model_prompt_registration/f2s.py
```
4. Run the following command to match and register speech similar to the registered text (t2s):
```sh
python multi-model_prompt_registration/t2s.py
```
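One simple way to match a text request against a multi-label prompt database is to count how many requested attributes each prompt's labels satisfy. This is a sketch under assumptions: the label keys (`language`, `gender`, `emotion`), the scoring rule, and both helpers are hypothetical, and `t2s.py` may instead rely on text embeddings or an LLM.

```python
def label_match_score(request: dict, labels: dict) -> float:
    """Fraction of requested attributes that the prompt's labels satisfy."""
    if not request:
        return 0.0
    hits = sum(1 for key, value in request.items() if labels.get(key) == value)
    return hits / len(request)


def match_text_request(request: dict, prompts: dict, k: int = 3):
    """Return up to k (prompt_id, score) pairs, best first; ties keep input order."""
    scored = [(pid, label_match_score(request, labels)) for pid, labels in prompts.items()]
    scored = [item for item in scored if item[1] > 0]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```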
......
For more details, see `multi-model_prompt_registration/readme.md`.
###### **Latency Aware Online Selection**
<p align="center">
<img src="images/appendixGwhite.png" alt="Latency-aware online selection" style="width: auto; height: auto;">
</p>
1. Run the following command to dynamically select the most suitable speech:
```sh
python latency_aware_online_selection/latency_aware_online_selection.py
```
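Latency-aware selection trades prompt quality against response time. The sketch below picks the best-scoring candidate whose estimated latency fits a budget and falls back to the fastest candidate when none fits. The `Candidate` fields, the budget rule, and the fallback are hypothetical illustrations, not the logic of `latency_aware_online_selection.py`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    prompt_id: str
    score: float       # similarity/quality score, higher is better (assumed)
    latency_ms: float  # estimated time to fetch and use this prompt (assumed)


def select_prompt(candidates: list[Candidate], budget_ms: float) -> Optional[Candidate]:
    """Pick the best-scoring candidate within the latency budget.

    Falls back to the fastest candidate when none fits the budget.
    """
    if not candidates:
        return None
    within = [c for c in candidates if c.latency_ms <= budget_ms]
    if within:
        return max(within, key=lambda c: c.score)
    return min(candidates, key=lambda c: c.latency_ms)
```

Falling back to the fastest candidate keeps an online system responsive under tight budgets, at the cost of a lower-scoring prompt.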
......
For more details, see `latency_aware_online_selection/readme.md`.
### How to Contribute to the Open Source Project
Contributions make the open-source community an excellent place for learning, inspiration, and creation. Any contribution you make is **greatly appreciated**.
1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
### Version Control
This project uses Git for version control; see the repository for the currently available versions.
### Contact
If you have any comments or questions about M<sup>3</sup>PDB, please contact us:
- Email: zhuboyu@mail.nwpu.edu.cn
### License
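M<sup>3</sup>PDB is released under the [MIT License](https://github.com/hizening/M3PDB/blob/main/LICENSE.txt). Note that the dataset itself may be used only for non-commercial purposes under CC BY-NC 4.0, as stated in the notice at the top of this README.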
### Acknowledgements
M<sup>3</sup>PDB contains third-party components and code modified from several open-source repositories, including: <br>
1. Datasets
[Emilia Dataset](https://github.com/open-mmlab/Amphion/tree/main/preprocessors/Emilia), [voxceleb](https://huggingface.co/datasets/ProgramComputer/voxceleb), [voxpopuli](https://huggingface.co/datasets/facebook/voxpopuli)
2. Code
[3D-Speaker](https://github.com/modelscope/3D-Speaker), [Side-Profile-Detection](https://github.com/nawafalageel/Side-Profile-Detection), [SenseVoice](https://github.com/FunAudioLLM/SenseVoice), [emotion2vec](https://github.com/ddlBoJack/emotion2vec), [seamless_communication](https://github.com/facebookresearch/seamless_communication), [CosyVoice](https://github.com/FunAudioLLM/CosyVoice), [whisper](https://github.com/openai/whisper), [Imaginary Voice](https://github.com/naver-ai/facetts), [gpt-4o](https://openai.com/zh-Hans-CN/index/gpt-4o-system-card/), [deepface](https://github.com/serengil/deepface), [OSUM](https://github.com/ASLP-lab/OSUM), [XTTS-v2](https://huggingface.co/coqui/XTTS-v2)
<!-- ## Citations
If you find this repository useful, please consider giving a star :star: and citation :t-rex::
```BibTeX
@article{chen20243d,
title={3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization},
author={Chen, Yafeng and Zheng, Siqi and Wang, Hui and Cheng, Luyao and others},
booktitle={ICASSP},
year={2025}
}
``` -->
Provider:
maas
Created:
2025-05-16



