Audio-Turing-Test-Corpus
ModelScope Community · Updated 2025-12-04 · Indexed 2025-07-19
Download link:
https://modelscope.cn/datasets/meituan/Audio-Turing-Test-Corpus
Resource description:
# 📚 Audio Turing Test Corpus
> A high‑quality, multidimensional Chinese transcript corpus designed to evaluate whether a machine‑generated speech sample can fool human listeners—the “Audio Turing Test.”
## About Audio Turing Test (ATT)
ATT is an evaluation framework made up of a standardized human evaluation protocol and an accompanying dataset. It aims to address the lack of unified protocols in TTS evaluation and the difficulty of comparing multiple TTS systems. To further support the training and iteration of TTS systems, we used additional private evaluation data to train Auto-ATT, a model based on Qwen2-Audio-7B. Auto-ATT enables a model-as-a-judge approach for rapid evaluation of TTS systems on the ATT dataset. The datasets and the Auto-ATT model can be found in the [ATT Collection](https://huggingface.co/collections/meituan/audio-turing-test-682446320368164faeaf38a4).
## Dataset Description
This dataset provides 500 textual transcripts from the Audio Turing Test (ATT) corpus, corresponding to the "transcripts known" (white-box) setting. These samples are part of the full 1,000-sample benchmark described in our paper, with the remaining 500 black-box entries hosted privately.
To prevent data contamination, only the white-box subset is released publicly. The black-box subset is evaluated under the identical protocol but is hosted privately on [AGI-Eval](https://agi-eval.cn/evaluation/home) to safeguard the integrity and future utility of the benchmark.
This separation between public and private subsets is a core part of the ATT design, ensuring the benchmark remains a trustworthy and unbiased tool for evaluating TTS systems.
The corpus spans five key linguistic and stylistic dimensions relevant to Chinese TTS evaluation:
* **Chinese-English Code-switching**
* **Paralinguistic Features and Emotions**
* **Special Characters and Numerals**
* **Polyphonic Characters**
* **Classical Chinese Poetry/Prose**
For each dimension, this open-source subset includes 100 manually reviewed transcripts.
Additionally, the dataset includes 104 "trap" transcripts for attentiveness checks during human evaluation:
* **35 flawed synthetic transcripts:** intentionally flawed scripts designed to produce clearly synthetic and unnatural speech.
* **69 authentic human transcripts:** scripts corresponding to genuine human recordings, used to check that evaluators can reliably distinguish human from synthetic speech.
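The trap items support a simple per-evaluator screening step. The sketch below is hypothetical. The `TrapJudgement` structure, the function names and the 0.8 pass threshold are illustrative assumptions, not the protocol from the ATT paper. It assumes each evaluator gives a binary human/synthetic verdict per clip and scores those verdicts against the `Ground Truth` labels (1 = human, 0 = flawed synthetic) described under Data Format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TrapJudgement:
    """One evaluator's verdict on one clip (hypothetical structure)."""
    trap_id: str        # transcript ID, e.g. "human_00001"
    judged_human: bool  # True if the evaluator judged the clip to be human speech


def trap_accuracy(judgements: list[TrapJudgement], ground_truth: dict[str, int]) -> float:
    """Fraction of judged trap clips the evaluator labelled correctly.

    `ground_truth` maps trap ID -> "Ground Truth" (1 = human, 0 = flawed synthetic).
    Judgements on clips that are not traps are ignored.
    """
    scored = [j for j in judgements if j.trap_id in ground_truth]
    if not scored:
        raise ValueError("evaluator judged no trap clips")
    correct = sum(j.judged_human == (ground_truth[j.trap_id] == 1) for j in scored)
    return correct / len(scored)


def is_attentive(
    judgements: list[TrapJudgement],
    ground_truth: dict[str, int],
    threshold: float = 0.8,  # hypothetical pass mark, not taken from the ATT paper
) -> bool:
    """Keep an evaluator only if their trap accuracy reaches `threshold`."""
    return trap_accuracy(judgements, ground_truth) >= threshold
```

Because the check relies only on the trap labels, it can run alongside ordinary ratings without revealing to the evaluator which clips are traps.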
## How to Use This Dataset
1. **Generate Speech:** Use these transcripts to generate audio with your TTS model. Note that the corpus contains some phone numbers, email addresses, and websites. Because of potential sensitivity risks, we have masked them with the placeholders `[PHONE_MASK]`, `[EMAIL_MASK]`, and `[WEB_MASK]`. To properly test your TTS system on this kind of content, replace these placeholders with actual content before use.
2. **Evaluate:** Use our [Auto-ATT evaluation model](https://huggingface.co/Meituan/Auto-ATT) to score your generated audio.
3. **Benchmark:** Compare your model's scores with those of the other TTS models evaluated in our research paper, and with the scores of the "trap" audio clips in [Audio Turing Test Audio](https://huggingface.co/collections/Meituan/audio-turing-test-6826e24d2197bf91fae6d7f5).
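The placeholder replacement in step 1 can be scripted. Below is a minimal sketch that assumes plain string substitution is acceptable. `fill_placeholders` and the replacement values are hypothetical, so choose values in the phone-number, email and URL formats you want your TTS system to read aloud. Raising `KeyError` on an unmapped mask keeps a literal placeholder from being synthesized by accident.

```python
from __future__ import annotations

import re

# Mask tokens used in the released transcripts.
PLACEHOLDERS = ("[PHONE_MASK]", "[EMAIL_MASK]", "[WEB_MASK]")
_MASK_RE = re.compile("|".join(re.escape(p) for p in PLACEHOLDERS))


def fill_placeholders(text: str, replacements: dict[str, str]) -> str:
    """Substitute every mask token in `text` with its value from `replacements`.

    Raises KeyError for a mask that has no replacement, so a literal
    placeholder is never sent to the TTS model.
    """
    return _MASK_RE.sub(lambda m: replacements[m.group(0)], text)


# Hypothetical example values; pick formats you want the TTS system to handle.
EXAMPLE_VALUES = {
    "[PHONE_MASK]": "123-4567-8910",
    "[EMAIL_MASK]": "user@example.com",
    "[WEB_MASK]": "www.example.com",
}
```

Apply it to each record before synthesis, e.g. `record["Text"] = fill_placeholders(record["Text"], EXAMPLE_VALUES)`. Every occurrence of a given mask receives the same value. Use a per-record mapping if you want more variety.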
## Data Format
### Normal Transcripts
```json
{
"ID": "poem-100",
"Text": "姚鼐在《登泰山记》中详细记录了登山的经过:“余始循以入,道少半,越中岭,复循西谷,遂至其巅。”这番描述让我仿佛身临其境地感受到了登山的艰辛与乐趣。当我亲自攀登泰山时,也经历了类似的艰辛与挑战。虽然路途遥远且充满艰辛,但当我站在山顶俯瞰群山时,那份成就感与自豪感让我倍感满足与幸福。",
"Dimension": "poem",
"Split": "white Box"
}
```
### Trap Transcripts
```json
{
"ID": "human_00001",
"Text": "然后当是去年,也是有一个契机,我就,呃,报了一个就是小凯书法家的这位老师的班。",
"Ground Truth": 1
}
```
* **ID**: Unique identifier for the transcript.
* **Text**: The text intended for speech synthesis.
* **Dimension**: Linguistic/stylistic category (only for normal transcripts).
* **Split**: Indicates the "white Box" scenario (only for normal transcripts).
* **Ground Truth**: Indicates whether the transcript corresponds to human speech (1) or flawed synthetic speech (0) (only for trap transcripts).
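The two record shapes can be told apart by the fields they carry. Below is a minimal sketch of a loader and validator built on the field lists above. It assumes the records are stored as either a JSON array or JSON Lines, since this card does not specify the file layout. The function names are hypothetical.

```python
from __future__ import annotations

import json
from collections import Counter

# Field sets taken from the Data Format section of this card.
NORMAL_FIELDS = {"ID", "Text", "Dimension", "Split"}
TRAP_FIELDS = {"ID", "Text", "Ground Truth"}


def load_records(path: str) -> list[dict]:
    """Read a JSON array file, falling back to JSON Lines (file layout is an assumption)."""
    with open(path, encoding="utf-8") as f:
        content = f.read()
    try:
        data = json.loads(content)
        return data if isinstance(data, list) else [data]
    except json.JSONDecodeError:
        return [json.loads(line) for line in content.splitlines() if line.strip()]


def classify_record(record: dict) -> str:
    """Return "trap" or "normal" depending on which documented fields are present."""
    keys = set(record)
    if TRAP_FIELDS <= keys:
        if record["Ground Truth"] not in (0, 1):
            raise ValueError(f"{record['ID']}: Ground Truth must be 0 or 1")
        return "trap"
    if NORMAL_FIELDS <= keys:
        return "normal"
    raise ValueError(f"{record.get('ID', '?')}: unexpected fields {sorted(keys)}")


def summarize(records: list[dict]) -> tuple[Counter, Counter]:
    """Count normal transcripts per Dimension and trap transcripts per Ground Truth label."""
    dims: Counter = Counter()
    labels: Counter = Counter()
    for rec in records:
        if classify_record(rec) == "trap":
            labels[rec["Ground Truth"]] += 1
        else:
            dims[rec["Dimension"]] += 1
    return dims, labels
```

For example, call `summarize(load_records("transcripts.json"))`, where the file name is hypothetical. On the released subset, the counts should match the ones stated above: 100 normal transcripts per dimension, 35 trap records with `Ground Truth` 0, and 69 with `Ground Truth` 1.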
## Citation
This dataset is openly accessible for research purposes. If you use this dataset in your research, please cite our paper:
```
@software{Audio-Turing-Test-Transcripts,
author = {Wang, Xihuai and Zhao, Ziyi and Ren, Siyu and Zhang, Shao and Li, Song and Li, Xiaoyu and Wang, Ziwen and Qiu, Lin and Wan, Guanglu and Cao, Xuezhi and Cai, Xunliang and Zhang, Weinan},
title = {Audio Turing Test: Benchmarking the Human-likeness and Naturalness of Large Language Model-based Text-to-Speech Systems in Chinese},
year = {2025},
url = {https://huggingface.co/Meituan/Audio-Turing-Test-Corpus},
publisher = {huggingface},
}
```
Provider: maas
Created: 2025-07-15



