CPP
Source: ModelScope community. Updated: 2025-07-16. Indexed: 2024-08-31.
Download link:
https://modelscope.cn/datasets/OmniData/CPP
Resource overview:
displayName: CPP (Chinese Polyphones with Pinyin)
labelTypes:
- Chinese Corpus
license:
- Apache 2.0
mediaTypes:
- Text
paperUrl: https://arxiv.org/pdf/2004.03136v5.pdf
publishDate: "2020"
publishUrl: https://github.com/kakaobrain/g2pM
publisher:
- Korea Advanced Institute of Science and Technology
- Kakao Brain
tags:
- Polyphonic
taskTypes: []
---
# Dataset Introduction
## Introduction
A benchmark dataset containing more than 99,000 sentences for Chinese polyphone disambiguation.
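
To make the task concrete, here is a minimal sketch of what one polyphone disambiguation record might look like in Python. The record layout is an assumption, not something this card specifies: it supposes that the target character is wrapped in a marker symbol (U+2581 here) and that the label is a pinyin string with a tone digit. The names `MARK`, `Example`, and `parse_example` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass

# Assumption: the target polyphonic character is wrapped in this marker.
MARK = "\u2581"


@dataclass
class Example:
    sentence: str  # sentence with the markers removed
    index: int     # position of the polyphonic character in `sentence`
    pinyin: str    # gold pronunciation label, e.g. "xing2"


def parse_example(marked_sentence: str, label: str) -> Example:
    """Turn one marked sentence and its label into an Example."""
    start = marked_sentence.index(MARK)
    end = marked_sentence.index(MARK, start + 1)
    if end - start != 2:
        raise ValueError("expected exactly one character between markers")
    plain = marked_sentence.replace(MARK, "")
    # After removing the opening marker, the target sits at position `start`.
    return Example(sentence=plain, index=start, pinyin=label.strip())


# The same character takes a different reading depending on context.
walk = parse_example("我\u2581行\u2581走在路上", "xing2")
bank = parse_example("他在银\u2581行\u2581工作", "hang2")
```

A disambiguation model would take `sentence` and `index` as input and predict `pinyin`; the two examples above show the character 行 read as `xing2` in one context and `hang2` in another.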
## Class Definitions
None
## Citation
```
@article{park2020g2pm,
title={g2pM: A neural grapheme-to-phoneme conversion package for mandarin chinese based on a new open benchmark dataset},
author={Park, Kyubyong and Lee, Seanie},
journal={arXiv preprint arXiv:2004.03136},
year={2020}
}
```
## Download dataset
:modelscope-code[]{type="git"}
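
The directive above indicates a git-based download. As a rough sketch, the clone could be scripted from Python as shown below. It assumes that the repository URL is the dataset page URL with `.git` appended, which this card does not confirm, and that `git` is installed locally. `clone_command` and `download` are hypothetical helper names.

```python
import subprocess
from typing import List


def clone_command(page_url: str, dest: str) -> List[str]:
    """Build a git clone command from a dataset page URL.

    Assumption: the git remote is the page URL plus a '.git' suffix.
    """
    repo_url = page_url.rstrip("/") + ".git"
    return ["git", "clone", repo_url, dest]


def download(page_url: str, dest: str = "CPP") -> None:
    """Run the clone; raises CalledProcessError if git exits with an error."""
    subprocess.run(clone_command(page_url, dest), check=True)


# Example call (needs network access and git, so it is left commented out):
# download("https://modelscope.cn/datasets/OmniData/CPP")
```

Keeping the command construction separate from the `subprocess.run` call makes it easy to inspect the command before running it.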
Provider:
maas
Created:
2024-06-30



