Rime-Cantonese: A Normalized Cantonese Jyutping Lexicon
Source: DataCite Commons. Updated 2022-10-13; indexed 2024-07-13.
Download link:
https://catalog.ldc.upenn.edu/LDC2022L01
Resource description:
<h3>Introduction</h3><br>
<p>Rime-Cantonese: A Normalized Cantonese Jyutping Lexicon was developed by the Cantonese Computational Linguistics Infrastructure Working Group. It contains approximately 130,000 Cantonese character, word, and phrase entries paired with their corresponding romanized pronunciations in <a href="https://jyutping.org/en/">Jyutping</a>, a scheme created by The Linguistic Society of Hong Kong.</p><br>
<h3>Data</h3><br>
<p>Data was collected from a variety of physical and online sources. The character collection was normalized to reconcile differences between traditional and simplified Chinese, regional and other character variants, and differences in orthography. Additional information about this process, and about the lexicon in general, is available in the documentation included with this release.</p><br>
<p>The data is presented as a collection of UTF-8 encoded CSV files.</p><br>
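<p>As a rough illustration of working with UTF-8 CSV data of this kind, the sketch below loads entries into a dictionary keyed by word. The column names <code>word</code> and <code>jyutping</code>, the helper <code>load_lexicon</code>, and the three sample rows are assumptions made for this example and are not taken from the release; the actual file layout is described in the included documentation.</p><br>

```python
import csv
import io

# Hypothetical two-column layout (word, jyutping). The real column names and
# order may differ; consult the documentation shipped with the release.
# These rows are illustrative only and are not copied from the lexicon.
SAMPLE_CSV = (
    "word,jyutping\n"
    "香港,hoeng1 gong2\n"
    "廣東話,gwong2 dung1 waa2\n"
    "你好,nei5 hou2\n"
)


def load_lexicon(stream):
    """Map each entry to a list of readings, each reading a list of syllables."""
    lexicon = {}
    for row in csv.DictReader(stream):
        syllables = row["jyutping"].split()
        # A word may appear on several rows with different pronunciations.
        lexicon.setdefault(row["word"], []).append(syllables)
    return lexicon


lexicon = load_lexicon(io.StringIO(SAMPLE_CSV))
print(lexicon["香港"])
```

<p>For a file on disk, the same function could be given a handle opened with <code>open(path, encoding="utf-8", newline="")</code>, so that the UTF-8 encoding is stated explicitly rather than left to the platform default.</p><br>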
<h3>Samples</h3><br>
<p>Please view this <a href="desc/addenda/LDC2022L01.txt">word sample</a>.</p><br>
<h3>Updates</h3><br>
<p>None at this time.</p><br>
Portions © 2023 Trustees of the University of Pennsylvania
Provider:
Linguistic Data Consortium
Created:
2022-10-10



