
CedPane - Supplementary Dictionary of Transliterated Names, Personal Names, and Place Names (Public Domain)

ModelScope community. Updated 2025-11-20. Indexed 2025-09-20.
Download link:
https://modelscope.cn/datasets/SSBrown/CedPane
Resource description:
Learners of Chinese often rely on software to assist their reading, but the dictionaries built into such software usually lack coverage of proper nouns such as personal names and place names. For example, "沃兹沃思" (Wòzīwòsī) is the Chinese transliteration of Wordsworth, the British poet, yet some software incorrectly splits it into "沃 (irrigate), 兹 (this), 沃 (irrigate), 思 (thought)" and fails to recognize it as a single transliterated name. To address this, I have compiled this *Special Names Supplement (CedPane)*, which collects a large number of transliterated names and special expressions that are common in Chinese texts but not sufficiently covered by general-purpose dictionaries. It aims to help software identify these terms more accurately and avoid mis-segmentation.

Tags: nlp, translation, lexicon, named-entity-recognition, machine-translation, public-domain
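The mis-segmentation described in the resource description can be illustrated with a minimal sketch of greedy longest-match segmentation. This is only an illustration under stated assumptions: the `segment` function, the tiny `general_lexicon`, and the example sentence are hypothetical and are not taken from CedPane or from any particular reading software. Only the name "沃兹沃思" comes from the description.

```python
def segment(text, lexicon):
    """Greedy forward longest-match segmentation over a set of known words."""
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character
        # when no lexicon entry matches at this position.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical general-purpose entries (assumption, for illustration only).
general_lexicon = {"是", "英国", "诗人"}
# The transliterated name used as the example in the description.
names = {"沃兹沃思"}

sentence = "沃兹沃思是英国诗人"  # "Wordsworth is a British poet"
without_names = segment(sentence, general_lexicon)
with_names = segment(sentence, general_lexicon | names)

print(without_names)
print(with_names)
```

Without the name entry, the segmenter falls back to single characters for the name, which is what leads a glossing tool to explain each character separately. With the entry added, the name is matched as one token.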
Provider:
maas
Created:
2025-08-25
Collected summary
Dataset introduction
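Background and challenges
Background overview
CedPane is a public-domain supplementary Chinese-English dictionary of transliterated names, personal names, and place names, intended to help Chinese-learning software recognize proper nouns accurately and avoid mis-segmentation. The dataset contains verified public-domain entries, each with the English original, Simplified and Traditional Chinese, pinyin, and pronunciation information, and is suitable for integration into dictionary software and NLP tools.
The above content was collected and summarized by 遇见数据集.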