m7142yosuke/english2kana-v1
Hugging Face | Updated 2024-12-15 | Indexed 2024-12-21
Download link:
https://hf-mirror.com/datasets/m7142yosuke/english2kana-v1
Description:
This dataset is derived from Japan's Corporate Number Publication Site and pairs corporate names written entirely in English letters with their corresponding kana readings. Preprocessing removes corporate designations, keeps only names composed entirely of English letters, removes company-type identifiers, converts full-width characters to half-width, excludes camel-case compound words, and lowercases all text. To evaluate generalization, no corporate name appears in both the training and test sets.
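The preprocessing steps described above can be sketched roughly as follows. This is a minimal illustration, not the dataset author's actual code: the designation list `DESIGNATIONS`, the helper names `normalize_name` and `split_by_name`, the camel-case heuristic, the hash-based split, and the `test_ratio` parameter are all assumptions made for the example.

```python
import hashlib
import re
import unicodedata

# Hypothetical list of corporate designations; the dataset's real list is not given.
DESIGNATIONS = ("株式会社", "有限会社", "合同会社")

ASCII_LETTERS = re.compile(r"[A-Za-z]+")
# Assumed heuristic: a lowercase letter directly followed by an uppercase one.
CAMEL_CASE = re.compile(r"[a-z][A-Z]")


def normalize_name(raw):
    """Return a cleaned lowercase name, or None if the name is filtered out."""
    name = raw
    for designation in DESIGNATIONS:
        name = name.replace(designation, "")
    # NFKC folds full-width Latin letters to their half-width ASCII forms.
    name = unicodedata.normalize("NFKC", name).strip()
    if not ASCII_LETTERS.fullmatch(name):
        return None  # not composed entirely of English letters
    if CAMEL_CASE.search(name):
        return None  # camel-case compound word
    return name.lower()


def split_by_name(pairs, test_ratio=0.1):
    """Assign (name, kana) pairs to train/test so that a name never spans both."""
    train, test = [], []
    for name, kana in pairs:
        # Hashing the name makes the assignment depend on the name alone.
        bucket = hashlib.sha256(name.encode("utf-8")).digest()[0] / 256
        (test if bucket < test_ratio else train).append((name, kana))
    return train, test
```

Because the split depends only on the name, two companies that share a name always land on the same side, which is one simple way to get the no-overlap property the description mentions.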
Provided by:
m7142yosuke



