raptorkwok/cantonese-chinese-dataset-gen2
Source: Hugging Face. Updated: 2024-08-09. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/raptorkwok/cantonese-chinese-dataset-gen2
Description:
A Cantonese-Written Chinese parallel dataset with roughly 1 million pairs of Cantonese and Written Chinese sentences, in JSONL format. Sentence topics include news, novels, and daily conversations. Translations were produced with the Microsoft Azure Translate API, and I corrected problems manually. I also verified every sentence manually, one by one. The dataset could be extended given more resources, in terms of manpower and funding.
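Since the data is distributed as JSONL (one JSON object per line), reading it might look like the minimal sketch below. The field names `yue` and `zh` are assumptions made for illustration only, as this listing does not state the actual schema, and the two sample records are made up rather than taken from the dataset.

```python
import json
from typing import Iterable, Iterator, Tuple

# Hypothetical field names: the listing does not document the real JSONL keys.
SRC_KEY = "yue"  # assumed key for the Cantonese sentence
TGT_KEY = "zh"   # assumed key for the Written Chinese sentence


def iter_pairs(
    lines: Iterable[str],
    src_key: str = SRC_KEY,
    tgt_key: str = TGT_KEY,
) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) sentence pairs from JSONL lines, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        yield record[src_key], record[tgt_key]


# Made-up sample records for illustration, not taken from the dataset.
sample = [
    '{"yue": "你食咗飯未?", "zh": "你吃饭了吗?"}',
    "",
    '{"yue": "我哋聽日去邊度?", "zh": "我们明天去哪里?"}',
]
pairs = list(iter_pairs(sample))
```

Because `iter_pairs` accepts any iterable of strings, a file opened with `open(path, encoding="utf-8")` can be passed in directly, so the roughly 1 million pairs are streamed rather than loaded at once. Check the keys against the real file before relying on them.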
Provider:
raptorkwok



