
Composition of the HT corpus.

Source: Figshare. Updated 2025-06-02; indexed 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Composition_of_the_HT_corpus_/29214347
Description:
This study examines whether simplification, a translation universal (TU), is present in English-to-Chinese translation. It compares the Mean Dependency Distance (MDD) and Mean Hierarchical Distance (MHD) of crowdsourced human translations, Large Language Model (LLM) translations, and original Chinese texts across fifteen genres. Analysis of three balanced comparable corpora found that: (i) compared with original Chinese texts, both human-translated and LLM-translated Chinese texts showed significant syntactic simplification in all genres; and (ii) human translations showed a more pronounced tendency toward syntactic simplification than LLM translations in all genres. These findings support the simplification hypothesis at the syntactic level and point to different cognitive and processing mechanisms underlying human and LLM translation. The results suggest that human translators can actively optimize complex syntax in a way that current LLMs cannot, which offers a useful reference for the future development of LLMs and of LLM-assisted translation methods. In addition, by adopting MDD and MHD as holistic measures of syntactic complexity, the study offers a new perspective on TU research and provides empirical insight into the linguistic nature of crowdsourced translation from an English-to-Chinese perspective.
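The description relies on MDD and MHD without defining them. As a rough sketch, assuming the commonly used definitions (dependency distance as the absolute difference between the linear positions of a head and its dependent; hierarchical distance as the number of arcs between a word and the root of the dependency tree), both measures can be computed from a list of head indices as below. The function names, the head-index encoding, and the example parse are illustrative assumptions, not the study's own code; the parser, punctuation handling, and corpus-level averaging used in the study are not described on this page.

```python
from typing import List


def mean_dependency_distance(heads: List[int]) -> float:
    """MDD: mean |head position - dependent position| over all non-root words.

    heads[i] holds the 1-based position of the head of word i + 1;
    0 marks the root (which has no incoming dependency).
    """
    distances = [abs(head - (pos + 1)) for pos, head in enumerate(heads) if head != 0]
    return sum(distances) / len(distances)


def mean_hierarchical_distance(heads: List[int]) -> float:
    """MHD: mean number of arcs from each non-root word up to the root.

    Assumes `heads` encodes a well-formed tree (no cycles).
    """

    def depth(word: int) -> int:
        steps = 0
        while heads[word - 1] != 0:  # climb one arc at a time until the root
            word = heads[word - 1]
            steps += 1
        return steps

    depths = [depth(w) for w in range(1, len(heads) + 1) if heads[w - 1] != 0]
    return sum(depths) / len(depths)


# Hypothetical hand-made parse of "The student read a book":
# The->student, student->read, read=root, a->book, book->read.
heads = [2, 3, 0, 5, 3]
print(mean_dependency_distance(heads))    # 1.25
print(mean_hierarchical_distance(heads))  # 1.5
```

Both functions are language-independent, so the same sketch applies to parsed Chinese text. Corpus-level figures are often obtained by pooling distances over all sentences rather than averaging per-sentence means; which convention this study follows is an open assumption here. Lower values of either measure are generally read as simpler syntax.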
Created:
2025-06-02