
Echoes of Vagueness: A Corpus-Based Study of Semantic Ambiguity in Hakka AI Translation

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://doi.org/10.7910/DVN/EBZYFC
Description:
This study explores how large language models (LLMs), specifically GPT-4o, handle semantic ambiguity in low-resource languages, focusing on Hakka (Si-Hsien dialect). Unlike previous studies on Taiwanese, which emphasize semantic leakage, this paper investigates how LLMs interpret and resolve lexical polysemy, context-dependent meanings, and pragmatically underspecified expressions during Hakka-to-Mandarin AI translation. We introduce the notion of Ambiguity Resolution Trajectories (ART) to trace whether ambiguity is preserved, disambiguated, distorted, or newly generated through back-translation. Our corpus, drawn from the Hakka Language Certification Vocabulary Database, was translated and back-translated using GPT-4o. Using a combined framework of entropy-based stylometrics, embedding divergence, and qualitative content analysis, we categorize ambiguity phenomena and assess the model's pragmatic decision-making. The findings reveal systematic biases in how GPT-4o resolves or simplifies ambiguity, with implications for translation studies, computational pragmatics, and low-resource language equity.
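The description names entropy-based stylometrics and embedding divergence without specifying how either is computed. As a minimal sketch only, assuming character-level Shannon entropy and cosine distance between precomputed embedding vectors, a source sentence and its back-translation might be compared as follows. The function names `char_entropy` and `cosine_divergence` are hypothetical and are not taken from the dataset or the study.

```python
import math
from collections import Counter


def char_entropy(text):
    """Shannon entropy, in bits per character, of the non-whitespace characters."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    total = len(chars)
    counts = Counter(chars)
    # Sum p * log2(1/p) over the observed character frequencies.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def cosine_divergence(u, v):
    """Return 1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Assumption: treat a zero vector as maximally divergent.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)
```

Under these assumptions, a drop in entropy from the source to the back-translation, or a large cosine divergence between their embeddings, would only flag an item for the qualitative analysis; neither number by itself says which trajectory (preserved, disambiguated, distorted, or newly generated) applies.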
Created:
2025-05-30