
Human review for post-training improvement of low-resource language performance in large language models

DataONE · Updated 2024-04-25 · Indexed 2024-06-08
Language models
Low-resource languages
Download link:
https://search.dataone.org/view/sha256:af00dba2f1850b7c12e25aa07e5ee0579a48f4c854b8a97229cfdad82dabb0c3
Description:
Large language models (LLMs) have significantly improved natural language processing and hold the potential to directly support health workers and their clients. Unfortunately, there is a substantial and variable drop in performance for low-resource languages. Here we present results from an exploratory case study in Malawi aimed at enhancing the performance of LLMs in Chichewa through innovative prompt engineering techniques. By focusing on practical evaluations over traditional metrics, we assess the subjective utility of LLM outputs, prioritizing end-user satisfaction. Our findings suggest that tailored prompt engineering may improve LLM utility in underserved linguistic contexts, offering a promising avenue to bridge the language inclusivity gap in digital health interventions.

We compared the reported performance of five variations of an LLM-based chatbot prototype through a two-step process. First, a cohort of 24 target end users, community health volunteers (CHVs) in Malawi, was recruited to generate transcripts: each CHV was preassigned to one of the five chatbot variations and instructed to generate a single transcript by interacting with their assigned variation. Second, an additional cohort of CHVs was recruited, also through convenience sampling, to review the transcripts and provide subjective feedback on the quality and utility of the language in the model's responses. This second cohort was intended to comprise 25 participants; however, three were unable to attend.
A total of 22 CHVs participated in the transcript review, where each CHV was instructed to review and rate four of the transcripts.

# Human review for post-training improvement of low-resource language performance in large language models

[https://doi.org/10.5061/dryad.4xgxd25jb](https://doi.org/10.5061/dryad.4xgxd25jb)

This dataset comprises a single Excel file with transcript review survey results as reported by a cohort of community health volunteers (CHVs) in Malawi. Each CHV was asked to review and rate four pre-assigned transcripts, each generated from one of five Chichewa-speaking chatbot variations differing in the temperature model parameter and/or changes to the system prompt. The survey was designed to collect CHV feedback on the quality of the Chichewa spoken by the chatbot, given the performance gap between higher- and lower-resource languages. Results suggest that the use of specific prompt engineering techniques may improve foundational model utility when conversing in low-resource languages.
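The dataset does not publish the exact settings of the five chatbot variations, only that they differ in the sampling temperature and/or the system prompt. As a minimal sketch of what such a configuration might look like, the snippet below defines five illustrative variants and assembles chat-completion-style request parameters for one of them. All names, temperatures, and prompt wordings here are hypothetical placeholders, not the study's actual values.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ChatbotVariant:
    """One chatbot configuration: a temperature plus a system prompt."""
    name: str
    temperature: float  # sampling temperature passed to the LLM
    system_prompt: str  # instruction steering the model toward Chichewa

# Hypothetical base prompt; the study's real prompts are not published.
BASE_PROMPT = "You are a community health assistant. Reply in Chichewa."

# Five illustrative variants differing in temperature and/or system prompt,
# mirroring the structure (not the content) of the study's five variations.
VARIANTS = [
    ChatbotVariant("A", 0.2, BASE_PROMPT),
    ChatbotVariant("B", 0.7, BASE_PROMPT),
    ChatbotVariant("C", 0.2, BASE_PROMPT + " Use simple, common vocabulary."),
    ChatbotVariant("D", 0.7, BASE_PROMPT + " Use simple, common vocabulary."),
    ChatbotVariant("E", 0.7, BASE_PROMPT + " Avoid borrowing English terms."),
]

def request_params(variant: ChatbotVariant, user_message: str) -> dict:
    """Assemble generic chat-completion parameters for a given variant."""
    return {
        "temperature": variant.temperature,
        "messages": [
            {"role": "system", "content": variant.system_prompt},
            {"role": "user", "content": user_message},
        ],
    }

# Example: build a request for variant A.
params = request_params(VARIANTS[0], "Moni! Ndili ndi funso.")
```

Under this scheme, a CHV preassigned to a variant would interact only with that variant's configuration, keeping the transcript attributable to a single temperature/prompt combination.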
Created:
2025-07-30