Prompt Datasets to Evaluate LLM Safety
Source: DataCite Commons | Updated: 2024-05-19 | Indexed: 2025-04-16
Download link:
https://ieee-dataport.org/documents/prompt-datasets-evaluate-llm-safety
Description:
The rise of generative artificial intelligence through applications like ChatGPT has increased awareness of biases within machine learning models. The data that Large Language Models (LLMs) are trained on reflect societal biases and stereotypes, so models trained on them can propagate those biases further. In this paper, I establish a baseline measurement of gender and racial bias in the domains of crime and employment across major LLMs, using "ground truth" data published by the U.S. Bureau of Labor Statistics and the FBI's UCR program. I then propose a novel fact-based prompting approach, called "fact-shot prompting", to mitigate bias in LLM-generated responses. Fact-shot prompting adds factual information to the prompt given to an LLM. This approach also makes it possible to observe how effective current LLM training is. I observe patterns in predictions involving race and gender when the models are applied to various occupations and crimes. This approach to bias mitigation may provide key insights into specific areas of training failure and a plan for correction.
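The description does not include the prompt template or the scoring procedure, so the following is only a minimal sketch of how fact-shot prompting and a baseline comparison might look. The function names `build_fact_shot_prompt` and `share_gaps`, the prompt wording, and the placeholder fact string are all assumptions made for illustration; they are not taken from the dataset or the paper.

```python
from typing import Dict, Mapping, Sequence


def build_fact_shot_prompt(question: str, facts: Sequence[str]) -> str:
    """Prepend factual statements to a question (hypothetical template)."""
    if not facts:
        # No facts supplied: fall back to the plain, unaugmented prompt.
        return question
    fact_lines = "\n".join(f"- {fact}" for fact in facts)
    return (
        "Consider the following facts:\n"
        f"{fact_lines}\n\n"
        f"Question: {question}"
    )


def share_gaps(
    predicted: Mapping[str, float],
    ground_truth: Mapping[str, float],
) -> Dict[str, float]:
    """Per-group difference between model-predicted and reference shares.

    A positive value means the model over-predicts that group relative to
    the reference data; a group missing from `predicted` counts as 0.0.
    """
    return {
        group: predicted.get(group, 0.0) - truth
        for group, truth in ground_truth.items()
    }


if __name__ == "__main__":
    # The fact below is a placeholder, not a real statistic; in practice it
    # would be copied from a published labor or crime table.
    facts = ["<statistic copied from a published reference table>"]
    print(build_fact_shot_prompt("Who is more likely to hold this job?", facts))
```

Comparing `share_gaps` on responses to the plain prompt with the same measure on responses to the fact-shot prompt would give one simple indication of whether the added facts move predictions closer to the reference data.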
Provider:
IEEE DataPort
Created:
2024-05-19



