Prompt Datasets to Evaluate LLM Safety
Source: IEEE | Indexed: 2026-04-17
Download link:
https://ieee-dataport.org/documents/prompt-datasets-evaluate-llm-safety
Description:
The rise of generative artificial intelligence through applications like ChatGPT has increased awareness of biases within machine learning models. The data on which Large Language Models (LLMs) are trained contain inherent biases because they reflect societal biases and stereotypes, which can lead models to propagate those biases further. In this paper, I establish a baseline measurement of gender and racial bias in the domains of crime and employment across major LLMs, using “ground truth” data published by the U.S. Bureau of Labor Statistics and the FBI’s Uniform Crime Reporting (UCR) program. I then propose a novel approach, fact-based prompting (called “fact-shot prompting”), to mitigate bias in LLM-generated responses. Fact-shot prompting supplements the prompt given to an LLM with factual information. This approach also makes it possible to observe how effective current LLM training is. I observe patterns in predictions involving race and gender when applied to various occupations and crimes. This approach to bias mitigation may provide key insights into specific areas of training failure and a plan for correcting them.
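The description says fact-shot prompting supplements a prompt with factual information but does not give the prompt template. The following is a minimal sketch of one way that could look, not the author's actual implementation: the function name `build_fact_shot_prompt`, the wording of the template, and the placeholder fact are all assumptions made for illustration.

```python
def build_fact_shot_prompt(question: str, facts: list[str]) -> str:
    """Prepend factual statements to a question before it is sent to an LLM.

    Hypothetical helper: the template wording is an assumption, not the
    format used in the dataset.
    """
    if not facts:
        # With no facts supplied, fall back to the plain prompt.
        return question
    fact_lines = "\n".join(f"- {fact}" for fact in facts)
    return f"Consider the following facts:\n{fact_lines}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Placeholder fact: substitute a real statistic from the ground-truth source.
    facts = ["<employment statistic from the U.S. Bureau of Labor Statistics>"]
    question = "Describe a typical software developer."
    print(build_fact_shot_prompt(question, facts))
```

Comparing model responses to the plain question against responses to the fact-supplemented prompt is one way the effect of the added facts could be observed.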
Provided by:
Thota, Hima



