
Understanding Gender Bias in Bangla Clinical Patient Narratives: Analyzing Fairness and Reasoning in LLM Judgments

Source: DataCite Commons. Updated: 2026-04-06. Indexed: 2026-05-04.
Download link:
https://data.mendeley.com/datasets/drx6r8gzyf
Description:
BiasMedNarrative-BN is a synthetic counterfactual dataset of Bangla clinical patient narratives designed to evaluate gender bias in large language models. It contains 1,050 narratives constructed from 525 paired clinical scenarios, with each pair consisting of one male and one female version. The dataset covers 22 symptom types grouped into four major clinical categories, with severity levels ranging from Mild to Critical. Data was sourced from publicly available platforms including Facebook, Reddit, Bangla healthcare websites, and newspapers to capture both informal and formal patient expressions. These real-world symptom descriptions were first structured into clinically coherent scenarios, after which an LLM was used to generate natural, patient-style narrative versions. All data underwent preprocessing to remove personal identifiers and ensure linguistic and clinical consistency. The generated narratives were further validated by clinical experts to ensure realism and strict counterfactual equivalence.
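
Because every scenario has one male and one female version, a simple way to probe a model is to compare its judgments within each pair. The following is a minimal sketch of that idea, not the dataset authors' evaluation method. The record fields `pair_id`, `gender`, and `judgment` are hypothetical names chosen for illustration, since the description above does not specify the file layout, and the toy records are invented placeholders rather than dataset content.

```python
from collections import defaultdict


def pair_disagreement_rate(records):
    """Return the fraction of complete male/female pairs whose judgments differ.

    Each record is a dict with hypothetical keys:
      pair_id  - identifier shared by the two versions of one scenario
      gender   - "male" or "female"
      judgment - the label a model assigned to that narrative
    """
    # Group judgments by scenario pair, keyed by gender.
    by_pair = defaultdict(dict)
    for rec in records:
        by_pair[rec["pair_id"]][rec["gender"]] = rec["judgment"]

    # Keep only pairs where both versions are present.
    complete = [p for p in by_pair.values() if "male" in p and "female" in p]
    if not complete:
        raise ValueError("no complete male/female pairs found")

    # Count pairs where changing only the gender changed the judgment.
    differing = sum(1 for p in complete if p["male"] != p["female"])
    return differing / len(complete)


# Toy placeholder records (not taken from the dataset).
toy_records = [
    {"pair_id": 1, "gender": "male", "judgment": "Severe"},
    {"pair_id": 1, "gender": "female", "judgment": "Moderate"},
    {"pair_id": 2, "gender": "male", "judgment": "Mild"},
    {"pair_id": 2, "gender": "female", "judgment": "Mild"},
]

rate = pair_disagreement_rate(toy_records)
```

Under strict counterfactual equivalence, a model with no gender sensitivity would be expected to give a rate near zero. A nonzero rate only shows that judgments changed with gender; it does not show the direction of the change.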

Provider:
Mendeley Data
Created:
2026-04-06