AI chatbot evaluation scale and rubric.

Figshare2025-03-19 更新2026-04-28 收录

下载链接：

https://figshare.com/articles/dataset/AI_chatbot_evaluation_scale_and_rubric_/28627141

下载链接

链接失效反馈

官方服务：

资源简介：

BackgroundGenerative artificial intelligence (GenAI) has the potential to revolutionise healthcare delivery. The nuances of real-life clinical practice and complex clinical environments demand a rigorous, evidence-based approach to ensure safe and effective deployment of AI.MethodsWe present a protocol for the systematic evaluation of large language models (LLMs) as GenAI chatbots within the context of clinical microbiology and infectious diseases clinical consultations. We aim to critically assess recommendations produced by four leading GenAI models, including Claude 2, Gemini Pro, GPT-4.0, and a GPT-4.0-based custom AI chatbot.DiscussionA standardised, healthcare-specific, universal prompt template is developed to elicit clinically impactful AI responses. Generated responses will be graded by two panels of practicing clinicians, encompassing a wide spectrum of domain expertise in clinical microbiology and virology, as well as infectious diseases. Evaluations will be performed using a 5-point Likert scale across four clinical domains: factual consistency, comprehensiveness, coherence, and medical harmfulness. Our study will offer insights into the feasibility, limitations, and boundaries of GenAI in clinical consultations, providing guidance for future research and clinical implementation. Ethical guidelines and safety guardrails should be developed to uphold patient safety and clinical standards.

创建时间：

2025-03-19

5,000+

优质数据集

54 个

任务类型

进入经典数据集