"UniEval Dataset"
Source: DataCite Commons | Updated: 2025-10-25 | Indexed: 2026-05-03
Download link:
https://ieee-dataport.org/documents/unieval-dataset-0
Resource description:
"The rapid adoption of large language models (LLMs) in domains such as healthcare, finance, and customer service has exposed critical gaps in current evaluation frameworks, which are often vendor-specific, fragmented, and lack persistent conversational memory. To address these limitations, we propose UniEval AI, a model-agnostic framework for comprehensive monitoring, evaluation, and optimization of LLMs. UniEval AI employs a layered architecture comprising a Universal Model Interface that enables seamless evaluation across GPT, Qwen, and Claude. An evaluation engine integrates quality assessment, LIME- and SHAP-based explainability, and real-time bias detection and mitigation, as well as a hierarchical memory system that preserves conversational context. Enterprise-grade features, including audit logging, role-based access control, and CI\/CD integration, ensure compliance and secure deployment. Experimental results across diverse conversational datasets demonstrate that UniEval AI outperforms existing frameworks, achieving 90% confidence calibration accuracy (vs. 60\u201370% industry benchmarks), 91% bias detection accuracy with a 6% false positive rate, and a composite reliability score of 84.75. Human evaluation further confirmed the superior usefulness of UniEval AI explanations (4.2\/5.0 vs. 3.1\/5.0). Scalable to over 100 concurrent sessions, UniEval AI establishes a unified, transparent, and enterprise-ready evaluation system, paving the way for multilingual extensions, domain-specific adaptations, and federated assessment strategies. "
Provider:
IEEE DataPort
Created:
2025-10-25