
UniEval Dataset

Source: DataCite Commons. Updated: 2025-10-25. Indexed: 2026-05-03.
Download link:
https://ieee-dataport.org/documents/unieval-dataset-0
Description:
The rapid adoption of large language models (LLMs) in domains such as healthcare, finance, and customer service has exposed critical gaps in current evaluation frameworks, which are often vendor-specific and fragmented and lack persistent conversational memory. To address these limitations, we propose UniEval AI, a model-agnostic framework for comprehensive monitoring, evaluation, and optimization of LLMs. UniEval AI employs a layered architecture comprising a Universal Model Interface that enables seamless evaluation across GPT, Qwen, and Claude; an evaluation engine that integrates quality assessment, LIME- and SHAP-based explainability, and real-time bias detection and mitigation; and a hierarchical memory system that preserves conversational context. Enterprise-grade features, including audit logging, role-based access control, and CI/CD integration, ensure compliance and secure deployment. Experimental results across diverse conversational datasets demonstrate that UniEval AI outperforms existing frameworks, achieving 90% confidence calibration accuracy (vs. 60–70% industry benchmarks), 91% bias detection accuracy with a 6% false positive rate, and a composite reliability score of 84.75. Human evaluation further confirmed the superior usefulness of UniEval AI explanations (4.2/5.0 vs. 3.1/5.0). Scalable to over 100 concurrent sessions, UniEval AI establishes a unified, transparent, and enterprise-ready evaluation system, paving the way for multilingual extensions, domain-specific adaptations, and federated assessment strategies.
Provider:
IEEE DataPort
Created:
2025-10-25
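
The description reports bias detection accuracy alongside a false positive rate. As a reading aid, the sketch below shows how those two quantities are conventionally computed for a binary detector. It is not taken from the UniEval AI code: the function name `detection_metrics`, the label encoding (1 = biased, 0 = not biased), and the toy labels are all assumptions made for illustration, and the dataset's actual schema is not described in this listing.

```python
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy and false positive rate for a binary detector.

    Hypothetical encoding: 1 = flagged as biased, 0 = not biased.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    negatives = fp + tn

    return {
        # Share of all items classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of truly unbiased items that were wrongly flagged.
        "false_positive_rate": fp / negatives if negatives else 0.0,
    }


if __name__ == "__main__":
    # Toy labels, invented for this example only.
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    preds = [1, 1, 0, 0, 0, 0, 0, 1]
    print(detection_metrics(labels, preds))
```

Note that the two figures use different denominators: accuracy is measured over all items, while the false positive rate is measured only over items that are truly unbiased, so one cannot be derived from the other without the class balance.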