"ElecBench"
Source: DataCite Commons. Updated: 2025-06-19. Indexed: 2026-05-03.
Download link:
https://ieee-dataport.org/documents/elecbench-0
Description:
In response to the challenges of grid stability, renewable energy integration, and electricity market dynamics, power grid dispatch is increasingly adopting large language models (LLMs) for their potential to improve efficiency and intelligence. However, the lack of specific performance benchmarks has limited their effective application. To fill this gap, we introduce "ElecBench", an evaluation benchmark designed for LLMs in power grid dispatch. ElecBench comprehensively covers sector-specific scenarios and deepens the testing of professional knowledge, focusing on six primary metrics: factuality, logicality, stability, security, fairness, and expressiveness, divided into 24 sub-metrics. This framework offers detailed insights into the capabilities and limitations of LLMs in power grid dispatch. We publicly release the test set and evaluate eight LLMs across various scenarios and metrics. ElecBench aims to become the standard benchmark for LLM applications in power grid dispatch, encouraging continuous updates and driving technological progress.
Provider:
IEEE DataPort
Created:
2025-06-19



