
JKinco 筑衍 · Construction Engineering Supervision LLM Evaluation Set

ModelScope (魔搭社区) · Updated 2026-05-23 · Indexed 2026-01-24
Download link:
https://modelscope.cn/datasets/DongZekai/Norma_MESBench_1.0
Resource overview:
JKinco-MESBench (named Norma-MESBench 1.0 in the English description) is the first multi-task evaluation benchmark at the ten-thousand-question scale for large language models (LLMs) targeting China's construction engineering supervision industry. The benchmark focuses on building construction and contains 10,144 questions. The questions span three categories (professional and technical, general and comprehensive, and specialized scenarios) and four question types (single-choice, multiple-choice, true/false, and open-ended). The project adopts an original "multi-level hybrid scoring mechanism" intended to measure LLMs' professional capability in the supervision vertical from multiple angles, to help practitioners evaluate the accuracy and reliability of model answers, and to fill the gap in LLM evaluation benchmarks for the supervision industry.
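The listing does not say how the "multi-level hybrid scoring mechanism" works. Purely as an illustration of what scoring four question types under different rules might look like, here is a minimal Python sketch. Every function name, the partial-credit rule for multiple-choice, the character-level F1 for open-ended answers, and the per-type weights are assumptions made for this example, not the benchmark's actual method.

```python
from collections import Counter


def score_exact(pred: str, gold: str) -> float:
    """Single-choice and true/false: full credit only for a case-insensitive match."""
    return 1.0 if pred.strip().upper() == gold.strip().upper() else 0.0


def score_multi(pred: set, gold: set) -> float:
    """Multiple-choice (assumed rule): any wrong option scores 0;
    a correct but incomplete selection earns proportional credit."""
    if not pred or not pred <= gold:
        return 0.0
    return len(pred) / len(gold)


def score_open(pred: str, gold: str) -> float:
    """Open-ended (assumed rule): character-level F1 against a reference answer.
    Characters are used as tokens so the sketch also works on Chinese text."""
    p = [c for c in pred if not c.isspace()]
    g = [c for c in gold if not c.isspace()]
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)


def score_item(qtype: str, pred: str, gold: str) -> float:
    """Dispatch to a scoring rule by question type (type labels are hypothetical)."""
    if qtype in ("single_choice", "true_false"):
        return score_exact(pred, gold)
    if qtype == "multiple_choice":
        # Option letters are given as a string such as "ACD".
        return score_multi(set(pred.upper()), set(gold.upper()))
    if qtype == "open_ended":
        return score_open(pred, gold)
    raise ValueError(f"unknown question type: {qtype}")


def benchmark_score(items, weights) -> float:
    """Weighted mean of the per-type average scores.
    `items` is an iterable of (qtype, prediction, gold) triples;
    `weights` maps each question type to an assumed weight."""
    per_type = {}
    for qtype, pred, gold in items:
        per_type.setdefault(qtype, []).append(score_item(qtype, pred, gold))
    total_weight = sum(weights[t] for t in per_type)
    return sum(
        weights[t] * sum(scores) / len(scores) for t, scores in per_type.items()
    ) / total_weight
```

Averaging within each question type before weighting keeps a type with many items from dominating the total; whether the real benchmark aggregates this way is not stated in the listing.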
Provider:
maas
Created:
2026-01-16
Collected summary
Dataset introduction
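Background and challenges
Background overview
This dataset is the first large-scale, multi-task LLM evaluation benchmark for China's construction engineering supervision industry. It contains 10,144 questions across three categories (professional knowledge, general knowledge, and professional scenarios) and four question types. It uses a multi-level hybrid scoring mechanism to comprehensively assess LLMs' professional capabilities in engineering supervision, filling a gap in evaluation benchmarks for the field.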
The above content was collected and summarized by 遇见数据集.