Work Safety Knowledge and Capability Assessment Training Data

Name: Work Safety Knowledge and Capability Assessment Training Data (安全生产知识能力评估训练数据)
Creator: 杭州叙简科技股份有限公司
Published: 2025-06-23 10:40:43
License: No description available

Source: Zhejiang Provincial Data Intellectual Property Registration Platform (浙江省数据知识产权登记平台). Updated 2025-06-23; indexed 2025-06-24.

Download link:

https://www.zjip.org.cn/home/announce/trends/140407

Resource description:

This dataset is built by adding contextual annotations to knowledge-assessment data from the work safety domain, producing a highly targeted and semantically complex training set. After data parsing and safety-compliance verification, the data yield domain-specific samples that serve as test cases for evaluating the work safety knowledge of large AI models, in particular their semantic understanding, language expression, and reasoning when answering work safety questions. The samples can also be used to test and improve how well AI models in the work safety domain understand scenarios.

1. Data collection: Publicly available test and examination questions are collected that relate to work safety laws, regulations, and rules, including the Law of the People's Republic of China on Work Safety, the Fire Protection Law of the People's Republic of China, the Interim Provisions on the Investigation and Governance of Safety Hazards, the Safety and Technical Supervision Regulations for Fixed Pressure Vessels, and the Regular Inspection Rules for Lifting Machinery. The result is a raw dataset of work safety literature questions awaiting analysis.

2. Data processing:
1) Text annotation labels each question with the safety category it belongs to.
2) TextRank-based summarization extracts one key sentence from each paragraph, and the key sentences are arranged in paragraph order to form a new text. A second round of key-sentence extraction is then run on this key-sentence sequence: each sentence is scored from iteratively propagated weights, and each sentence is also passed to a sequence labeling model to obtain entity labels. The more entities a sentence contains, the higher the entity weight it receives. The final importance score of a sentence is the sum of its entity weight score and its sentence importance score. For each question, a random integer r in the range [1, 4] is drawn, and the top r key sentences are taken as the correct candidate answers for that question (that is, T = r in the TextRank model). The correct candidate answers are stored by category.
3) Based on the safety category of the question and the entity types of its correct candidate answers, 3 answers are randomly selected from the answer set that shares the same safety category and entity types, choosing those whose character length is closest to that of the original correct candidate answer. These become the incorrect candidate answers for the question. If the answer set does not contain enough eligible incorrect candidates, NLTK (Natural Language Toolkit) is used to generate antonyms of the answer as supplementary incorrect candidates.

3. Data application: The dataset can be used to test and improve how well AI models in the work safety domain understand scenarios.
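The processing steps in 2) and 3) can be illustrated with a minimal Python sketch. It is not the publisher's implementation, and several details are assumptions: sentences are assumed to be pre-tokenized, the similarity measure is the word-overlap formula from the original TextRank paper (the description does not say which one is used), `count_entities` is a hypothetical stand-in for the sequence labeling model, `entity_weight` is an illustrative parameter, and randomness in distractor selection is interpreted as random tie-breaking among equally close lengths. All function names are hypothetical, and the NLTK antonym fallback is left out.

```python
import math
import random
from typing import Callable, List

Tokens = List[str]


def similarity(a: Tokens, b: Tokens) -> float:
    """Word overlap normalized by log sentence lengths (assumed measure)."""
    if not a or not b:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return len(set(a) & set(b)) / denom if denom > 0 else 0.0


def textrank_scores(sentences: List[Tokens], damping: float = 0.85,
                    iterations: int = 50) -> List[float]:
    """Score sentences by iteratively propagating weights over a similarity graph."""
    n = len(sentences)
    w = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                w[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


def rank_sentences(sentences: List[Tokens],
                   count_entities: Callable[[Tokens], int],
                   entity_weight: float = 0.1) -> List[int]:
    """Return sentence indices ordered by TextRank score plus entity bonus."""
    base = textrank_scores(sentences)
    final = [score + entity_weight * count_entities(sent)
             for score, sent in zip(base, sentences)]
    return sorted(range(len(sentences)), key=lambda i: final[i], reverse=True)


def pick_correct_answers(sentences: List[Tokens],
                         count_entities: Callable[[Tokens], int],
                         rng: random.Random) -> List[Tokens]:
    """Draw r in [1, 4] and keep the top r sentences as correct candidates."""
    r = rng.randint(1, 4)  # both bounds are inclusive
    order = rank_sentences(sentences, count_entities)
    return [sentences[i] for i in order[:r]]


def pick_distractors(correct: str, pool: List[str],
                     rng: random.Random, k: int = 3) -> List[str]:
    """Pick k answers from the same-category pool with the closest length."""
    candidates = [a for a in pool if a != correct]
    rng.shuffle(candidates)  # random tie-breaking (an assumption)
    candidates.sort(key=lambda a: abs(len(a) - len(correct)))
    return candidates[:k]
```

In this sketch, `pool` is expected to already be filtered to answers with the same safety category and entity types as the correct answer. When `pick_distractors` returns fewer than `k` items, that corresponds to the case where the description falls back to NLTK-generated antonyms.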

Provider:

杭州叙简科技股份有限公司

Created:

2025-04-30

Collected summary

Dataset introduction