"Datasets for Open-Domain Knowledge Graph Construction in Low-Resource Domains"
Source: DataCite Commons | Updated: 2026-03-12 | Indexed: 2026-05-03
Download link:
https://ieee-dataport.org/documents/datasets-open-domain-knowledge-graph-construction-low-resource-domains
Description:
"Large Language Models (LLMs) frequently make factual errors when processing Low-Resource and High-Ambiguity (LRHA) corpora, such as legal documents, medical records, and historical texts, because domain knowledge is scarce and the semantics are complex. To address this, we propose a universal framework for automated Open-Domain Knowledge Graph (OKG) construction that integrates reasoning distillation, fine-grained rules, and reinforcement learning. Using Chinese historical texts as a testbed, we use the expert model DeepSeek-R1 to generate high-quality reasoning trajectories for cold-start training, which mitigates data scarcity. We then design a custom reward model based on fine-grained rules and use it for alignment training of the target model with the Group Relative Policy Optimization (GRPO) and Direct Preference Optimization (DPO) algorithms. Experimental results show that the optimized DeepSeek-R1-Distill-Qwen-14B model achieves a maximum F1 score of 0.83. Validation through the construction of the “First Four Histories” OKG and Graph Retrieval-Augmented Generation (GraphRAG) experiments confirms the framework’s efficacy. Crucially, this study reveals that in LRHA domains, a rule-based custom reward model outperforms general LLMs in alignment efficiency and economic feasibility, offering a transferable, low-cost solution for low-resource texts in fields like law, medicine, and endangered languages."
Provider:
IEEE DataPort
Created:
2026-03-12
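
The description reports a maximum F1 score of 0.83 but this record does not say how the score is computed. As a rough illustration only, the sketch below assumes exact-match scoring over sets of (head, relation, tail) triples, which is one common convention for knowledge graph extraction. The function name `triple_f1`, the triple schema, and the toy triples are all hypothetical and are not taken from the dataset.

```python
from typing import Iterable, Tuple

# Hypothetical triple schema: (head entity, relation, tail entity).
Triple = Tuple[str, str, str]


def triple_f1(predicted: Iterable[Triple], gold: Iterable[Triple]) -> float:
    """Exact-match F1 between a predicted and a gold set of triples.

    Assumption: a predicted triple counts as correct only if all three
    fields match a gold triple exactly. The dataset's own evaluation
    protocol may differ.
    """
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    # With no overlap (or an empty set), precision or recall is zero.
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy placeholder data, invented purely for illustration.
gold_triples = [
    ("person_a", "served_under", "ruler_x"),
    ("person_a", "born_in", "place_y"),
    ("person_b", "son_of", "person_a"),
    ("person_b", "governed", "place_z"),
]
predicted_triples = [
    ("person_a", "served_under", "ruler_x"),
    ("person_a", "born_in", "place_y"),
    ("person_b", "born_in", "place_z"),
]

score = triple_f1(predicted_triples, gold_triples)
print(f"F1 = {score:.4f}")
```

Here two of the three predicted triples match the four gold triples, so precision is 2/3 and recall is 1/2. A looser matching rule (for example, partial entity overlap) would give different numbers, which is why the scoring convention matters when comparing against the reported 0.83.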



