"Datasets for Open-Domain Knowledge Graph Construction in Low-Resource Domains"

Name: "Datasets for Open-Domain Knowledge Graph Construction in Low-Resource Domains"
Creator: IEEE DataPort
Published: 2026-03-12 09:17:35
License: No description available

Source: DataCite Commons. Updated 2026-03-12; indexed 2026-05-03.

Download link:

https://ieee-dataport.org/documents/datasets-open-domain-knowledge-graph-construction-low-resource-domains

Description:

"Large Language Models (LLMs) frequently suffer from Factual Errors when processing Low-Resource and High-Ambiguity (LRHA) corpora, such as legal documents, medical records, and historical texts, due to domain knowledge scarcity and semantic complexity. To address this, we propose a universal framework for automated Open-Domain Knowledge Graph (OKG) construction integrating Reasoning Distillation, Fine-grained Rules, and reinforcement learning. Using Chinese historical texts as a testbed, we utilize the expert model DeepSeek-R1 to generate high-quality Reasoning Trajectories for Cold-start training, effectively overcoming data scarcity. Subsequently, we design a Custom Reward Model based on Fine-grained Rules to conduct Alignment Training on the target model via Group Relative Policy Optimization (GRPO) and Direct Preference Optimization (DPO) algorithms. Experimental results demonstrate that the optimized DeepSeek-R1-Distill-Qwen-14B model achieves a maximum F1 score of 0.83. Validation through the construction of the “First Four Histories” OKG and Graph Retrieval-Augmented Generation (GraphRAG) experiments confirms the framework’s efficacy. Crucially, this study reveals that in LRHA domains, a rule-based Custom Reward Model outperforms general LLMs in alignment efficiency and economic feasibility, offering a transferable, low-cost solution for Low-Resource Texts in fields like law, medicine, and endangered languages."
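The description does not spell out the fine-grained rules or the reward formula, so the following is only a minimal sketch of what a rule-based reward for triple extraction could look like. It assumes that model output has already been parsed into (head, relation, tail) tuples, that the reward is the F1 overlap with gold triples, and that malformed triples incur a penalty. The function names, the `format_penalty` parameter, and the sample triples are hypothetical and are not taken from the dataset.

```python
from typing import Iterable, Sequence, Set, Tuple

Triple = Tuple[str, str, str]


def triple_f1(predicted: Iterable[Triple], gold: Iterable[Triple]) -> float:
    """Exact-match F1 between predicted and gold (head, relation, tail) triples."""
    pred_set: Set[Triple] = set(predicted)
    gold_set: Set[Triple] = set(gold)
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def rule_based_reward(
    predicted: Sequence[tuple],
    gold: Sequence[Triple],
    format_penalty: float = 0.5,  # hypothetical weight, not from the source
) -> float:
    """Illustrative reward: F1 on well-formed triples minus a format penalty."""
    # Assumed rule: a triple is well formed if it has three non-empty strings.
    well_formed = [
        t for t in predicted
        if len(t) == 3 and all(isinstance(p, str) and p.strip() for p in t)
    ]
    malformed = len(predicted) - len(well_formed)
    score = triple_f1(well_formed, gold)
    if malformed:
        # Penalize in proportion to the share of malformed triples.
        score -= format_penalty * malformed / len(predicted)
    return max(score, 0.0)


# Illustrative triples only; they are not drawn from the dataset.
gold_triples = [("Liu Bang", "founded", "Han dynasty"),
                ("Sima Qian", "wrote", "Shiji")]
model_output = [("Liu Bang", "founded", "Han dynasty"),
                ("Sima Qian", "", "Shiji")]  # second triple is malformed
reward = rule_based_reward(model_output, gold_triples)
```

A scalar of this kind could serve as the per-sample reward in a GRPO-style training loop or be used to rank outputs into preference pairs for DPO; how the authors actually combine their rules is not stated in this listing.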

Provider: IEEE DataPort

Created: 2026-03-12
