
"Datasets for Open-Domain Knowledge Graph Construction in Low-Resource Domains"

Source: DataCite Commons. Updated: 2026-03-12. Indexed: 2026-05-03.
Download link:
https://ieee-dataport.org/documents/datasets-open-domain-knowledge-graph-construction-low-resource-domains
Description:
"Large Language Models (LLMs) frequently suffer from Factual Errors when processing Low-Resource and High-Ambiguity (LRHA) corpora, such as legal documents, medical records, and historical texts, due to domain knowledge scarcity and semantic complexity. To address this, we propose a universal framework for automated Open-Domain Knowledge Graph (OKG) construction integrating Reasoning Distillation, Fine-grained Rules, and reinforcement learning. Using Chinese historical texts as a testbed, we utilize the expert model DeepSeek-R1 to generate high-quality Reasoning Trajectories for Cold-start training, effectively overcoming data scarcity. Subsequently, we design a Custom Reward Model based on Fine-grained Rules to conduct Alignment Training on the target model via Generalized Reward Penalized Optimization (GRPO) and Direct Preference Optimization (DPO) algorithms. Experimental results demonstrate that the optimized DeepSeek-R1-Distill-Qwen-14B model achieves a maximum F1 score of 0.83. Validation through the construction of the “First Four Histories” OKG and Graph Retrieval-Augmented Generation (GraphRAG) experiments confirms the framework’s efficacy. Crucially, this study reveals that in LRHA domains, a rule-based Custom Reward Model outperforms general LLMs in alignment efficiency and economic feasibility, offering a transferable, low-cost solution for Low-Resource Texts in fields like law, medicine, and endangered languages."
Provider:
IEEE DataPort
Created:
2026-03-12
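
Illustrative example:
The description mentions a rule-based Custom Reward Model and reports results as an F1 score. The listing does not publish the actual rules, so the following is only a minimal sketch of how such a reward might be computed for extracted (subject, relation, object) triples. The function names `triple_f1` and `rule_based_reward`, the three penalty rules, and the 0.1 penalty weight are all assumptions made for illustration, not the authors' method.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def triple_f1(predicted: Iterable[Triple], gold: Iterable[Triple]) -> float:
    """Exact-match F1 between predicted and gold triple sets."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def rule_based_reward(
    predicted: Iterable[Triple],
    gold: Iterable[Triple],
    allowed_relations: Set[str],
    penalty_weight: float = 0.1,  # hypothetical weight, not from the source
) -> float:
    """F1 minus penalties from a few hypothetical fine-grained rules."""
    predicted = list(predicted)
    reward = triple_f1(predicted, gold)

    # Hypothetical rule 1: every field of a triple must be non-empty.
    malformed = sum(1 for t in predicted if not all(part.strip() for part in t))
    # Hypothetical rule 2: the relation must belong to the allowed schema.
    off_schema = sum(1 for (_, rel, _) in predicted if rel not in allowed_relations)
    # Hypothetical rule 3: repeated triples are penalized.
    duplicates = len(predicted) - len(set(predicted))

    penalty = penalty_weight * (malformed + off_schema + duplicates)
    return max(-1.0, reward - penalty)  # clip so the reward stays bounded below


if __name__ == "__main__":
    gold = [("Liu Bang", "founded", "Han dynasty"), ("Sima Qian", "wrote", "Shiji")]
    pred = [("Liu Bang", "founded", "Han dynasty"), ("Sima Qian", "wrote", "Hanshu")]
    print(rule_based_reward(pred, gold, {"founded", "wrote"}))
```

A scalar of this kind could serve as the per-sample reward in a policy-optimization loop, or be used to rank two candidate outputs into a preference pair. Whether the authors combine their rules this way is not stated in the listing.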