Wikipedia Scientific Literature Terminology Entries Dataset
National Basic Science Data Center (国家基础学科公共科学数据中心), indexed 2025-12-06
Download link:
https://nbsdc.cn/general/dataDetail?id=6931b012195d2658bc1e5f83&type=1
Resource description:
The dataset is built from openly available Wikipedia data. All article entries are first retrieved through the official download interfaces and URLs provided by Wikipedia. The entries are then restricted to topics related to scientific literature, and custom rules are applied to verify that entries sharing the same name are the correct ones, followed by cleaning, filtering and storage. The result is a total of 20,000 scientific-literature entries. The dataset mainly serves knowledge enhancement and terminology understanding tasks in the scientific literature domain: it gives scientific literature analysis models a reliable source of external background knowledge and is intended to improve their performance in concept explanation, professional terminology comprehension and cross-document knowledge alignment.
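The filtering steps described above (topic restriction, same-name verification, cleaning) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the providers' actual pipeline: the `Entry` record, the `TOPIC_KEYWORDS` list, and the "keep the first in-scope entry per base title" rule are all hypothetical stand-ins, since the description does not specify the real rules.

```python
import re
from dataclasses import dataclass

# Hypothetical category keywords standing in for the "scientific literature" scope.
TOPIC_KEYWORDS = ("science", "physics", "chemistry", "mathematics", "computing")


@dataclass
class Entry:
    title: str
    categories: list
    text: str


def in_scope(entry):
    """Return True if any category name contains one of the topic keywords."""
    cats = " ".join(entry.categories).lower()
    return any(keyword in cats for keyword in TOPIC_KEYWORDS)


def base_title(title):
    """Strip a trailing parenthetical qualifier: 'Kernel (food)' -> 'Kernel'."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", title).strip()


def clean_text(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def build_dataset(entries):
    """Filter to in-scope entries, drop empty ones, keep one entry per base title."""
    kept = {}
    for entry in entries:
        if not in_scope(entry):
            continue
        text = clean_text(entry.text)
        if not text:
            continue
        key = base_title(entry.title).lower()
        # Assumed rule: among same-name entries, keep the first in-scope one.
        kept.setdefault(key, Entry(entry.title, entry.categories, text))
    return list(kept.values())
```

In practice the input would come from a parsed Wikipedia dump rather than hand-built records, and the same-name check would likely need something stronger than first-match, such as comparing the entry text against domain vocabulary.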
Provider:
Beihang University (北京航空航天大学)



