HannaAbiAkl/geonames-semantic-primes

Name: HannaAbiAkl/geonames-semantic-primes
Creator: HannaAbiAkl
Published: 2024-07-08 14:52:10
License: No description available

Hugging Face. Updated 2024-07-08; indexed 2024-07-22.

Download link:

https://hf-mirror.com/datasets/HannaAbiAkl/geonames-semantic-primes

Resource overview:

We propose a dataset at the core of our semantic towers methodology, which combines vectorized knowledge graph information to enhance a Retrieval-Augmented Generation (RAG) pipeline. The dataset is constructed by building the semantic tower, an ensemble of primitive semantic information related to a term, for each of 660 category classes related to geographical locations. These classes are themselves grouped into 9 higher-level categories (e.g. H for stream, lake, and sea; R for road and railroad) and are derived from the original GeoNames dataset. The semantic tower encompasses information gathered from Wikidata, specifically: label, instance of, subclass of, part of, represents, and description. This information forms the smallest subset of knowledge needed to distinguish one term from another. The vector embeddings are generated using the General Text Embeddings (GTE) large model.

Provided by:

HannaAbiAkl

Summary of original information

GeoNames Semantic Primes

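Dataset overview

This dataset is the core of the semantic towers methodology. It combines vectorized knowledge graph information to enhance a Retrieval-Augmented Generation (RAG) pipeline.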

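Dataset construction

The dataset is built by constructing semantic towers; a semantic tower is the collection of primitive semantic information about a geography-related term.
It covers 660 category classes related to geographical locations, which are further grouped into 9 higher-level categories, e.g. H for streams, lakes, and seas, and R for roads and railroads.
The categories are derived from the original GeoNames dataset.
The information in a semantic tower comes from Wikidata and includes:
- label
- instance of
- subclass of
- part of
- represents
- description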
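
As a rough illustration of the structure described above, a semantic tower can be pictured as a small record holding the six Wikidata fields, flattened into one string before embedding. This is only a sketch: the `SemanticTower` class, its `to_text` method, the serialization format, and the sample values are all hypothetical and are not taken from the dataset or from Wikidata.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SemanticTower:
    """Hypothetical container for the six Wikidata fields of one term."""
    label: str
    description: str = ""
    instance_of: List[str] = field(default_factory=list)
    subclass_of: List[str] = field(default_factory=list)
    part_of: List[str] = field(default_factory=list)
    represents: List[str] = field(default_factory=list)

    def to_text(self) -> str:
        """Flatten the tower into one string that a text-embedding model could consume."""
        parts = [f"label: {self.label}"]
        for name in ("instance_of", "subclass_of", "part_of", "represents"):
            values = getattr(self, name)
            if values:  # skip fields that are empty for this term
                parts.append(f"{name.replace('_', ' ')}: {', '.join(values)}")
        if self.description:
            parts.append(f"description: {self.description}")
        return "; ".join(parts)


# Illustrative values only; they are not copied from Wikidata or the dataset.
stream = SemanticTower(
    label="stream",
    subclass_of=["watercourse"],
    description="body of flowing surface water",
)
print(stream.to_text())
```

Empty fields are skipped so that the serialized text only carries information that actually distinguishes the term.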

Embedding generation

The vector embeddings are generated using the General Text Embeddings (GTE) large model.
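
The source does not say how the embeddings are compared at retrieval time. One common choice is cosine similarity, sketched below with toy 3-dimensional vectors standing in for real GTE embeddings. The function names, the vectors, and the choice of metric are assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_class(query_vec, class_vecs):
    """Return the key in class_vecs whose vector is most similar to query_vec."""
    return max(class_vecs, key=lambda name: cosine_similarity(query_vec, class_vecs[name]))


# Toy vectors (assumption): real GTE embeddings have far more dimensions.
class_vecs = {
    "H": [0.9, 0.1, 0.0],  # stream, lake, sea
    "R": [0.1, 0.9, 0.0],  # road, railroad
}
query = [0.8, 0.2, 0.1]  # stand-in for an embedded query term
print(nearest_class(query, class_vecs))
```

In a RAG pipeline, the best-matching class and its semantic tower text would then be passed to the generator as retrieved context.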
