HannaAbiAkl/wordnet-semantic-primes

Name: HannaAbiAkl/wordnet-semantic-primes
Creator: HannaAbiAkl
Published: 2024-07-08 14:52:59
License: No description available

Hugging Face. Updated 2024-07-08; indexed 2024-07-22.

Download link:

https://hf-mirror.com/datasets/HannaAbiAkl/wordnet-semantic-primes

Resource summary:

This dataset is at the core of our semantic towers methodology, which combines vectorized knowledge graph information to augment a Retrieval-Augmented Generation (RAG) pipeline. The dataset is built by deriving semantic towers from the WordNet dataset; each tower holds primitive semantic information for a term, covering nouns, verbs, adverbs, and adjectives. The semantic towers also include information gathered from Wikidata: label, instance of, subclass of, part of, represents, and description. The vector embeddings are generated using the General Text Embeddings (GTE) large model.

Provider:

HannaAbiAkl

Original information summary

WordNet Semantic Primes

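Dataset overview

This dataset is at the core of the semantic towers methodology. It combines vectorized knowledge graph information to augment a Retrieval-Augmented Generation (RAG) pipeline.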

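Dataset construction

The dataset is built by constructing semantic towers. A semantic tower is a collection of primitive semantic information associated with a term, and it covers four term types (nouns, verbs, adverbs, adjectives). These term types are extracted from the original WordNet dataset.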
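
As a rough illustration of the four term types, the sketch below groups WordNet-style synset identifiers by part of speech. It assumes identifiers in the `lemma.pos.nn` form used by NLTK's WordNet interface. The function name `group_by_term_type` and the sample identifiers are hypothetical and are not taken from the dataset, whose actual extraction code is not described on this page.

```python
from collections import defaultdict

# WordNet part-of-speech tags mapped to the four term types named above.
# Folding "s" (satellite adjective) into adjectives is an assumption.
POS_TO_TERM_TYPE = {
    "n": "noun",
    "v": "verb",
    "r": "adverb",
    "a": "adjective",
    "s": "adjective",
}

def group_by_term_type(synset_ids):
    """Group identifiers such as 'dog.n.01' by their term type."""
    groups = defaultdict(list)
    for synset_id in synset_ids:
        # Split from the right so lemmas that contain dots stay intact.
        lemma, pos, _sense = synset_id.rsplit(".", 2)
        term_type = POS_TO_TERM_TYPE.get(pos)
        if term_type is not None:
            groups[term_type].append(lemma)
    return dict(groups)

# Hypothetical sample identifiers, one per term type.
sample = ["dog.n.01", "run.v.01", "quickly.r.01", "happy.a.01"]
grouped = group_by_term_type(sample)
```

Identifiers with an unrecognized tag are skipped silently here; a real pipeline might prefer to log or reject them.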

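A semantic tower contains information gathered from Wikidata, specifically:

label
instance of
subclass of
part of
represents
description

This information forms the minimal subset of knowledge needed to distinguish one term from another.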
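
One way to picture a single tower entry is as a small record with one field per Wikidata property listed above. The sketch below is only an assumption about shape: the class name `SemanticTowerEntry`, the field types, and the `to_text` serialization are hypothetical, and the dataset's real schema may differ.

```python
from dataclasses import dataclass, field

@dataclass
class SemanticTowerEntry:
    """Hypothetical record holding the Wikidata fields listed above."""
    label: str
    description: str = ""
    instance_of: list = field(default_factory=list)
    subclass_of: list = field(default_factory=list)
    part_of: list = field(default_factory=list)
    represents: list = field(default_factory=list)

    def to_text(self):
        """Flatten the non-empty fields into one string, e.g. for embedding."""
        parts = [f"label: {self.label}"]
        for name in ("instance_of", "subclass_of", "part_of", "represents"):
            values = getattr(self, name)
            if values:
                parts.append(f"{name.replace('_', ' ')}: {', '.join(values)}")
        if self.description:
            parts.append(f"description: {self.description}")
        return "; ".join(parts)

# Illustrative values only; they are not copied from the dataset.
entry = SemanticTowerEntry(
    label="dog",
    description="domesticated animal",
    subclass_of=["pet"],
)
text = entry.to_text()
```

Empty fields are left out of the flattened text so that two terms are compared only on the properties they actually carry.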

Embedding generation

The vector embeddings are generated using the General Text Embeddings (GTE) large model.
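
This page does not say how the embeddings are queried inside the RAG pipeline. A common choice is nearest-neighbor lookup by cosine similarity, sketched below under that assumption. The three-dimensional vectors are toy values, not GTE outputs, and the names `cosine_similarity`, `top_k`, and `toy_towers` are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)

def top_k(query_vec, tower_vecs, k=1):
    """Return the k labels whose vectors are most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), label)
        for label, vec in tower_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [label for _score, label in scored[:k]]

# Toy stand-ins for tower embeddings; real GTE vectors are much longer.
toy_towers = {
    "dog": [0.9, 0.1, 0.0],
    "run": [0.1, 0.9, 0.0],
    "quickly": [0.0, 0.2, 0.9],
}
nearest = top_k([0.8, 0.2, 0.1], toy_towers, k=2)
```

In practice the query vector would come from the same embedding model as the towers, since similarities between vectors from different models are not meaningful.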
