proxectonos/corpus_dominio_museistico_patrimonio

Name: proxectonos/corpus_dominio_museistico_patrimonio
Creator: proxectonos
Published: 2026-04-24 11:04:07
License: No description available

Hugging Face · Updated 2026-04-24 · Indexed 2026-04-26

Download link:

https://hf-mirror.com/datasets/proxectonos/corpus_dominio_museistico_patrimonio

Description:

The Corpus museístico-patrimonio gathers specialized terminological resources from official museum thesauri used to describe, classify, and document cultural heritage. The dataset is a technical and descriptive record of the museum and heritage field, with particular attention to the conceptual and hierarchical organization of knowledge.

The corpus consists of three independent Spanish thesauri (the Thesaurus of Cultural Goods, the Thesaurus of Subjects, and the Thesaurus of Techniques) plus a Galician version of the Thesaurus of Cultural Goods. The data was extracted automatically from the web pages of Spain's Cultural Heritage Thesauri through controlled scraping and is stored in JSONL format, preserving the conceptual structure of each thesaurus.

The dataset is intended for specialized language modeling in cultural heritage, terminological and lexicographical analysis, ontology and knowledge-resource construction, terminology extraction and normalization experiments, NLP research applied to cultural heritage, and the development of terminological resources in Galician and Spanish. It is licensed under CC BY 4.0 and acknowledges funding from the Spanish Ministry for Digital Transformation and the Civil Service and from the EU's NextGenerationEU.
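The description says the records are stored as JSONL (one JSON object per line) but does not document the field names. As a minimal sketch, assuming only that each non-empty line is a standalone JSON object, the Python below reads such a stream and tallies which top-level fields appear, which is a quick way to inspect the real schema before relying on it. The helper names `iter_jsonl` and `field_counts` and the sample fields `term` and `broader` are hypothetical placeholders, not the dataset's actual schema or an official API.

```python
import io
import json
from collections import Counter
from typing import Iterable, Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL stream."""
    for lineno, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            # Report which line failed instead of a bare decode error.
            raise ValueError(f"invalid JSON on line {lineno}: {exc}") from exc


def field_counts(records: Iterable[dict]) -> Counter:
    """Count how often each top-level key appears across all records."""
    counts: Counter = Counter()
    for record in records:
        counts.update(record.keys())
    return counts


if __name__ == "__main__":
    # Hypothetical sample: "term" and "broader" are placeholders,
    # NOT the field names used by the real dataset.
    sample = io.StringIO(
        '{"term": "ceramica", "broader": "bienes"}\n'
        "\n"
        '{"term": "grabado"}\n'
    )
    records = list(iter_jsonl(sample))
    print(len(records))           # 2
    print(field_counts(records))  # Counter({'term': 2, 'broader': 1})
```

To inspect a downloaded file, pass an open text handle, for example `with open(path, encoding="utf-8") as f: print(field_counts(iter_jsonl(f)))`, where `path` is whatever local file you saved. If the Hugging Face `datasets` library is installed, `datasets.load_dataset("proxectonos/corpus_dominio_museistico_patrimonio")` may also work, although the repository might require a configuration or file name per thesaurus; check the dataset page.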

Provider:

proxectonos
