"Anonymized Schema Metadata and Experimental Logs for Privacy-Preserving RAG in Semiconductor Manufacturing"
Source: DataCite Commons. Updated: 2025-12-29. Indexed: 2026-05-03.
Download link:
https://ieee-dataport.org/documents/anonymized-schema-metadata-and-experimental-logs-privacy-preserving-rag-semiconductor
Description:
"This dataset is designed to support research on privacy-preserving schema retrieval for Retrieval-Augmented Generation (RAG) in air-gapped environments and under strict data sovereignty constraints. It contains only schema-level metadata and relational topology information, without any row-level instance data. Specifically, the dataset comprises 52 relational tables with more than 1,300 columns and a set of 167 domain-specific natural language queries. Answering these queries typically requires multi-hop reasoning involving joins over 3–6 tables. To protect sensitive industrial information, all identifiers are anonymized using salted MD5 hashing, while foreign-key relationships and structural connectivity are preserved. This design enables the study of topology-aware retrieval and uncertainty-guided optimization in highly restricted industrial settings, without exposing proprietary semantics or violating confidentiality requirements."
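The anonymization described above can be illustrated with a minimal sketch. The record does not state how the salt is applied or how column names are qualified, so the salt-prefix scheme, the "table.column" qualification, the helper names, and the toy schema below are all assumptions made for illustration, not the dataset's actual procedure. The point shown is only that hashing with a fixed salt is deterministic, so foreign-key edges still connect the same tables and columns after anonymization.

```python
import hashlib


def anonymize(identifier: str, salt: str) -> str:
    """Return a salted MD5 hex digest (assumed scheme: salt prepended to the name)."""
    return hashlib.md5((salt + identifier).encode("utf-8")).hexdigest()


def anonymize_schema(tables, foreign_keys, salt):
    """Hash table and column names while keeping foreign-key edges intact.

    tables: dict mapping table name -> list of column names
    foreign_keys: list of (src_table, src_column, dst_table, dst_column)
    Columns are hashed as "table.column" (an assumption) so that equal
    column names in different tables do not share a hash.
    """
    anon_tables = {
        anonymize(t, salt): [anonymize(f"{t}.{c}", salt) for c in cols]
        for t, cols in tables.items()
    }
    anon_fks = [
        (
            anonymize(st, salt),
            anonymize(f"{st}.{sc}", salt),
            anonymize(dt, salt),
            anonymize(f"{dt}.{dc}", salt),
        )
        for st, sc, dt, dc in foreign_keys
    ]
    return anon_tables, anon_fks


# Hypothetical toy schema; these names do not come from the dataset.
tables = {"lot": ["lot_id", "product"], "wafer": ["wafer_id", "lot_id"]}
foreign_keys = [("wafer", "lot_id", "lot", "lot_id")]

anon_tables, anon_fks = anonymize_schema(tables, foreign_keys, salt="example-salt")
```

Because the same salt is used for every identifier, each endpoint of an anonymized foreign key can still be looked up in the anonymized table map, which is what keeps join paths recoverable without revealing the original names.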
Provider:
IEEE DataPort
Created:
2025-12-29



