The codes and data for "Geospatial Knowledge Cube: Enhancing Semantic Query in Geospatial Data Cubes with Ontology and LLMs"
Source: Figshare. Updated: 2026-02-08. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/The_codes_and_data_for_Geospatial_Knowledge_Cube_Enhancing_Semantic_Query_in_Geospatial_Data_Cubes_with_Ontology_and_LLMs_/29399987
Description:
Geospatial Knowledge Cube: Enhancing Semantic Query in Geospatial Data Cubes with Ontology and LLMs

Project Overview

Geospatial Knowledge Cube is an extension of GeoCube, enhanced with ontology-based semantics to enable interaction with Large Language Models (LLMs) for semantic querying.

The project directory is structured as follows:

    ├── ontology/
    │   ├── gkc_ontology_benchmark.ttl
    │   └── gkc_ontology_error_analysis.ttl
    ├── mapping/
    │   ├── gkc_mapping_benchmark.obda
    │   └── gkc_mapping_error_analysis.obda
    ├── database/
    │   ├── database.dump
    │   └── goldenSQL/
    │       ├── Q1.sql
    │       ├── Q2.sql
    │       └── ...
    ├── codes/
    │   ├── experiment.py
    │   ├── prompt_template.py
    │   ├── question/
    │   ├── generate_benchmark_query/
    │   └── expanded_error_analysis_query/
    └── README.md

Environment and dependencies

- Java Development Kit (JDK) 11
- Python 3.10+
- PostgreSQL
- PostGIS extension for PostgreSQL
- Ontop-Protégé for ontology-based OBDA mapping

Two experiment configurations

This repository contains two ontology + mapping configurations.

A) Benchmark configuration (main evaluation)

- Ontology: ontology/gkc_ontology_benchmark.ttl
- Mapping: mapping/gkc_mapping_benchmark.obda
- Questions: codes/question/benchmark_question.csv (81 questions)
- Outputs: codes/generate_benchmark_query/
- Used for: all benchmark results except Tables 3 & 4.

B) Error-analysis configuration (fine-grained diagnosis)

- Ontology: ontology/gkc_ontology_error_analysis.ttl
- Mapping: mapping/gkc_mapping_error_analysis.obda
- Questions: codes/question/error_analysis_question.csv (10 questions)
- Outputs: codes/expanded_error_analysis_query/
- Used for: Table 4 only.

Step 1: Data Import

The file database/database.dump was generated using pg_dump and contains a full PostgreSQL database backup, including schema and data.

- Create database: createdb geocube_db
- Restore dump: pg_restore -U your_username -d geocube_db database/database.dump
- Enable extensions: CREATE EXTENSION postgis; CREATE EXTENSION postgis_raster;

Step 2: Semantic Layer Construction

- Open Protégé and load the ontology from the ontology/ folder.
- Use the Ontop Plugin in Protégé to create or load OBDA mappings that link ontology classes/properties to the underlying PostgreSQL schema.
- Load the corresponding .obda mapping file from the mapping/ folder.
- Configure the database connection and test queries directly in Protégé.

Alternatively, you can use Ontop CLI to perform the same task by:

- configuring ontop.obda and .properties files
- running the Ontop CLI reasoner for SPARQL-to-SQL translation

Step 3: Question Design

In the paper, we classify data querying tasks along two dimensions, schema complexity and question complexity, forming a 3×3 grid. Each cell contains 9 questions, for a total of 81 questions.

All questions are documented in codes/question/benchmark_question.csv, where each entry includes:

- Natural language question
- Golden SQL query (you can find it in the database/goldenSQL directory)

Step 4: Experiment Script

All experiments are implemented in a single Python script, codes/experiment.py. Run the script:

    python {python_project_root}/codes/experiment.py

Key components:

- The chat() function is the main entry point.
- Experiments corresponding to Figures 6–8 in the paper are located in:
  - gpt-3.5-turbo experiments
  - gpt-4o experiments

Before running the script:

- Insert your OpenAI API key in the appropriate location.
- (Optional) Update database connection parameters if using a local PostgreSQL instance.
- (Optional) Update Ontop CLI connection settings if running SPARQL experiments.

The script supports temperature parameter tuning to evaluate generation robustness under different conditions.

Step 5: Data Logging and Analysis

During experiments, the following metrics are collected:

- Overall Accuracy: results corresponding to Table 2 in the paper.
- Error Analysis: results corresponding to Table 4 in the paper.
- Stability Across Multiple Runs: insights visualized in Figure 7.
- Token Usage and Cost: analysis presented in Figure 8.

Note: These metrics are logged during execution. No automatic aggregation script is included. Aggregation was performed manually during the study; the raw logs in codes/ can be used to reproduce the reported numbers.
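
Since no aggregation script ships with the repository, the following is a minimal sketch of how per-cell and overall accuracy could be computed once the raw logs have been parsed into records. It is not the authors' procedure. The record layout (the fields `schema_level`, `question_level`, and `correct`) and the level names used in the sample are assumptions made for illustration; the actual log format is not documented in this description.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple

# Hypothetical record layout: one dict per answered question.
# The field names below are assumptions, not the repository's log format.
Record = Mapping[str, object]


def aggregate_accuracy(records: Iterable[Record]) -> Dict[Tuple[str, str], float]:
    """Return the fraction of correct answers per (schema_level, question_level) cell."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    hits: Dict[Tuple[str, str], int] = defaultdict(int)
    for rec in records:
        cell = (str(rec["schema_level"]), str(rec["question_level"]))
        totals[cell] += 1
        if rec["correct"]:
            hits[cell] += 1
    return {cell: hits[cell] / totals[cell] for cell in totals}


def overall_accuracy(records: Iterable[Record]) -> float:
    """Return the fraction of correct answers over all records."""
    items: List[Record] = list(records)
    if not items:
        raise ValueError("no records to aggregate")
    return sum(1 for rec in items if rec["correct"]) / len(items)


if __name__ == "__main__":
    # Illustrative records only; these are not results from the study.
    sample = [
        {"schema_level": "low", "question_level": "low", "correct": True},
        {"schema_level": "low", "question_level": "low", "correct": False},
        {"schema_level": "high", "question_level": "low", "correct": True},
    ]
    for cell, acc in sorted(aggregate_accuracy(sample).items()):
        print(cell, f"{acc:.2f}")
    print("overall", f"{overall_accuracy(sample):.2f}")
```

Keying the result by the (schema complexity, question complexity) pair mirrors the 3×3 grid described in Step 3, so each of the nine cells can be read off directly once real records are supplied.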
Created:
2026-02-08



