Supporting data for Benchmarking Time Series Forecasting Models on Synthetic, Operationally-Augmented SQL Server Query Telemetry
Source: IEEE. Indexed: 2026-04-17
Download link:
https://ieee-dataport.org/documents/supporting-data-benchmarking-time-series-forecasting-models-synthetic-operationally
Resource description:
sqlserver-querystore-timeseries

A Novel Simulation Framework for Time Series Analysis and Forecasting in SQL Server Query Store Workloads

Overview

This repository provides a comprehensive simulation environment and verification toolkit for generating and analyzing realistic synthetic time series data derived from SQL Server Query Store. The framework is designed to facilitate experimentation with forecasting, anomaly detection, and workload prediction on richly patterned, plan-diverse, and gap-containing query workloads.

It establishes a reproducible foundation for proactive workload management in SQL Server, showcasing how Query Store time series data can be leveraged not just for diagnostics, but also for advanced predictive analytics and research.

Features

Synthetic Workload Simulation
- Realistic Patterns: Simulates 20 days of hourly data for multiple queries and variants, with daily/weekly seasonality, business-hour bursts, organic drift, and events such as deployments, plan regressions, and outliers.
- Plan Diversity: Models multiple query hashes (QueryName) and variants (QueryVariant), each with distinct behaviors and regression windows.
- Correlated Metrics: Generates CPU, Latency, and Logical Reads with realistic mathematical relationships.
- Controlled Missingness: Injects both random and clustered gaps, as well as correlated missing data, to mimic real-world telemetry and sensor failures.

Gap Realism and Data Quality
- Random and Clustered Gaps: Probabilities are tuned to provide a realistic mix of mostly complete data with rare, but present gaps, allowing for meaningful TSA and anomaly research.
- Outage Simulation: Specific outage periods and random data loss help test algorithm robustness.

Verification Script
- Automated Data Health Checks: Verifies the generated dataset with summary statistics, null ratios, gap counts, diversity, and plan regression presence.
- Suitability for TSA: Ensures the data is appropriate for time series analysis, forecasting, and anomaly detection.

Reproducibility and Extensibility
- Deterministic Randomness: All simulations are seeded for reproducibility.
- Parameterizable: Easy to tune number of queries, gap probabilities, anomaly injection, and more for different research scenarios.
- Extensible: Framework is designed to be a solid, modifiable starting point for further research into query performance, gap handling, and advanced TSA.

Repository Contents
- load_simulation.sql: Simulates a synthetic workload against SQL Server, producing time series data with plan drift, forced plans, events, and random/clustered gaps. The workload covers multiple queries and variants, and is suitable for TSA research.
- load_verification.sql: Verifies the generated time series dataset, reporting on length, gap distribution, plan diversity, event coverage, and overall data health. Output is designed to be pasted directly into research reports.
- Section5_Data_Analysis_Notebook.ipynb: (Recommended) Python notebook for downstream analysis, including dataset verification, exploratory analysis, and comparative forecasting with ARIMA, LSTM, Prophet, Random Forest, and XGBoost.
- load_simulation_master_document.md: Detailed documentation of the simulation framework, its logic, and research motivations.

Getting Started

1. Prerequisites
- Microsoft SQL Server 2019 or later (Query Store enabled)
- Sufficient disk space for the database files
- Permissions to create/modify tables and run scripts

2. Usage

A. Run the Simulation
- Open load_simulation.sql in SQL Server Management Studio (SSMS) or Azure Data Studio.
- Adjust top-level parameters (days, gap probabilities, etc.) if desired.
- Execute the script to generate the synthetic workload with realistic gaps and plan diversity.

B. Verify the Data
- After simulation, open load_verification.sql in the same database.
- Execute the script.
- Review the output for:
  - Total rows and time coverage
  - Gap/plan regression counts
  - Null ratios and data health
  - Suitability for TSA and anomaly detection

C. Export for Analysis
- Export the SimulatedQueryMetrics table as CSV (see script or use SSMS "Save Results As...").
- Use the provided Python notebook for downstream analysis and model benchmarking.
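Before opening the notebook, a quick health check on the exported CSV can be useful. The sketch below is not part of the repository. It is a minimal, standard-library-only illustration of counting rows, missing hourly slots, and null ratios per query series. The timestamp column name MetricTime, the exact spelling LogicalReads, the treatment of empty strings and the literal text NULL as missing values, and the sample rows are all assumptions made for illustration; the actual schema of SimulatedQueryMetrics may differ.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical column names; adjust them to match the actual export.
TIME_COL = "MetricTime"
KEY_COLS = ("QueryName", "QueryVariant")
METRIC_COLS = ("CPU", "Latency", "LogicalReads")


def summarize(csv_text, step=timedelta(hours=1)):
    """Return per-series row counts, missing time slots, and null ratios."""
    series = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = tuple(row[c] for c in KEY_COLS)
        series[key].append(row)

    report = {}
    for key, rows in series.items():
        times = sorted(datetime.fromisoformat(r[TIME_COL]) for r in rows)
        # Slots between the first and last observation, inclusive.
        expected = int((times[-1] - times[0]) / step) + 1
        missing_slots = expected - len(set(times))
        # Assumption: missing values are exported as "" or the text "NULL".
        null_ratio = {
            m: sum(1 for r in rows if r[m] in ("", "NULL")) / len(rows)
            for m in METRIC_COLS
        }
        report[key] = {
            "rows": len(rows),
            "missing_slots": missing_slots,
            "null_ratio": null_ratio,
        }
    return report


# Illustrative rows only; these values are not taken from the dataset.
SAMPLE = """MetricTime,QueryName,QueryVariant,CPU,Latency,LogicalReads
2024-01-01T00:00:00,Q1,A,10.0,5.0,100
2024-01-01T01:00:00,Q1,A,,5.5,110
2024-01-01T04:00:00,Q1,A,12.0,6.0,120
2024-01-01T00:00:00,Q2,A,3.0,1.0,30
2024-01-01T01:00:00,Q2,A,3.5,1.2,NULL
"""

if __name__ == "__main__":
    for key, stats in summarize(SAMPLE).items():
        print(key, stats)
```

To run it against a real export, read the file into a string and pass it to summarize. The function only sees gaps that fall between the first and last observation of each series, so an outage at the very start or end of the simulated window would need the expected time range supplied separately.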
Provided by:
Amarpreet Bassan



