Grid2Text Dataset: Aligning Meteorological Grid Data with Expert Reasoning Chains
Source: DataONE | Updated: 2025-12-17 | Indexed: 2025-12-27
Download link:
https://search.dataone.org/view/sha256:f70b192a7161ccf5e3ea0717b6ee482866f9461dbdb730a5e86df609cd43ceaf
Description:
Grid2Text is a comprehensive alignment dataset designed to bridge the gap between meteorological grid data and expert textual forecasts. This dataset addresses the challenge of interpreting high-dimensional numerical weather prediction (NWP) data by providing one-to-one correspondences between structured grid features and expert-written weather reports, augmented with explicit Chain-of-Thought (CoT) reasoning paths.

Data Content & Structure:
Source Data: Derived from ERA5 Reanalysis Data (ECMWF), covering the Shanghai region from 2020 to 2022.
Core Variables: Includes 10 core meteorological variables such as 10m wind components, total precipitation, temperature, and relative humidity.
Reasoning Layer: Distinct from traditional datasets, Grid2Text includes a "Chain-of-Thought" (CoT) component that captures the intermediate reasoning steps of forecasters (e.g., wind vector analysis, temperature trend judgment).

File Structure: The dataset is organized into three main directories:
feature_data/: CSV files containing structured spatiotemporal aggregated features (e.g., max_temp_c, ifrain, wind direction).
chain_of_thought/: TXT files containing the step-by-step expert reasoning process used to derive the forecast.
forecasts/: TXT files containing the final, operational-standard natural language weather forecast.

Methodology: The dataset was constructed using a "Human-in-the-loop" workflow, employing a hybrid strategy of Large Language Model (LLM) generation followed by rigorous multi-round verification by senior meteorological forecasters to ensure physical accuracy and logical consistency.
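
Example (illustrative only): the one-to-one correspondence between the three directories could be read along the following lines. This is a minimal sketch, not the dataset's official loader. The directory names feature_data/, chain_of_thought/ and forecasts/ come from the description above; the assumption that aligned files share the same file stem (for example a hypothetical feature_data/s1.csv, chain_of_thought/s1.txt and forecasts/s1.txt) is not documented here and should be checked against the actual download. The function names load_sample and list_aligned_stems are likewise made up for illustration.

```python
import csv
from pathlib import Path


def list_aligned_stems(root):
    """Return the sorted file stems that appear in all three directories.

    ASSUMPTION: aligned files share one stem across directories. The real
    naming scheme is not stated in the dataset description.
    """
    root = Path(root)
    stems = {p.stem for p in (root / "feature_data").glob("*.csv")}
    stems &= {p.stem for p in (root / "chain_of_thought").glob("*.txt")}
    stems &= {p.stem for p in (root / "forecasts").glob("*.txt")}
    return sorted(stems)


def load_sample(root, stem):
    """Load one (features, reasoning, forecast) triplet for a given stem."""
    root = Path(root)
    feature_path = root / "feature_data" / f"{stem}.csv"
    with open(feature_path, newline="", encoding="utf-8") as f:
        # One dict per CSV row; values stay as strings (no type inference).
        features = list(csv.DictReader(f))
    reasoning = (root / "chain_of_thought" / f"{stem}.txt").read_text(encoding="utf-8")
    forecast = (root / "forecasts" / f"{stem}.txt").read_text(encoding="utf-8")
    return {
        "features": features,
        "chain_of_thought": reasoning,
        "forecast": forecast,
    }
```

Calling list_aligned_stems on the dataset root and then load_sample on each returned stem yields dictionaries with the structured features, the reasoning text and the final forecast side by side. Files that lack a counterpart in any of the three directories are skipped by the set intersection rather than raising an error.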
Created:
2025-12-20



