
Ved100/PreGen-NavierStokes-2D

Source: Hugging Face. Updated 2025-11-17; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/Ved100/PreGen-NavierStokes-2D
Dataset description:
---
license: mit
task_categories:
- other
pretty_name: PreGen Navier-Stokes 2D Dataset
size_categories:
- 100K<n<1M
tags:
- physics
- fluid-dynamics
- navier-stokes
- pde
- scientific-computing
- neural-operators
- foundation-models
- difficulty-transfer
- reynolds-number
- openfoam
---

# PreGen Navier-Stokes 2D Dataset

## Dataset Description

This dataset accompanies the research paper **"Pre-Generating Multi-Difficulty PDE Data For Few-Shot Neural PDE Solvers"** (under review at ICLR 2026). It contains systematically generated 2D incompressible Navier-Stokes fluid flow simulations designed to study **difficulty transfer** in neural PDE solvers.

The key insight: by pre-generating many low and medium difficulty examples and including them with a small number of hard examples, neural PDE solvers can learn high-difficulty physics from far fewer samples. This dataset enables an **8.9× reduction in compute time** while achieving comparable performance.

### Dataset Summary

- **Total Size:** ~421 GB
- **Format:** NumPy arrays (.npy files)
- **Number of Files:** 9
- **Simulations per file:** 6,400 trajectories
- **Timesteps:** 20 per trajectory
- **Spatial Resolution:** 128 × 128 grid
- **Solver:** OpenFOAM (icoFoam)
- **Domain:** 2D Incompressible Navier-Stokes equations

### Problem Setting

The dataset solves the 2D incompressible Navier-Stokes equations:

```
∂u/∂t + (u · ∇)u + ∇p = ν∆u
∇ · u = 0
```

where:

- `u(x,t)` is the velocity field
- `p(x,t)` is the kinematic pressure
- `ν` is the kinematic viscosity (1.5 × 10⁻⁵ m²/s)
- Domain: Ω ⊂ [0,1]²

## Difficulty Axes

The dataset systematically varies complexity along three axes:

### 1. **Geometry Axis** (Number of Obstacles)

Simulations in flow-past-object (FPO) configuration with varying obstacle complexity:

- **Easy:** No obstacles (open channel flow)
- **Medium:** Single square obstacle
- **Hard:** 2-10 randomly placed square obstacles

**Files:**

- `Geometry_Axis/FPO_Geometry_Easy_NoObstacle.npy` (47 GB)
- `Geometry_Axis/FPO_Geometry_Medium_SingleObstacle.npy` (47 GB)
- `Geometry_Axis/FPO_Geometry_Hard_MultiObstacle.npy` (47 GB)

### 2. **Physics Axis** (Reynolds Number)

Simulations with varying flow complexity via Reynolds number:

**Multi-Obstacle Flows:**

- **Easy:** Re ∈ [100, 1000] - laminar regime
- **Medium:** Re ∈ [2000, 4000] - transitional regime
- **Hard:** Re ∈ [8000, 10000] - turbulent regime

**Files:**

- `Physics_Axis/MultiObstacle/FPO_Physics_MultiObstacle_Easy_Re100-1000.npy` (47 GB)
- `Physics_Axis/MultiObstacle/FPO_Physics_MultiObstacle_Medium_Re2000-4000.npy` (47 GB)
- `Physics_Axis/MultiObstacle/FPO_Physics_MultiObstacle_Hard_Re8000-10000.npy` (47 GB)

**No-Obstacle Flows:**

- `Physics_Axis/NoObstacle/FPO_Physics_NoObstacle_Easy_Re100-1000.npy` (47 GB)

### 3. **Combined Axis** (Geometry + Physics)

Combined variations in both geometry and Reynolds number:

- **Easy:** No obstacles + low Re ([100, 1000])
- **Medium:** Single obstacle + medium Re ([2000, 4000])
- **Hard:** Multiple obstacles + high Re ([8000, 10000])

**File:**

- `Combined_Axis/FPO_Combined_Medium_SingleObstacle_MedRe.npy` (47 GB)

### 4. **Special Configuration**

- `Special/FPO_Cylinder_Hole_Location_6284.npy` (47 GB) - Cylinder with hole at specific location

## Data Format

Each `.npy` file contains a NumPy array with shape: `(6400, 20, 128, 128, 6)`

**Dimensions:**

- **6400**: Number of simulation trajectories
- **20**: Timesteps per trajectory
- **128 × 128**: Spatial grid resolution
- **6**: Channels (features)

**Channels (in order):**

1. **u** - Horizontal velocity component (m/s)
2. **v** - Vertical velocity component (m/s)
3. **p** - Kinematic pressure (m²/s²)
4. **Re_normalized** - Normalized Reynolds number
5. **Binary mask** - Geometry encoding (1 = obstacle, 0 = fluid)
6. **SDF** - Signed distance field to nearest obstacle boundary

## Simulation Details

### Boundary Conditions

**Flow Past Object (FPO):**

- **Left (inlet):** Parabolic velocity profile with peak velocity Umax
- **Right (outlet):** Zero-gradient pressure outlet
- **Top/Bottom:** No-slip walls (u = 0)
- **Obstacles:** No-slip walls (u = 0)

### Reynolds Number Sampling

Re is sampled from a truncated Gaussian distribution N(5000, 2000²) with support [100, 10000]. The inlet velocity is scaled to achieve the target Re:

```
Re = (U_avg × L) / ν
U_avg = (2/3) × U_max
```

### Time Integration

- **Scheme:** Backward Euler (1st order implicit)
- **Spatial discretization:** Finite volume method
- **Gradient terms:** Gauss linear (central differencing)
- **Convection:** Gauss linearUpwind with gradient reconstruction
- **Diffusion:** Gauss linear orthogonal

### Simulation Duration

Adaptive time scheduling based on Reynolds number to ensure flow development:

- **Low Re (10-100):** Fixed 2700s
- **Medium Re (100-1000):** 1-10× characteristic diffusion time
- **High Re (1000-10000):** 10-40× characteristic diffusion time

### Computational Cost

The harder the simulation, the more expensive it is to generate:

| Configuration | Average Time (seconds) |
|--------------|----------------------|
| No obstacle, Low Re | 176.7 |
| No obstacle, Medium Re | 261.1 |
| No obstacle, High Re | 350.4 |
| One obstacle, Low Re | 609.5 |
| One obstacle, Medium Re | 731.1 |
| One obstacle, High Re | 942.8 |
| Multiple obstacles, Low Re | 1550.9 |
| Multiple obstacles, Medium Re | 1599.2 |
| Multiple obstacles, High Re | 1653.3 |

## Key Research Findings

This dataset was specifically designed to study **difficulty transfer** in neural PDE solvers:

1. **Sample Efficiency**: Training on 10% hard data + 90% easy/medium data recovers ~96-98% of the performance of training on 100% hard data
2. **Compute Efficiency**: By mixing difficulties optimally, you can achieve the same error with **8.9× less compute** spent on data generation
3. **Medium > Easy**: For most budgets, generating fewer medium-difficulty examples outperforms generating more easy examples
4. **Foundation Dataset Potential**: Medium-difficulty data (single obstacle) improves few-shot performance on complex geometries (NURBS shapes from FlowBench)

## Usage

### Basic Loading

```python
import numpy as np
from huggingface_hub import hf_hub_download

# Download a specific difficulty level
file_path = hf_hub_download(
    repo_id="sage-lab/PreGen-NavierStokes-2D",
    filename="Geometry_Axis/FPO_Geometry_Easy_NoObstacle.npy",
    repo_type="dataset"
)

# Load the data
data = np.load(file_path)
print(f"Data shape: {data.shape}")  # (6400, 20, 128, 128, 6)

# Extract individual trajectories
trajectory_0 = data[0]  # Shape: (20, 128, 128, 6)

# Extract velocity and pressure
u = trajectory_0[:, :, :, 0]  # Horizontal velocity
v = trajectory_0[:, :, :, 1]  # Vertical velocity
p = trajectory_0[:, :, :, 2]  # Pressure
mask = trajectory_0[:, :, :, 4]  # Binary geometry mask
sdf = trajectory_0[:, :, :, 5]  # Signed distance field
```

### Difficulty Mixing for Training

```python
import numpy as np
from huggingface_hub import hf_hub_download

# Load different difficulty levels
easy_data = np.load(hf_hub_download(
    repo_id="sage-lab/PreGen-NavierStokes-2D",
    filename="Geometry_Axis/FPO_Geometry_Easy_NoObstacle.npy",
    repo_type="dataset"
))
medium_data = np.load(hf_hub_download(
    repo_id="sage-lab/PreGen-NavierStokes-2D",
    filename="Geometry_Axis/FPO_Geometry_Medium_SingleObstacle.npy",
    repo_type="dataset"
))
hard_data = np.load(hf_hub_download(
    repo_id="sage-lab/PreGen-NavierStokes-2D",
    filename="Geometry_Axis/FPO_Geometry_Hard_MultiObstacle.npy",
    repo_type="dataset"
))

# Recommended: Use 10% hard + 90% medium for cost-effective training
n_hard = 80
n_medium = 720
train_data = np.concatenate([
    hard_data[:n_hard],
    medium_data[:n_medium]
], axis=0)

# Hold out 100 hard examples for testing
test_data = hard_data[-100:]
```

### Computing Metrics

```python
import numpy as np

def compute_nmae(y_true, y_pred):
    """
    Compute normalized Mean Absolute Error (nMAE) as used in the paper.

    Args:
        y_true: Ground truth, shape (N, T, H, W, C)
        y_pred: Predictions, shape (N, T, H, W, C)

    Returns:
        nMAE: Normalized mean absolute error
    """
    numerator = np.abs(y_true - y_pred).sum()
    denominator = np.abs(y_true).sum()
    return numerator / (denominator + 1e-10)
```

## Tested Models

The paper evaluates this dataset on:

### Supervised Neural Operators (trained from scratch)

- **CNO** (Convolutional Neural Operator) - 18M parameters
- **F-FNO** (Factorized Fourier Neural Operator) - 5-layer

### Foundation Models (fine-tuned)

- **Poseidon-T** (Tiny) - 21M parameters
- **Poseidon-B** (Base) - 158M parameters
- **Poseidon-L** (Large) - 629M parameters

All models are trained autoregressively with one-step-ahead prediction (t → t+1) using relative L1 loss.

## Citation

If you use this dataset, please cite:

```bibtex
@inproceedings{pregen2026,
  title={Pre-Generating Multi-Difficulty {PDE} Data For Few-Shot Neural {PDE} Solvers},
  author={Anonymous},
  booktitle={Under review at International Conference on Learning Representations (ICLR)},
  year={2026},
  url={https://openreview.net}
}
```

**Note:** Citation will be updated once the paper is published.
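
### Supplementary Sketch: Memory-Mapped Loading and One-Step Pairs

Each file is about 47 GB, so the loading snippets in the Usage section need roughly that much RAM per file. The sketch below shows one possible way to work around this, assuming the files are plain uncompressed `.npy` arrays: open the file with NumPy's `mmap_mode="r"`, copy only the trajectories you need, and split them into the one-step-ahead (t → t+1) pairs described under Tested Models. The helper names `one_step_pairs` and `relative_l1` and the small synthetic `toy.npy` file are hypothetical and not part of the dataset or the paper's code. The relative L1 form used here mirrors `compute_nmae`, and the paper's exact loss may differ.

```python
import os
import tempfile

import numpy as np


def one_step_pairs(trajectories):
    """Split (N, T, H, W, C) trajectories into t -> t+1 input/target pairs."""
    inputs = trajectories[:, :-1]   # timesteps 0 .. T-2
    targets = trajectories[:, 1:]   # timesteps 1 .. T-1
    return inputs, targets


def relative_l1(y_true, y_pred, eps=1e-10):
    """Relative L1 error (assumed form, same structure as compute_nmae)."""
    return np.abs(y_true - y_pred).sum() / (np.abs(y_true).sum() + eps)


# Stand-in for a real dataset file: same layout, much smaller (hypothetical).
rng = np.random.default_rng(0)
toy = rng.standard_normal((4, 20, 8, 8, 6)).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.npy")
    np.save(path, toy)

    # mmap_mode="r" maps the file instead of reading it all into memory.
    data = np.load(path, mmap_mode="r")
    # Copy only the first two trajectories into RAM.
    batch = np.array(data[:2])
    del data  # release the memory map before the temp directory is removed

inputs, targets = one_step_pairs(batch)
print(inputs.shape, targets.shape)

# Persistence baseline: predict that the next frame equals the current one.
print("persistence relative L1:", relative_l1(targets, inputs))
```

With a real file, the path returned by `hf_hub_download` would take the place of `path`, and the slice passed to `np.array` controls how much data is actually read from disk.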

## Related Datasets

- **The Well** - Large-scale multi-physics PDE dataset
- **PDEBench** - Benchmark for scientific machine learning
- **FlowBench** - Flow simulation over complex geometries (NURBS shapes)

## License

MIT License

## Acknowledgments

This dataset was generated using:

- **OpenFOAM** (v2406) for CFD simulations
- Simulations performed on computational clusters
- Total compute time: Several thousand GPU/CPU hours

## Contact

For questions or issues:

- Open an issue in the dataset repository
- Contact the sage-lab organization on Hugging Face
- See the paper for additional contact information (once published)

## Dataset Maintainers

sage-lab organization

---

**Dataset Version:** 1.0
**Last Updated:** 2024
**Status:** Research dataset under peer review
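
## Supplementary Sketch: Sampling Re and Scaling the Inlet Velocity

The Reynolds Number Sampling section states that Re is drawn from a truncated Gaussian N(5000, 2000²) on [100, 10000] and that the inlet velocity follows from Re = (U_avg × L) / ν with U_avg = (2/3) × U_max. The sketch below works through that arithmetic. Two points are assumptions rather than facts from the card: the truncation is implemented here by simple rejection sampling, and the characteristic length `length` defaults to 1.0 m because the card does not state L. The function names are hypothetical.

```python
import numpy as np

NU = 1.5e-5  # kinematic viscosity in m^2/s, as stated in the card


def sample_reynolds(n, rng, mean=5000.0, std=2000.0, low=100.0, high=10000.0):
    """Draw n Reynolds numbers from N(mean, std^2) truncated to [low, high].

    Rejection sampling is an assumption; the card does not say how the
    truncation was implemented.
    """
    samples = np.empty(0)
    while samples.size < n:
        draw = rng.normal(mean, std, size=2 * n)
        draw = draw[(draw >= low) & (draw <= high)]  # keep in-range draws only
        samples = np.concatenate([samples, draw])
    return samples[:n]


def inlet_peak_velocity(re, length=1.0, nu=NU):
    """Peak inlet velocity U_max for a target Re.

    Uses Re = U_avg * L / nu and U_avg = (2/3) * U_max from the card.
    length=1.0 is an assumed characteristic length.
    """
    u_avg = re * nu / length
    return 1.5 * u_avg


rng = np.random.default_rng(0)
re_samples = sample_reynolds(5, rng)
u_max = inlet_peak_velocity(re_samples)

for re, u in zip(re_samples, u_max):
    print(f"Re = {re:8.1f}  ->  U_max = {u:.5f} m/s")
```

Under these assumptions, Re = 1000 corresponds to U_avg = 0.015 m/s and U_max = 0.0225 m/s. A different characteristic length scales both velocities by 1/L.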
Provided by:
Ved100