The Well
The Well: 15TB of Physics Simulations
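Overview
The Well is a large-scale collection of machine learning datasets containing numerical simulations of a wide variety of spatiotemporal physical systems. It covers several domains, including biological systems, fluid dynamics, acoustic scattering, and magnetohydrodynamic simulations of extragalactic fluids or supernova explosions. In total, it comprises 16 datasets and 15 TB of data.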
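Using the Datasets
Installation
- Install from PyPI:

  ```bash
  pip install the_well
  ```

- Install from source:

  ```bash
  git clone https://github.com/PolymathicAI/the_well
  cd the_well
  pip install .
  ```

- Install the benchmark dependencies:

  ```bash
  pip install the_well[benchmark]
  ```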
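To confirm that the package is visible to the current Python interpreter after either installation route, a small check with the standard library's `importlib.metadata` can help. This is a minimal sketch; the helper name `installed_version` is hypothetical and is not part of `the_well`.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "the_well") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is a hypothetical helper, not part of the_well.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("the_well is not installed in this environment")
    else:
        print(f"the_well {version} is installed")
```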
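Downloading the Data
- Use the `the-well-download` command to download a dataset:

  ```bash
  the-well-download --base-path path/to/base --dataset active_matter --split train
  ```

- If `--dataset` and `--split` are omitted, all datasets and all splits are downloaded (the full 15 TB collection).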
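When downloads are scripted from Python (for example, to fetch several datasets in turn), the command line can be assembled as an argument list. This is a minimal sketch: `build_download_command` is a hypothetical helper, and it only uses the `--base-path`, `--dataset`, and `--split` flags documented here. The documentation only states what happens when both optional flags are omitted; omitting just one is left unspecified.

```python
import shlex
from typing import List, Optional


def build_download_command(
    base_path: str,
    dataset: Optional[str] = None,
    split: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `the-well-download` (hypothetical helper).

    Omitting both `dataset` and `split` leaves both flags out, which per the
    documentation downloads every dataset and split.
    """
    cmd = ["the-well-download", "--base-path", base_path]
    if dataset is not None:
        cmd += ["--dataset", dataset]
    if split is not None:
        cmd += ["--split", split]
    return cmd


if __name__ == "__main__":
    cmd = build_download_command("path/to/base", "active_matter", "train")
    # Print a shell-safe version of the command; to actually run it,
    # pass `cmd` to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Building a list rather than a single string avoids shell-quoting issues when the base path contains spaces.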
Streaming the Data
- The datasets can also be streamed directly from Hugging Face:

  ```python
  from the_well.data import WellDataset
  from torch.utils.data import DataLoader

  trainset = WellDataset(
      well_base_path="hf://datasets/polymathic-ai/",  # stream from the Hugging Face hub
      well_dataset_name="active_matter",
      well_split_name="train",
  )
  train_loader = DataLoader(trainset)
  ```
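Streaming and locally downloaded data differ mainly in `well_base_path`. The sketch below builds the keyword arguments for `WellDataset` either way. It assumes, without confirmation from this page, that a local `well_base_path` is the same directory passed to `the-well-download --base-path`. The helper `well_dataset_kwargs` is hypothetical, and the block deliberately avoids importing `the_well` or PyTorch so it runs on its own.

```python
from typing import Dict, Optional

# Hugging Face base path used in the streaming example.
HF_BASE_PATH = "hf://datasets/polymathic-ai/"


def well_dataset_kwargs(
    dataset_name: str,
    split_name: str,
    local_base_path: Optional[str] = None,
) -> Dict[str, str]:
    """Return keyword arguments for WellDataset (hypothetical helper).

    If `local_base_path` is given, it is ASSUMED to be the directory that was
    passed to `the-well-download --base-path`; otherwise the data is streamed
    from the Hugging Face hub.
    """
    base_path = local_base_path if local_base_path is not None else HF_BASE_PATH
    return {
        "well_base_path": base_path,
        "well_dataset_name": dataset_name,
        "well_split_name": split_name,
    }
```

With `the_well` and PyTorch installed, the result can be unpacked into the constructor, e.g. `WellDataset(**well_dataset_kwargs("active_matter", "train"))` to stream, or with `"path/to/base"` as the third argument for a local copy.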
Benchmark
- Benchmark scripts are provided for evaluating surrogate models on the different datasets:

  ```bash
  cd the_well/benchmark
  python train.py experiment=fno server=local data=active_matter
  ```
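The benchmark command takes its configuration as `key=value` arguments. To run the same model across several datasets, the commands can be generated in a loop. This is a minimal sketch: `benchmark_commands` is a hypothetical helper, `active_matter` is the only dataset name taken from this page, and other entries in the list must be real dataset identifiers from The Well.

```python
import shlex
from typing import Iterable, List


def benchmark_commands(
    datasets: Iterable[str],
    experiment: str = "fno",
    server: str = "local",
) -> List[List[str]]:
    """Build one `python train.py ...` argument list per dataset (hypothetical helper).

    The arguments mirror the documented example command; each list is meant
    to be run from the the_well/benchmark directory.
    """
    return [
        ["python", "train.py", f"experiment={experiment}", f"server={server}", f"data={name}"]
        for name in datasets
    ]


if __name__ == "__main__":
    # Extend this list with other dataset names from The Well.
    for cmd in benchmark_commands(["active_matter"]):
        print(shlex.join(cmd))
        # To launch: subprocess.run(cmd, cwd="the_well/benchmark", check=True)
```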
Citation
```bibtex
@inproceedings{ohana2024thewell,
  title={The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning},
  author={Ruben Ohana and Michael McCabe and Lucas Thibaut Meyer and Rudy Morel and Fruzsina Julia Agocs and Miguel Beneitez and Marsha Berger and Blakesley Burkhart and Stuart B. Dalziel and Drummond Buschman Fielding and Daniel Fortunato and Jared A. Goldberg and Keiya Hirashima and Yan-Fei Jiang and Rich Kerswell and Suryanarayana Maddu and Jonah M. Miller and Payel Mukhopadhyay and Stefan S. Nixon and Jeff Shen and Romain Watteaux and Bruno R{\'e}galdo-Saint Blancard and Fran{\c{c}}ois Rozet and Liam Holden Parker and Miles Cranmer and Shirley Ho},
  booktitle={The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
  year={2024},
  url={https://openreview.net/forum?id=00Sx577BT3}
}
```
Contact
- Contacts: Ruben Ohana, Michael McCabe
- Email: {rohana,mmccabe}@flatironinstitute.org
Issues and Feedback
- Bug reports, feature requests, and questions can be submitted through GitHub Issues.




