PhysicalAI-SpatialIntelligence-Lyra-SDG
Source: ModelScope community (魔搭社区) · updated 2025-12-04 · indexed 2025-12-06
Download link:
https://modelscope.cn/datasets/nv-community/PhysicalAI-SpatialIntelligence-Lyra-SDG
# Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
**[Paper](https://arxiv.org/abs/2509.19296), [Project Page](https://research.nvidia.com/labs/toronto-ai/lyra/)**
[Sherwin Bahmani](https://sherwinbahmani.github.io/),
[Tianchang Shen](https://www.cs.toronto.edu/~shenti11/),
[Jiawei Ren](https://jiawei-ren.github.io/),
[Jiahui Huang](https://huangjh-pub.github.io/),
[Yifeng Jiang](https://cs.stanford.edu/~yifengj/),
[Haithem Turki](https://haithemturki.com/),
[Andrea Tagliasacchi](https://theialab.ca/),
[David B. Lindell](https://davidlindell.com/),
[Zan Gojcic](https://zgojcic.github.io/),
[Sanja Fidler](https://www.cs.utoronto.ca/~fidler/),
[Huan Ling](https://www.cs.toronto.edu/~linghuan/),
[Jun Gao](https://www.cs.toronto.edu/~jungao/),
[Xuanchi Ren](https://xuanchiren.com/) <br>
## Dataset Description:
The PhysicalAI-SpatialIntelligence-Lyra-SDG Dataset is a multi-view 3D and 4D dataset generated using [GEN3C](https://github.com/nv-tlabs/GEN3C).
The 3D reconstruction setup uses 59,031 images, while the 4D setup uses 7,378 videos. All data are generated from diverse text prompts spanning indoor and outdoor environments, humans, animals, and both realistic and imaginative content. We synthesize 6 camera trajectories for each image (3D) or video (4D), yielding 354,186 videos for the 3D setup and 44,268 videos for the 4D setup.
The dataset contains RGB videos together with the camera poses and depth for each video.
This dataset is ready for commercial use.
## Dataset Owner(s):
NVIDIA Corporation
## Dataset Creation Date:
2025/09/23
## License/Terms of Use:
[Visit the NVIDIA Legal Release Process](https://nvidia.sharepoint.com/sites/ProductLegalSupport) for instructions on getting legal support for a license selection:
https://docs.google.com/spreadsheets/d/1e1K8nsMV9feowjmgXhdfa0qo-oGJNlnsBc1Qhwck7vU/edit?usp=sharing
## Intended Usage:
Researchers and academics working on spatial intelligence problems can use this dataset to train AI models for multi-view video generation or reconstruction.
## Dataset Characterization:
**Data Collection Method**<br>
[Synthetic]
**Labeling Method**<br>
[Synthetic]
## Dataset Format:
RGB video in .mp4 format, camera pose in .npz format, depth in .zip format.
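
The card does not document the internal layout of the `.npz` pose files or the `.zip` depth archives, so a reasonable first step is to inspect them rather than assume key names. The following is a minimal sketch under that assumption: `describe_npz` and `list_zip_members` are hypothetical helper names, and the demo writes small stand-in files (with a placeholder array name and member name) instead of reading real dataset files. Decoding the `.mp4` videos needs a separate video library and is not shown.

```python
import tempfile
import zipfile
from pathlib import Path

import numpy as np


def describe_npz(path):
    """Return {array_name: (shape, dtype)} for every array stored in an .npz file."""
    with np.load(path) as data:
        return {name: (data[name].shape, str(data[name].dtype)) for name in data.files}


def list_zip_members(path, limit=5):
    """Return the first `limit` member names of a .zip archive without extracting it."""
    with zipfile.ZipFile(path) as zf:
        return zf.namelist()[:limit]


if __name__ == "__main__":
    # Stand-in files only: the array name and member name below are placeholders,
    # NOT the names used in the actual dataset.
    with tempfile.TemporaryDirectory() as tmp:
        pose_file = Path(tmp) / "demo_pose.npz"
        np.savez(pose_file, placeholder_pose=np.tile(np.eye(4, dtype=np.float32), (3, 1, 1)))

        depth_file = Path(tmp) / "demo_depth.zip"
        with zipfile.ZipFile(depth_file, "w") as zf:
            zf.writestr("placeholder_frame_0000.bin", b"\x00" * 4)

        print(describe_npz(pose_file))
        print(list_zip_members(depth_file))
```

Pointing the two helpers at a real pose file and depth archive from the download shows which arrays and members are actually present; the loaders in https://github.com/nv-tlabs/lyra remain the authoritative reference for how the files are meant to be read.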
## Dataset Quantification:
The 3D reconstruction setup has 59,031 multi-view examples and the 4D setup has 7,378. Each multi-view example has 6 views.
Each view comes with an RGB video, its camera poses, and its depth.
| Field | Format |
|-------------|--------|
| Video | mp4 |
| Camera pose | .npz |
| Depth | .zip |

Storage: 25 TB
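
The video totals quoted in the description follow from multiplying the number of examples by the 6 views per example. The sketch below re-derives them as a sanity check. The per-view storage figure assumes that 25 TB means 25 × 10^12 bytes and lumps RGB, pose, and depth together, so it is only a rough average, not a documented file size.

```python
# Counts stated in the dataset card.
NUM_EXAMPLES_3D = 59_031   # source images in the 3D setup
NUM_EXAMPLES_4D = 7_378    # source videos in the 4D setup
VIEWS_PER_EXAMPLE = 6      # synthesized camera trajectories per example


def total_views(num_examples: int, views_per_example: int = VIEWS_PER_EXAMPLE) -> int:
    """One generated video per (example, camera trajectory) pair."""
    return num_examples * views_per_example


videos_3d = total_views(NUM_EXAMPLES_3D)
videos_4d = total_views(NUM_EXAMPLES_4D)
all_videos = videos_3d + videos_4d

# ASSUMPTION: "25 TB" is read as decimal terabytes (25 * 10**12 bytes).
STORAGE_BYTES = 25 * 10**12
avg_mb_per_view = STORAGE_BYTES / all_videos / 10**6

print(f"3D videos: {videos_3d:,}")
print(f"4D videos: {videos_4d:,}")
print(f"All videos: {all_videos:,}")
print(f"Rough average per view: {avg_mb_per_view:.1f} MB")
```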
## Reference(s):
Please refer to https://github.com/nv-tlabs/lyra for how to use this dataset.
## Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using this dataset in accordance with our terms of service, developers should work with their internal model team to ensure it meets the requirements of the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
## Citation
```
@inproceedings{bahmani2025lyra,
title={Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation},
author={Bahmani, Sherwin and Shen, Tianchang and Ren, Jiawei and Huang, Jiahui and Jiang, Yifeng and
Turki, Haithem and Tagliasacchi, Andrea and Lindell, David B. and Gojcic, Zan and Fidler, Sanja and
Ling, Huan and Gao, Jun and Ren, Xuanchi},
booktitle={arXiv preprint arXiv:2509.19296},
year={2025}
}
```
```
@inproceedings{ren2025gen3c,
title={GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control},
author={Ren, Xuanchi and Shen, Tianchang and Huang, Jiahui and Ling, Huan and
Lu, Yifan and Nimier-David, Merlin and Müller, Thomas and Keller, Alexander and
Fidler, Sanja and Gao, Jun},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2025}
}
```
Provider: maas
Created: 2025-09-24



