
Pthahnix/MeshLex-Data-Source

Source: Hugging Face. Updated 2026-04-10. Indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Pthahnix/MeshLex-Data-Source
Resource description:
---
license: other
license_name: mixed-source
license_link: LICENSE
task_categories:
- text-to-3d
- image-to-3d
tags:
- 3d
- mesh
- glb
- geometry
- objaverse
- shapenet
- abo
- 3d-front
- meshlex
size_categories:
- 100K<n<1M
---

# MeshLex-Data-Source

A large-scale collection of **158,588 geometry-only GLB meshes** (281 GB) from four major 3D datasets, unified under a single sharded directory structure.

Built as the source data layer for the [MeshLex](https://github.com/Pthahnix/MeshLex-Research) research project, but broadly useful for any 3D mesh generation, reconstruction, or analysis research.

## Overview

| Source | Files | Size | Categories | Median Faces | Median Vertices |
|---|---:|---:|---:|---:|---:|
| **ABO** | 7,952 | 6.4 GB | n/a | 18,239 | 10,990 |
| **ShapeNet** | 52,472 | 35.9 GB | 55 | 7,037 | 6,586 |
| **Objaverse** | 45,975 | 155.1 GB | 1,156 | 14,956 | 11,775 |
| **3D-Front** | 52,189 | 84.1 GB | 19,121 | 44,347 | 54,227 |
| **Total** | **158,588** | **281.5 GB** | **20,332** | **18,584** | **17,288** |

All meshes are stored as **geometry-only GLB** files: materials, textures, and non-geometry metadata have been stripped. Each file contains only vertices and faces, as loaded via [trimesh](https://trimesh.org/) with `force="mesh"`.

## Directory Structure

```
data-abo/
  00/                      # shard 0: indices 0–7951
    00000-of-07952.glb
    00001-of-07952.glb
    ...
data-shapenet/
  00/                      # shard 0: indices 0–9999
  01/                      # shard 1: indices 10000–19999
  ...
  05/                      # shard 5: indices 50000–52471
data-objaverse/
  00/ ... 04/
data-3d-front/
  00/ ... 05/
```

**Naming convention:** `{index:05d}-of-{total:05d}.glb`

**Sharding:** Files are split into subdirectories of up to 10,000 files each (`shard = index // 10000`) to stay within HuggingFace's per-directory file limit.

**Local flat layout:** When downloaded, files follow the original flat naming pattern `{source}-{index:05d}-of-{total:05d}.glb` (e.g., `shapenet-00123-of-52472.glb`).
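The naming and sharding rules above can be sketched as a small helper. This is a minimal illustration rather than code from the MeshLex pipeline: `repo_path` and `SHARD_SIZE` are hypothetical names, and the two-digit shard directory is inferred from the directory listing.

```python
SHARD_SIZE = 10_000  # files per shard directory, per the sharding rule above


def repo_path(source: str, index: int, total: int) -> str:
    """Return the in-repo path of mesh `index` (0-based) out of `total` in `source`.

    Hypothetical helper: assumes shard directories are zero-padded to two digits.
    """
    if not 0 <= index < total:
        raise ValueError(f"index {index} out of range for {total} files")
    shard = index // SHARD_SIZE
    return f"data-{source}/{shard:02d}/{index:05d}-of-{total:05d}.glb"


print(repo_path("shapenet", 123, 52472))   # data-shapenet/00/00123-of-52472.glb
print(repo_path("3d-front", 52188, 52189)) # data-3d-front/05/52188-of-52189.glb
```

The returned string can be passed as the `filename` argument of `hf_hub_download`.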
## Data Sources

### Amazon Berkeley Objects (ABO)

- **Origin:** [ABO Dataset](https://amazon-berkeley-objects.s3.amazonaws.com/index.html), real product 3D models from Amazon catalog listings
- **Processing:** Downloaded GLBs → geometry extraction via trimesh → degenerate mesh filtering (< 4 faces removed)
- **License:** [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/)
- **Stats:** 7,952 meshes (1 failed conversion). Face count ranges from 20 to 11.5M (median 18K).

### ShapeNetCore v2

- **Origin:** [ShapeNet](https://shapenet.org/), a large-scale 3D model repository organized by WordNet synsets
- **Processing:** OBJ models → trimesh load with `force="mesh"` → geometry-only GLB export
- **License:** [ShapeNet Terms of Use](https://shapenet.org/terms), research and educational purposes only
- **Stats:** 52,472 meshes across 55 categories. Top categories: table (8,436), chair (6,778), airplane (4,045), car (3,514), sofa (3,173).

### Objaverse-LVIS

- **Origin:** [Objaverse](https://objaverse.allenai.org/), a massive crowd-sourced 3D asset collection, filtered to the LVIS subset (objects with LVIS category annotations)
- **Processing:** Downloaded via `objaverse` Python package → GLB conversion → geometry extraction → degenerate mesh filtering
- **License:** Individual objects carry their own licenses; the majority are [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/). See the [Objaverse license page](https://objaverse.allenai.org/objaverse-1.0) for details.
- **Stats:** 45,975 meshes across 1,156 LVIS categories. Top categories: chair (453), seashell (370), antenna (174), shield (146), snowman (145).
### 3D-FRONT

- **Origin:** [3D-FRONT](https://tianchi.aliyun.com/specials/promotion/alibaba-3d-scene-dataset), a large-scale indoor scene dataset with professionally designed room layouts and furniture
- **Processing:** Concatenated tar.gz parts → streaming extraction via `tarfile` → per-furniture model deduplication (UUID-based) → geometry-only GLB conversion
- **License:** [3D-FRONT Terms of Use](https://tianchi.aliyun.com/specials/promotion/alibaba-3d-scene-dataset), academic and research purposes only
- **Stats:** 52,189 unique furniture models deduplicated from scene data, across 19,121 model categories. Top categories: Cabinet (5,041), Sofa (1,928), Lighting (1,795), Chair (1,357).

## Usage

### Quick Start

```python
from huggingface_hub import hf_hub_download
import trimesh

# Download a single mesh
path = hf_hub_download(
    repo_id="Pthahnix/MeshLex-Data-Source",
    filename="data-shapenet/00/00123-of-52472.glb",
    repo_type="dataset",
)

# force="mesh" collapses the GLB scene into a single Trimesh object
mesh = trimesh.load(path, force="mesh")
print(f"Vertices: {len(mesh.vertices)}, Faces: {len(mesh.faces)}")
```

### Browse by Source

```python
from huggingface_hub import HfApi

api = HfApi()

# List all files under one shard of a source directory
files = api.list_repo_tree(
    "Pthahnix/MeshLex-Data-Source",
    path_in_repo="data-objaverse/00",
    repo_type="dataset",
    recursive=True,
)

# Entries are RepoFile/RepoFolder objects; both expose the repo-relative `path`
glb_files = [f.path for f in files if f.path.endswith(".glb")]
print(f"Found {len(glb_files)} GLBs in shard 00")
```

### Bulk Download

```python
from huggingface_hub import snapshot_download

# Download an entire source (e.g., ShapeNet, 35.9 GB)
snapshot_download(
    repo_id="Pthahnix/MeshLex-Data-Source",
    repo_type="dataset",
    allow_patterns="data-shapenet/**",
    local_dir="./meshlex-data",
)
```

### Load and Inspect

```python
import trimesh
from pathlib import Path

data_dir = Path("./meshlex-data/data-shapenet")

# Inspect the first five meshes in sorted order
for glb in sorted(data_dir.rglob("*.glb"))[:5]:
    mesh = trimesh.load(str(glb), force="mesh")
    print(f"{glb.name}: {len(mesh.faces)} faces, {len(mesh.vertices)} vertices")
```

## Mesh Statistics

### Face Count Distribution

| Source | Min | Median | Mean | Max |
|---|---:|---:|---:|---:|
| ABO | 20 | 18,239 | 42,448 | 11,540,224 |
| ShapeNet | 16 | 7,037 | 30,046 | 4,443,092 |
| Objaverse | 4 | 14,956 | 153,404 | 20,818,039 |
| 3D-Front | 4 | 44,347 | 59,642 | 3,361,058 |

### Vertex Count Distribution

| Source | Min | Median | Mean | Max |
|---|---:|---:|---:|---:|
| ABO | 56 | 10,990 | 24,386 | 5,870,562 |
| ShapeNet | 20 | 6,586 | 26,913 | 6,163,387 |
| Objaverse | 8 | 11,775 | 127,680 | 15,398,448 |
| 3D-Front | 6 | 54,227 | 74,556 | 5,206,898 |

### Category Breakdown (Top 10 across all sources)

| Category | Source | Count |
|---|---|---:|
| table | ShapeNet | 8,436 |
| chair | ShapeNet | 6,778 |
| Cabinet | 3D-Front | 5,041 |
| airplane | ShapeNet | 4,045 |
| car | ShapeNet | 3,514 |
| sofa | ShapeNet | 3,173 |
| Sofa | 3D-Front | 1,928 |
| Lighting | 3D-Front | 1,795 |
| Others | 3D-Front | 1,726 |
| Chair | 3D-Front | 1,357 |

## Processing Pipeline

This dataset was produced by the MeshLex v5.1 pipeline:

1. **Download** raw 3D assets from each source (GLB, OBJ, or tar.gz)
2. **Load** via trimesh with `force="mesh"` to collapse scene graphs into single meshes
3. **Strip** materials, textures, normals, and UV coordinates; retain only vertices and faces
4. **Filter** degenerate meshes (< 4 faces)
5. **Deduplicate** (3D-Front only: UUID-based model deduplication across scenes)
6. **Export** as geometry-only GLB
7. **Upload** in sharded batches to HuggingFace (500 files per commit)

## Limitations

- **Geometry only:** All material, texture, and color information has been removed. These meshes are not suitable for rendering without re-texturing.
- **No decimation applied:** Meshes retain their original polygon counts, which vary widely (4 to 20M faces). Downstream pipelines should apply their own decimation strategy.
- **Mixed quality:** Source datasets have varying levels of mesh quality.
  Some meshes may be non-manifold, have self-intersections, or contain disconnected components.
- **Category coverage:** ABO meshes lack category labels in this release (marked as "unknown").

## License

This dataset aggregates meshes from multiple sources, each with its own license:

| Source | License | Commercial Use |
|---|---|---|
| ABO | CC-BY 4.0 | Yes |
| ShapeNet | ShapeNet Terms of Use | No (research only) |
| Objaverse | Per-object (mostly CC-BY 4.0) | Varies |
| 3D-Front | 3D-FRONT Terms of Use | No (research only) |

**Important:** Due to ShapeNet and 3D-Front restrictions, this dataset as a whole should be treated as **research and educational use only**. If you need commercial-use data, filter to ABO and Objaverse subsets with compatible licenses.

The processing pipeline code is licensed under [Apache 2.0](https://github.com/Pthahnix/MeshLex-Research/blob/main/LICENSE).

## Citation

If you use this dataset in your research, please cite:

```bibtex
@misc{meshlex-data-source-2026,
  title={MeshLex-Data-Source: A Unified Collection of Geometry-Only 3D Meshes},
  author={Pthahnix},
  year={2026},
  howpublished={\url{https://huggingface.co/datasets/Pthahnix/MeshLex-Data-Source}},
}
```

Please also cite the original datasets:

<details>
<summary>Source dataset citations</summary>

**ABO:**

```bibtex
@inproceedings{collins2022abo,
  title={ABO: Dataset and Benchmarks for Real-World 3D Object Understanding},
  author={Collins, Jasmine and Goel, Shubham and Deng, Kenan and Luthra, Achleshwar and Xu, Leon and Gundogdu, Erhan and Zhang, Xi and Vicente, Tomas F. Yago and Dideriksen, Thomas and Arora, Himanshu and Guillaumin, Matthieu and Malik, Jitendra},
  booktitle={CVPR},
  year={2022}
}
```

**ShapeNet:**

```bibtex
@article{chang2015shapenet,
  title={ShapeNet: An Information-Rich 3D Model Repository},
  author={Chang, Angel X. and Funkhouser, Thomas and Guibas, Leonidas and Hanrahan, Pat and Huang, Qixing and Li, Zimo and Savarese, Silvio and Savva, Manolis and Song, Shuran and Su, Hao and Xiao, Jianxiong and Yi, Li and Yu, Fisher},
  journal={arXiv preprint arXiv:1512.03012},
  year={2015}
}
```

**Objaverse:**

```bibtex
@inproceedings{deitke2023objaverse,
  title={Objaverse: A Universe of Annotated 3D Objects},
  author={Deitke, Matt and Schwenk, Dustin and Salvador, Jordi and Weihs, Luca and Michel, Oscar and VanderBilt, Eli and Schmidt, Ludwig and Ehsani, Kiana and Kembhavi, Aniruddha and Farhadi, Ali},
  booktitle={CVPR},
  year={2023}
}
```

**3D-FRONT:**

```bibtex
@inproceedings{fu20213dfront,
  title={3D-FRONT: 3D Furnished Rooms with layOuts and fUrNiTure},
  author={Fu, Huan and Cai, Bowen and Gao, Lin and Zhang, Ling-Xiao and Wang, Jiaming and Li, Cao and Zeng, Qixun and Sun, Chengyue and Jia, Rongfei and Zhao, Binqiang and Zhang, Hao},
  booktitle={ICCV},
  year={2021}
}
```

</details>

## Related

- **[MeshLex-Research](https://github.com/Pthahnix/MeshLex-Research)**: the research project that produced this dataset
- **[MeshLex-Patches](https://huggingface.co/datasets/Pthahnix/MeshLex-Patches)**: pre-segmented patch dataset derived from earlier Objaverse+ShapeNet processing
Provider:
Pthahnix
Collected summary
Dataset introduction
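Construction
MeshLex-Data-Source is a large-scale, multi-source library of geometry-only meshes. It brings together 158,588 GLB meshes, about 281 GB in total, from four major 3D datasets: ABO, ShapeNet, Objaverse, and 3D-FRONT. Construction keeps geometry and nothing else: each raw asset is loaded through trimesh with `force="mesh"`, and non-geometric attributes such as materials, textures, normals, and UV coordinates are stripped so that only vertices and faces remain. Degenerate meshes with fewer than 4 faces are then removed. For 3D-FRONT, UUID-based deduplication ensures that each furniture model appears only once. Finally, all files are organized into a uniform sharded directory structure, with at most 10,000 files per shard to stay within HuggingFace's per-directory file limit, and are named by a standardized index.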
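The filtering, deduplication, and batching steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the MeshLex pipeline code: `MeshRecord`, `filter_and_dedupe`, and `batches` are hypothetical names, each mesh is represented only by an identifier and a face count, and the 500-file batch size is the per-commit figure given in the README. In the dataset itself, deduplication applies to 3D-FRONT only.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Iterator

MIN_FACES = 4        # meshes with fewer faces are treated as degenerate
COMMIT_BATCH = 500   # files per upload commit, per the README


@dataclass(frozen=True)
class MeshRecord:
    """Hypothetical per-mesh metadata; not a type from the MeshLex code."""
    model_id: str   # e.g. a 3D-FRONT furniture UUID
    n_faces: int


def filter_and_dedupe(records: Iterable[MeshRecord]) -> list[MeshRecord]:
    """Drop degenerate meshes and keep the first record seen per model_id."""
    seen: set[str] = set()
    kept: list[MeshRecord] = []
    for rec in records:
        if rec.n_faces < MIN_FACES or rec.model_id in seen:
            continue
        seen.add(rec.model_id)
        kept.append(rec)
    return kept


def batches(items: list, size: int = COMMIT_BATCH) -> Iterator[list]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


records = [
    MeshRecord("a", 12), MeshRecord("a", 12),  # duplicate UUID
    MeshRecord("b", 3),                        # degenerate: fewer than 4 faces
    MeshRecord("c", 4),
]
kept = filter_and_dedupe(records)
print([r.model_id for r in kept])  # ['a', 'c']
```

Keeping the first occurrence per identifier preserves input order, which keeps the resulting index numbering stable across runs.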
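Features
The dataset's defining traits are its exclusive focus on geometry and its uniform treatment of multiple sources. Every mesh is a geometry-only GLB file with no material or texture information, intended for shape analysis, generation, and reconstruction. The collection covers 20,332 semantic categories. Face counts range from 4 to about 20.8 million, with a median of roughly 18,584 faces and 17,288 vertices, so geometric complexity varies widely. The sources span e-commerce product models (ABO), a general object repository (ShapeNet), crowd-sourced assets (Objaverse), and professionally designed indoor scenes (3D-FRONT). The original variation in mesh quality is preserved, including non-manifold meshes, self-intersections, and disconnected components, which gives downstream research realistic test conditions.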
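Usage
The dataset is accessed mainly through the HuggingFace Hub API. `hf_hub_download` fetches a single mesh by file path, for example a specific GLB file from the ShapeNet subset, which can then be parsed with trimesh. `HfApi.list_repo_tree` walks the directory tree of a given source so that files can be filtered before download. For large-scale use, `snapshot_download` with the `allow_patterns` parameter downloads an entire source subset, such as the full 35.9 GB of ShapeNet, for local batch loading. Downloaded local files keep the source name and index in their filenames, which makes programmatic traversal straightforward. All loading should use trimesh's `force="mesh"` mode so that geometry is extracted correctly.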
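Because filenames carry the source name and index, they can be parsed back into structured fields. The sketch below assumes the flat pattern `{source}-{index:05d}-of-{total:05d}.glb` given in the README; `MeshName` and `parse_flat_name` are hypothetical names, and the `3d-front` prefix in the second call is an assumption based on the directory name `data-3d-front`.

```python
import re
from typing import NamedTuple

# Assumed flat pattern: {source}-{index:05d}-of-{total:05d}.glb
FLAT_NAME = re.compile(
    r"^(?P<source>.+)-(?P<index>\d{5})-of-(?P<total>\d{5})\.glb$"
)


class MeshName(NamedTuple):
    source: str
    index: int
    total: int


def parse_flat_name(name: str) -> MeshName:
    """Split a flat filename into (source, index, total)."""
    match = FLAT_NAME.match(name)
    if match is None:
        raise ValueError(f"not a flat MeshLex filename: {name!r}")
    return MeshName(match["source"], int(match["index"]), int(match["total"]))


print(parse_flat_name("shapenet-00123-of-52472.glb"))
print(parse_flat_name("3d-front-00007-of-52189.glb").source)  # 3d-front
```

The greedy `source` group lets hyphenated source names parse correctly, since the fixed-width index and total anchor the end of the match.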
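Background and challenges
Background overview
3D geometric modeling and understanding are central topics in computer vision and graphics, and large, high-quality 3D mesh datasets are a key foundation for progress in the field. MeshLex-Data-Source was created by Pthahnix in 2026 to address three problems with existing 3D datasets: scattered sources, inconsistent formats, and geometry mixed with texture information. It consolidates four established sources, Amazon Berkeley Objects, ShapeNetCore v2, Objaverse-LVIS, and 3D-FRONT, into 158,588 GLB meshes that retain only vertices and faces. With a unified preprocessing pipeline and a sharded storage layout, it offers a standardized, large-scale data foundation for text-to-3D and image-to-3D generation, mesh reconstruction, and geometric analysis, and it lowers the barrier to combining data from multiple sources.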
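Current challenges
A core challenge in 3D data is heterogeneity and inconsistency across datasets. Models from different sources differ markedly in format, topology, quality, and license terms, which makes cross-dataset training and evaluation difficult. MeshLex-Data-Source addresses this with a unified pipeline of geometry extraction, degenerate-mesh filtering, and deduplication. It converts heterogeneous inputs (OBJ, GLB, tar.gz archives) into geometry-only GLB files, stripping non-geometric information such as materials, textures, and normals while keeping the original polygon counts. Building the dataset involved several technical hurdles: downloading and converting a very large number of models efficiently, loading non-manifold meshes robustly, managing a very large file count on HuggingFace through sharding, and redistributing the data lawfully under differing license terms. In addition, the quality of the original meshes is uneven. Some models are non-manifold, self-intersecting, or made of disconnected components, so downstream applications still face preprocessing and normalization work.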
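Common scenarios
Classic use cases
In 3D vision and geometric deep learning, the most typical use of MeshLex-Data-Source, as a large geometry-only mesh dataset, is training and evaluating topology-based 3D generative models. Researchers can read vertices and faces directly from the uniformly sharded GLB files to build mesh autoencoders, diffusion models, or Transformer-based generative networks. Because the data comes from four major libraries (ABO, ShapeNet, Objaverse, and 3D-Front) and covers categories ranging from detailed furniture to abstract objects, it is particularly suited to cross-domain geometric feature learning and to research on implicit representations of untextured shapes. The sharded layout also simplifies batch loading and distributed training, which makes the dataset a practical benchmark resource for geometric modeling and shape analysis.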
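Practical applications
In practice, MeshLex-Data-Source supports automatic 3D content generation and digital asset creation. Manual modeling is slow and labor-intensive in game development, visual effects, and virtual reality; mesh generation models trained on this dataset could produce varied geometry automatically and shorten asset production. In interior design, the 3D-Front furniture models can be used to generate room layouts in different styles. In e-commerce, the real product models from ABO support 3D product display and virtual placement. The dataset can also serve object-geometry representation learning for robotic grasp planning and batch generation of obstacle models for autonomous-driving simulators.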
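Derived and related work
Two related works are linked from the dataset card. MeshLex-Research is the research project that produced this dataset. MeshLex-Patches is a pre-segmented patch dataset derived from earlier Objaverse+ShapeNet processing; it splits whole meshes into local patches and serves as a benchmark for local geometric feature learning and fine-grained generation. Mesh diffusion models and topology-aware generative networks trained on this data are reported to achieve better shape coverage and fidelity on traditional benchmarks such as ShapeNet. Related research directions include unifying multi-source 3D data, self-supervised geometric learning, and structured mesh generation.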
The content above was collected and summarized by 遇见数据集.