
UniParser/UniEM-3M

Source: Hugging Face · Updated 2025-09-06 · Indexed 2026-01-03
Download link:
https://hf-mirror.com/datasets/UniParser/UniEM-3M
Description:
---
license: mit
task_categories:
- image-to-text
- text-to-image
- image-segmentation
tags:
- electron_micrograph
- Materials
- microstructure
- characterization
- scientific_figure_understanding
configs:
- config_name: default
  data_files:
  - split: synthesized_data_structured_descriptions
    path: data/synthesized_data_structured_descriptions-*
  - split: synthesized_data_image_captions
    path: data/synthesized_data_image_captions-*
dataset_info:
  features:
  - name: image
    dtype: image
  - name: attribute_description
    struct:
    - name: color_profile
      dtype: string
    - name: distribution
      dtype: string
    - name: layering
      dtype: string
    - name: microscopy_type
      dtype: string
    - name: morphology
      dtype: string
    - name: particle_density
      dtype: string
    - name: pixel_size_profile
      dtype: string
    - name: subject
      dtype: string
    - name: surface_texture
      dtype: string
  - name: full_caption
    dtype: string
  splits:
  - name: synthesized_data_structured_descriptions
    num_bytes: 13685770890.264
    num_examples: 9106
  - name: synthesized_data_image_captions
    num_bytes: 32591407248
    num_examples: 19016
  download_size: 46275971944
  dataset_size: 46277178138.264
size_categories:
- 10K<n<100K
---

# UniEM-3M

## 📘 Dataset Summary

UniEM-3M is the first large-scale multimodal electron microscopy (EM) dataset for instance-level microstructural understanding, introduced in our paper "[UniEM-3M: A Universal Electron Micrograph Dataset for Microstructural Segmentation and Generation](https://arxiv.org/abs/2508.16239)". It provides high-resolution electron micrographs with expert-curated annotations and textual descriptions, aiming to accelerate research in automated materials analysis and deep learning for materials science.

The dataset addresses the scarcity of large-scale EM datasets by offering:

- **5,091** high-resolution EM images
- About **3 million instance segmentation labels**
- **Image-level structural descriptions** disentangled by attributes
- A **text-to-image diffusion model** trained on the full dataset

---

## 🚨 Important Notice

At this stage, we are releasing only **the generative model ([UniEM-Gen](https://huggingface.co/NNNan/UniEM-Gen)), the generated data, and their corresponding textual descriptions**. The **real electron micrographs** and **instance segmentation annotations** will be released after our paper has completed peer review (it is currently under review).

---

## 🌐 Online Application

We trained a **state-of-the-art instance segmentation model** for microstructural characterization on UniEM-3M and developed a **complete analysis software suite** based on this model. It is available as an online application here: 👉 [online application](https://www.bohrium.com/apps/uni-aims?tab=readme_link)

---

## 📂 Dataset Structure

- **Currently released**:
  - **synthesized_data_structured_descriptions**: synthesized data with structured descriptions
  - **synthesized_data_image_captions**: synthesized data with natural language descriptions
- **To be released** (after peer review):
  - Real EM images and corresponding descriptions
  - ~3M instance segmentation labels

---

## 🚀 Applications

- Multimodal learning in materials science
- Text-to-image generation with scientific fidelity
- Instance segmentation of microstructures
- Image captioning / attribute-aware description generation
- Training and benchmarking deep learning models for EM data

---

## 📖 Citation

If you use this dataset, please cite:

```bibtex
@misc{wang2025uniem3muniversalelectronmicrograph,
  title={UniEM-3M: A Universal Electron Micrograph Dataset for Microstructural Segmentation and Generation},
  author={Nan Wang and Zhiyi Xia and Yiming Li and Shi Tang and Zuxin Fan and Xi Fang and Haoyi Tao and Xiaochen Cai and Guolin Ke and Linfeng Zhang and Yanhui Hong},
  year={2025},
  eprint={2508.16239},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.16239},
}
```
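
## Loading example (illustrative)

The following is a minimal sketch, not part of the original card, of how the released splits might be read with the Hugging Face `datasets` library. It relies only on the split and feature names listed in the card metadata (`synthesized_data_structured_descriptions`, `attribute_description`, `full_caption`). The helper names `attributes_to_text` and `stream_records`, the field ordering, and the `"name: value; ..."` output format are assumptions made for illustration. Streaming is used because the card reports a download size of roughly 46 GB.

```python
from typing import Any, Dict, Iterator

DATASET_ID = "UniParser/UniEM-3M"
STRUCTURED_SPLIT = "synthesized_data_structured_descriptions"

# Field names come from the card's `attribute_description` struct;
# the ordering here is an arbitrary choice for readability.
ATTRIBUTE_FIELDS = (
    "subject",
    "microscopy_type",
    "morphology",
    "distribution",
    "layering",
    "particle_density",
    "surface_texture",
    "color_profile",
    "pixel_size_profile",
)


def attributes_to_text(record: Dict[str, Any]) -> str:
    """Join the non-empty structured attributes into one line (hypothetical helper)."""
    attrs = record.get("attribute_description") or {}
    parts = [f"{name}: {attrs[name]}" for name in ATTRIBUTE_FIELDS if attrs.get(name)]
    return "; ".join(parts)


def stream_records(split: str = STRUCTURED_SPLIT, limit: int = 3) -> Iterator[Dict[str, Any]]:
    """Yield the first `limit` records of a split without downloading it in full."""
    # Imported lazily so the helper above works without `datasets` installed.
    from datasets import load_dataset

    stream = load_dataset(DATASET_ID, split=split, streaming=True)
    for index, record in enumerate(stream):
        if index >= limit:
            break
        yield record


# Example usage (requires network access and `pip install datasets`):
#   for record in stream_records():
#       print(attributes_to_text(record))
```

Whether a given split populates `attribute_description`, `full_caption`, or both is not stated in the card, so the helper treats a missing or empty struct as an empty description.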
Provider: UniParser