
inanxr/ProsopoLarge

Source: Hugging Face · Updated 2025-12-27 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/inanxr/ProsopoLarge
Resource description:
---
dataset_info:
  features:
  - name: image
    dtype: image
  - name: label
    dtype: class_label
  splits:
  - name: train
    num_bytes: 29320000000
    num_examples: 5179510
  download_size: 26157852001
  dataset_size: 29320000000
license: cc-by-4.0
task_categories:
- image-classification
tags:
- face-recognition
- ms1mv3
- prosopo
pretty_name: Prosopo Large Dataset (MS1MV3)
size_categories:
- 1M<n<10M
---

# 🧪 Prosopo Large Dataset (MS1MV3)

![Prosopo Banner](https://raw.githubusercontent.com/InanXR/Prosopo/refs/heads/main/assets/Prosopo_Large.png)

> **About 5.18 million aligned face images for training large-scale face recognition models.**

This dataset is the **core training set** for the **Prosopo** face recognition system. It contains **MS1MV3** (a cleaned version of MS-Celeb-1M), pre-aligned and packed into MXNet RecordIO format, which stores all images in a single binary file alongside an index for fast random access.

## 📊 Dataset Statistics

| Metric | Value |
| :--- | :--- |
| **Identities** | 93,431 |
| **Total Images** | 5,179,510 |
| **Image Size** | 112 × 112 px |
| **Alignment** | RetinaFace (5-point landmarks) |
| **Format** | MXNet RecordIO (packed binary) |
| **Total Size** | ~28 GB (unpacked) |

## 📁 Content Structure

The dataset is provided as a **ZIP archive** containing the following RecordIO files:

- `train.rec`: the data (27.3 GB), with all images packed into one file
- `train.idx`: the index (97 MB), giving byte offsets for random access
- `train.lst`: the metadata (411 MB), mapping each image's path, label, and index

It also includes standard validation benchmarks:

- `lfw.bin`, `agedb_30.bin`, `cfp_fp.bin`

## 🚀 Usage

You can download the ZIP file with `huggingface_hub` and extract it into your training environment.
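Before starting the download, it may be worth checking free disk space. The card's metadata lists `download_size: 26157852001` bytes for the ZIP and `dataset_size: 29320000000` bytes unpacked; 29,320,000,000 bytes is about 27.3 GiB, which suggests the "~28 GB" and "27.3 GB" figures above are binary gigabytes. The sketch below is an illustration, not part of the dataset's tooling. It assumes the ZIP stays on disk until extraction finishes (so both sizes are needed at once) and adds a hypothetical 5% headroom; `required_bytes` and `has_room_for_dataset` are illustrative names. Note that `hf_hub_download` stores the ZIP in the Hugging Face cache directory by default, which may sit on a different disk than the extraction target.

```python
import shutil

# Sizes copied from the dataset card's YAML metadata, in bytes.
DOWNLOAD_SIZE = 26_157_852_001  # the ZIP archive
DATASET_SIZE = 29_320_000_000   # the extracted files

GIB = 1024 ** 3


def required_bytes(keep_zip: bool = True, headroom: float = 0.05) -> int:
    """Estimate disk space needed; `headroom` is a hypothetical safety margin."""
    total = DATASET_SIZE + (DOWNLOAD_SIZE if keep_zip else 0)
    return int(total * (1 + headroom))


def has_room_for_dataset(path: str = ".", keep_zip: bool = True) -> bool:
    """Return True if the filesystem holding `path` has enough free space."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes(keep_zip)


if __name__ == "__main__":
    print(f"Extracted size: {DATASET_SIZE / GIB:.1f} GiB")
    print(f"Needed (ZIP + extracted, +5%): {required_bytes() / GIB:.1f} GiB")
    print("Enough space here:", has_room_for_dataset("."))
```

Run it against a directory that already exists on the target volume, since `shutil.disk_usage` needs a real path; the extraction folder itself is only created later by `extractall`.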
```python
from huggingface_hub import hf_hub_download
import zipfile

# Download the ZIP archive into the local Hugging Face cache
zip_path = hf_hub_download(
    repo_id="inanxr/ProsopoLarge",
    filename="prosopo-large-dataset.zip",
    repo_type="dataset",
)

# Extract the RecordIO files and the .bin validation benchmarks
with zipfile.ZipFile(zip_path, "r") as zip_ref:
    zip_ref.extractall("./ms1m_dataset")
```

## 📜 Acknowledgements

Original data source: **MS-Celeb-1M** (cleaned by InsightFace/DeepGlint).

> *Re-hosted by InanXR/Prosopo for reproducibility.*
>
> *Guo, Yandong, et al. "MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition." ECCV 2016.*
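After extraction, `train.idx` and `train.rec` can be read without loading the whole 27 GB file into memory: each line of the index maps a record key to a byte offset in `train.rec`. The sketch below is a minimal pure-Python reader written under stated assumptions; it is not Prosopo's or MXNet's own code. It assumes MXNet's RecordIO framing (a `0xced7230a` magic word, then a 32-bit word whose lower 29 bits give the payload length, then padding to 4 bytes) and the 24-byte `IfQQ` header (flag, label, id, id2) that `mxnet.recordio.pack` prepends to each encoded image, both little-endian. The helpers `load_index`, `read_record`, `unpack`, and `pack_record` are hypothetical names, and records split around an embedded magic word are not handled. If MXNet is installed, `mx.recordio.MXIndexedRecordIO` together with `mx.recordio.unpack_img` is the usual route.

```python
import io
import struct
from typing import BinaryIO, Dict, NamedTuple, Tuple

# Assumed MXNet RecordIO framing for one record:
# [magic: uint32][lrec: uint32][payload][zero padding to a 4-byte boundary]
# Upper 3 bits of lrec = continuation flag, lower 29 bits = payload length.
RECORDIO_MAGIC = 0xCED7230A
LENGTH_MASK = (1 << 29) - 1

# Assumed per-record header (flag, label, id, id2), little-endian, 24 bytes.
IR_FORMAT = "<IfQQ"
IR_SIZE = struct.calcsize(IR_FORMAT)


class IRHeader(NamedTuple):
    flag: int     # number of extra float32 labels after the header (0 = none)
    label: float  # identity label when flag == 0
    id: int
    id2: int


def load_index(idx_path: str) -> Dict[int, int]:
    """Parse a .idx file with one 'key<TAB>byte_offset' pair per line."""
    index: Dict[int, int] = {}
    with open(idx_path, "r") as f:
        for line in f:
            if not line.strip():
                continue
            key, offset = line.strip().split("\t")
            index[int(key)] = int(offset)
    return index


def read_record(f: BinaryIO, offset: int) -> bytes:
    """Return the payload of the single (unsplit) record starting at `offset`."""
    f.seek(offset)
    magic, lrec = struct.unpack("<II", f.read(8))
    if magic != RECORDIO_MAGIC:
        raise ValueError(f"bad magic {magic:#x} at offset {offset}")
    cflag, length = lrec >> 29, lrec & LENGTH_MASK
    if cflag != 0:
        # Records split around an embedded magic word are out of scope here.
        raise NotImplementedError("split RecordIO records are not supported")
    return f.read(length)


def unpack(payload: bytes) -> Tuple[IRHeader, Tuple[float, ...], bytes]:
    """Split a payload into header, optional float32 label vector, and image bytes."""
    header = IRHeader(*struct.unpack(IR_FORMAT, payload[:IR_SIZE]))
    body = payload[IR_SIZE:]
    labels: Tuple[float, ...] = ()
    if header.flag > 0:
        n = header.flag
        labels = struct.unpack(f"<{n}f", body[: 4 * n])
        body = body[4 * n:]
    return header, labels, body


def pack_record(label: float, image: bytes, idx: int = 0) -> bytes:
    """Hypothetical writer used only to test the reader with synthetic data."""
    payload = struct.pack(IR_FORMAT, 0, label, idx, 0) + image
    framed = struct.pack("<II", RECORDIO_MAGIC, len(payload)) + payload
    return framed + b"\x00" * (-len(payload) % 4)


if __name__ == "__main__":
    # Round-trip one synthetic record through the parser.
    demo = io.BytesIO(pack_record(7.0, b"\xff\xd8fake-jpeg"))
    header, labels, image = unpack(read_record(demo, 0))
    print(header.label, labels, image[:2] == b"\xff\xd8")
```

On the real files, open `train.rec` in binary mode and call `unpack(read_record(f, index[key]))` with `index = load_index("./ms1m_dataset/train.idx")`, assuming the archive extracts its files directly into `./ms1m_dataset`. In InsightFace-style packs such as MS1MV3, key `0` typically holds a metadata record with `flag > 0` rather than an image, so image records would start at key `1`, and each image's `label` is its identity index.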
Provider:
inanxr