inanxr/ProsopoLarge
Source: Hugging Face · Updated 2025-12-27 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/inanxr/ProsopoLarge
Description:
---
dataset_info:
  features:
  - name: image
    dtype: image
  - name: label
    dtype: class_label
  splits:
  - name: train
    num_bytes: 29320000000
    num_examples: 5179510
  download_size: 26157852001
  dataset_size: 29320000000
license: cc-by-4.0
task_categories:
- image-classification
tags:
- face-recognition
- ms1mv3
- prosopo
pretty_name: Prosopo Large Dataset (MS1MV3)
size_categories:
- 1M<n<10M
---
# 🧪 Prosopo Large Dataset (MS1MV3)

> **~5.18 million high-quality aligned face images for training state-of-the-art face recognition models.**

This dataset is the primary training set for the **Prosopo** face recognition system. It contains **MS1MV3** (a cleaned version of MS-Celeb-1M), pre-aligned and packed into the MXNet RecordIO binary format.
## 📊 Dataset Statistics
| Metric | Value |
| :--- | :--- |
| **Identities** | 93,431 |
| **Total Images** | 5,179,510 |
| **Image Size** | 112 x 112 px |
| **Alignment** | RetinaFace (5-point landmark) |
| **Format** | MXNet RecordIO (Packed Binary) |
| **Total Size** | ~28 GB (after extracting the ZIP) |
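
Models trained on these crops generally expect inference images to be aligned the same way. The sketch below estimates the least-squares 2D similarity transform (rotation, uniform scale, translation) that maps five detected landmarks onto a 112 × 112 template. The template coordinates are the ones commonly used in InsightFace's ArcFace alignment code; that this dataset used exactly this template and landmark order is an assumption, and `estimate_similarity` / `apply_affine` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Assumption: 5-point reference template used by InsightFace's ArcFace alignment
# for 112x112 crops, ordered left eye, right eye, nose tip, left mouth corner,
# right mouth corner. This card does not state the exact template.
ARCFACE_112_TEMPLATE: List[Point] = [
    (38.2946, 51.6963),
    (73.5318, 51.5014),
    (56.0252, 71.7366),
    (41.5493, 92.3655),
    (70.7299, 92.2041),
]


def estimate_similarity(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Least-squares 2D similarity transform mapping src landmarks onto dst.

    Returns a 2x3 affine matrix [[a, -b, tx], [b, a, ty]].
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matching point pairs")
    n = len(src)
    src_cx = sum(x for x, _ in src) / n
    src_cy = sum(y for _, y in src) / n
    dst_cx = sum(u for u, _ in dst) / n
    dst_cy = sum(v for _, v in dst) / n
    num_a = num_b = den = 0.0
    for (x, y), (u, v) in zip(src, dst):
        # Work with centred coordinates so the translation drops out.
        x, y, u, v = x - src_cx, y - src_cy, u - dst_cx, v - dst_cy
        num_a += x * u + y * v
        num_b += x * v - y * u
        den += x * x + y * y
    if den == 0.0:
        raise ValueError("source landmarks are all identical")
    a = num_a / den  # scale * cos(angle)
    b = num_b / den  # scale * sin(angle)
    tx = dst_cx - (a * src_cx - b * src_cy)
    ty = dst_cy - (b * src_cx + a * src_cy)
    return [[a, -b, tx], [b, a, ty]]


def apply_affine(m: List[List[float]], p: Point) -> Point:
    """Map one point through a 2x3 affine matrix."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

Call it as `estimate_similarity(detected_landmarks, ARCFACE_112_TEMPLATE)`; the resulting matrix can then be handed to an image-warping routine such as OpenCV's `cv2.warpAffine(img, np.array(m, dtype=np.float32), (112, 112))`. A closed-form fit is used here only to keep the sketch dependency-free.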
## 📁 Content Structure
The dataset is provided as a **ZIP archive** containing the RecordIO data file plus its index and metadata:
- `train.rec` (27.3 GB): the data; all images packed as binary records
- `train.idx` (97 MB): the index; byte offset of each record, for random access
- `train.lst` (411 MB): the metadata; maps each image path to its label and index
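
Because RecordIO is a simple binary container, the records can be read without installing MXNet. The sketch below follows the dmlc-core RecordIO layout (magic number, length word with a 3-bit continuation flag, 4-byte padding) and MXNet's `IRHeader` image-record header; it assumes the files were written on a little-endian machine, and the helper names are hypothetical. In InsightFace-packed datasets, record 0 usually holds a header rather than an image, which is why the example reads key 1; that convention is an assumption for this archive.

```python
import mmap
import struct
from typing import Dict, Iterable, Tuple, Union

RECORDIO_MAGIC = 0xCED7230A            # dmlc-core RecordIO magic number
IR_FORMAT = "<IfQQ"                    # MXNet IRHeader: flag, label, id, id2
IR_SIZE = struct.calcsize(IR_FORMAT)   # 24 bytes

Buffer = Union[bytes, mmap.mmap]


def parse_idx(lines: Iterable[str]) -> Dict[int, int]:
    """Parse .idx lines of the form 'key<TAB>byte_offset'."""
    offsets: Dict[int, int] = {}
    for line in lines:
        if line.strip():
            key, pos = line.split()
            offsets[int(key)] = int(pos)
    return offsets


def load_idx(path: str) -> Dict[int, int]:
    with open(path, "r") as f:
        return parse_idx(f)


def read_record(data: Buffer, offset: int) -> bytes:
    """Return the payload of the record starting at `offset`, re-joining
    multi-part records the way dmlc-core's RecordIO reader does."""
    out = bytearray()
    pos = offset
    while True:
        magic, lrec = struct.unpack_from("<II", data, pos)
        if magic != RECORDIO_MAGIC:
            raise ValueError(f"no RecordIO magic number at byte {pos}")
        cflag = (lrec >> 29) & 7          # 0 whole, 1 first, 2 middle, 3 last part
        length = lrec & ((1 << 29) - 1)
        start = pos + 8
        out += data[start:start + length]
        if cflag in (0, 3):
            return bytes(out)
        # The writer split the payload at an embedded magic number; restore it.
        out += struct.pack("<I", RECORDIO_MAGIC)
        pos = start + ((length + 3) & ~3)  # each part is padded to 4 bytes


def unpack_record(payload: bytes) -> Tuple[Union[float, Tuple[float, ...]], int, bytes]:
    """Split an MXNet image record into (label, id, encoded image bytes)."""
    flag, label, rec_id, _id2 = struct.unpack_from(IR_FORMAT, payload, 0)
    body = payload[IR_SIZE:]
    if flag > 0:
        # `flag` float32 labels follow the header instead of the scalar label.
        label = struct.unpack_from(f"<{flag}f", body, 0)
        body = body[4 * flag:]
    return label, rec_id, body


# Example (paths assume the files were extracted to the current directory):
# offsets = load_idx("train.idx")
# with open("train.rec", "rb") as fh, \
#         mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
#     label, rec_id, jpeg_bytes = unpack_record(read_record(mm, offsets[1]))
```

Memory-mapping `train.rec` avoids loading 27 GB into RAM while still allowing random access through the offsets in `train.idx`. If MXNet is installed, `mxnet.recordio.MXIndexedRecordIO` and `mxnet.recordio.unpack` cover the same ground.
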

The archive also includes the standard face-verification benchmarks (LFW, AgeDB-30, CFP-FP):

- `lfw.bin`, `agedb_30.bin`, `cfp_fp.bin`
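
These benchmark files are commonly distributed in InsightFace's verification format: a pickled `(bins, issame_list)` tuple in which images `2*i` and `2*i + 1` form pair `i`. Assuming these files follow that layout (the card does not say), a minimal pure-Python loader could look like this; the function names are hypothetical.

```python
import pickle
from typing import List, Tuple

Pair = Tuple[bytes, bytes, bool]


def parse_verification_bin(raw: bytes) -> List[Pair]:
    """Turn the pickled (bins, issame_list) tuple into (img_a, img_b, same) pairs."""
    # encoding="bytes" lets Python 3 read pickles written by Python 2.
    bins, issame_list = pickle.loads(raw, encoding="bytes")
    if len(bins) < 2 * len(issame_list):
        raise ValueError("expected two encoded images per pair")
    # Images 2*i and 2*i + 1 form pair i.
    return [(bytes(bins[2 * i]), bytes(bins[2 * i + 1]), bool(same))
            for i, same in enumerate(issame_list)]


def load_verification_bin(path: str) -> List[Pair]:
    # Unpickling can execute code: only load files from a source you trust.
    with open(path, "rb") as f:
        return parse_verification_bin(f.read())
```

Decoding the image bytes and computing verification accuracy are left to your image and model libraries. If `issame_list` was pickled as a NumPy array, NumPy must be installed for unpickling to succeed.
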
## 🚀 Usage
Download the ZIP archive (~26 GB) and extract it into your training environment:
```python
from huggingface_hub import hf_hub_download
import zipfile

# Download the archive into the local Hugging Face cache
# (repository id as given in the download link above).
zip_path = hf_hub_download(
    repo_id="inanxr/ProsopoLarge",
    filename="prosopo-large-dataset.zip",
    repo_type="dataset",
)

# Extract the RecordIO files and the benchmark .bin files.
with zipfile.ZipFile(zip_path, "r") as zip_ref:
    zip_ref.extractall("./ms1m_dataset")
```
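
After extraction, a quick check that every file named in this card is present can save a failed training run. The sketch below uses hypothetical helper names; the `./ms1m_dataset` path comes from the snippet above, and because the archive's internal folder layout is not documented, it matches by file name anywhere under the extraction directory.

```python
import os
from typing import Dict, Iterable, Iterator, List, Tuple

# File names taken from this card; their folder inside the ZIP is not
# documented, so matching is done on the file name only (an assumption).
EXPECTED_FILES: Tuple[str, ...] = (
    "train.rec", "train.idx", "train.lst", "lfw.bin", "agedb_30.bin", "cfp_fp.bin",
)


def walk_files(root: str) -> Iterator[str]:
    """Yield every file path under `root` (nothing if `root` does not exist)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def match_expected(paths: Iterable[str],
                   expected: Tuple[str, ...] = EXPECTED_FILES
                   ) -> Tuple[Dict[str, str], List[str]]:
    """Return (found: name -> first matching path, missing: names not found)."""
    found: Dict[str, str] = {}
    for path in paths:
        name = os.path.basename(path)
        if name in expected and name not in found:
            found[name] = path
    missing = [name for name in expected if name not in found]
    return found, missing


# Example:
# found, missing = match_expected(walk_files("./ms1m_dataset"))
# if missing:
#     raise FileNotFoundError(f"missing after extraction: {missing}")
```

Separating the directory walk from the matching keeps the matching logic testable without touching the filesystem.
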
## 📜 Acknowledgements
Original data source: **MS-Celeb-1M** (cleaned by InsightFace/DeepGlint)

> *Re-hosted by InanXR/Prosopo for reproducibility.*

> *Guo, Yandong, et al. "MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition." ECCV 2016.*
Provided by: inanxr



