
Selfie with ID Dataset - 69,435 images for Re-identification and Facial Recognition

Source: Databricks, indexed 2026-04-13
Download link:
https://marketplace.databricks.com/details/37ab6a31-7743-4670-bfae-a69809789e98/Unidata_Selfie-with-ID-Dataset---69,435-images-for-Re-identification-and-Facial-Recognition
Description:
Overview

The Selfie with ID Dataset is a commercial biometric image dataset produced by Unidata, designed for training and evaluating facial recognition, re-identification, and identity verification systems. It contains 69,435 images of 4,629 individuals from 85 countries, making it one of the most geographically diverse selfie image datasets available for KYC and document verification tasks.

Each person in the dataset is represented by a set of 15 files: 13 selfie photos taken against diverse backgrounds and under varying lighting conditions, plus 2 photos of identity documents from different document types. This pairing structure makes the dataset particularly valuable for building and benchmarking recognition models that match facial images from real-world selfies against official ID photos.

Dataset Structure

Each individual set includes:
- 13 selfie images: captured under different lighting, backgrounds, and camera angles
- 2 ID document photos: taken from different document types per person

Supported document types include:
- Passports and international passports
- Driver's licenses
- Student cards
- Health certificates
- Bank, transport, and membership cards
- Other official certificates

Image formats: JPG, JPEG, HEIC
Resolution: 1000×750 and higher

Subject Demographics

Both male and female participants are represented across a wide age range and 85 countries. Annotations per subject include technical metadata: age, gender, and country of origin.

Age distribution:
1. Under 18: 451 people
2. 19–25: 1,411 people (largest group)
3. 26–32: 1,301 people
4. 33–39: 889 people
5. 40–46: 377 people
6. 47–53: 128 people
7. 54–59: 45 people
8. 60–66: 23 people
9. 67+: 4 people

The majority of subjects fall within the 19–32 age range, reflecting the typical user base for digital onboarding and KYC applications.
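
The figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers stated in the description; the variable names are illustrative and are not part of the dataset or of any Unidata tooling.

```python
# Figures copied from the dataset description; all names are illustrative only.
SUBJECTS = 4629
SELFIES_PER_SUBJECT = 13
ID_PHOTOS_PER_SUBJECT = 2

AGE_GROUPS = {
    "Under 18": 451,
    "19-25": 1411,
    "26-32": 1301,
    "33-39": 889,
    "40-46": 377,
    "47-53": 128,
    "54-59": 45,
    "60-66": 23,
    "67+": 4,
}

# Each subject contributes 13 selfies plus 2 ID document photos.
files_per_subject = SELFIES_PER_SUBJECT + ID_PHOTOS_PER_SUBJECT
total_images = SUBJECTS * files_per_subject

# The age groups should account for every subject.
age_total = sum(AGE_GROUPS.values())

# Share of subjects in the 19-32 range (the two largest groups).
share_19_32 = (AGE_GROUPS["19-25"] + AGE_GROUPS["26-32"]) / age_total

print(total_images)          # 69435
print(age_total)             # 4629
print(f"{share_19_32:.1%}")  # 58.6%
```

The totals agree with the headline numbers: 4,629 subjects with 15 files each give 69,435 images, and the nine age groups sum to 4,629.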

Continent distribution:
- Europe: 60.5%
- Asia: 15.3%
- South America: 12.6%
- Africa: 9.1%
- North America: 2.5%

Technical Specifications

Data was collected via crowdsourcing platforms using a wide range of consumer smartphones to ensure realistic capture conditions:
- Samsung Galaxy series (M31, A05, and others)
- Xiaomi Redmi series (Note 10S, Redmi 14C, and others)
- iPhone models (X, 11, 15 Pro Max)
- Infinix Note 11, Infinix Smart 8
- Tecno Pop 7
- And other devices

Use Cases

- Financial Services: KYC and Remote Onboarding. The dataset supports banks and fintech platforms in training identity verification systems for digital onboarding. Paired selfie photos and ID documents enable recognition systems to confirm customer identities, reduce fraud, and ensure compliance with KYC requirements.
- Telecommunications: Subscriber Verification. Telecom companies use selfie and ID card image datasets to validate identity during SIM registration and subscription management, matching facial images from selfies with official ID photos to prevent fraudulent accounts.
- E-Government: Digital Identity Authentication. Public agencies can use the dataset to train re-identification systems for citizen authentication in e-government platforms, reducing impersonation risks and protecting access to sensitive services.

Compliance & Security

The dataset is collected from legally permissible sources, including crowdsourcing platforms and in-house data capture teams. All data complies with GDPR and applicable privacy regulations. Storage is hosted on AWS infrastructure certified to ISO 27001 and ISO 27701 standards.

Summary

The Selfie with ID Dataset is a large-scale, high-diversity image collection purpose-built for identity verification and facial recognition research.
With 69,435 images from 4,629 people across 85 countries, 13 selfie photos and 2 ID document photos per subject, rich demographic metadata, and broad device coverage, it provides the real-world variability needed to build robust re-identification algorithms and reliable verification systems for financial services, telecom, e-government, and biometric technology applications.
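
The 13 selfies and 2 ID document photos per subject can be combined into matching (selfie, ID photo) pairs for verification training. The sketch below enumerates those pairs for one subject. The file names and folder layout are hypothetical, since the description does not document how the files are organized, and `genuine_pairs` is an illustrative helper rather than part of any official tooling.

```python
from itertools import product


def genuine_pairs(selfies, id_photos):
    """Return every (selfie, id_photo) combination for a single subject."""
    return list(product(selfies, id_photos))


# Hypothetical file names for one subject; the real layout is not documented here.
selfies = [f"subject_0001/selfie_{i:02d}.jpg" for i in range(1, 14)]
id_photos = [f"subject_0001/id_{i}.jpg" for i in range(1, 3)]

pairs = genuine_pairs(selfies, id_photos)

print(len(pairs))         # 26 matching pairs per subject (13 x 2)
print(len(pairs) * 4629)  # 120354 matching pairs across all subjects
```

Non-matching pairs for training would pair a selfie from one subject with an ID photo from another; how those are sampled is a design choice the description leaves open.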
Provider:
Unidata