IMDb-WIKI
Indexed by Payititi on 2024-03-04
Download link:
https://www.payititi.com/opendatasets/show-151.html
Resource description:
We took the list of the 100,000 most popular actors on the IMDb website and automatically crawled, from their profiles, the date of birth, name, gender, and all images related to each person. We also crawled all profile images from the Wikipedia pages of people, together with the same meta information. We removed images without a timestamp (the date when the photo was taken). Assuming that images containing a single face are likely to show the actor, and that the timestamp and date of birth are correct, we were able to assign a biological (real) age to each such image. We cannot vouch for the accuracy of the assigned age information: besides wrong timestamps, many images are stills from movies, which can have extended production times. In total we obtained 460,723 face images of 20,284 celebrities from IMDb and 62,328 face images from Wikipedia, 523,051 in total.
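The age assignment described above can be sketched as follows. This is a minimal illustration, not the dataset authors' code: the function names `age_at_photo` and `is_plausible_age`, the use of full calendar dates for both inputs, and the 0 to 100 plausibility bounds are all assumptions made for the example.

```python
from datetime import date
from typing import Optional


def age_at_photo(date_of_birth: date, photo_date: date) -> Optional[int]:
    """Return the age in whole years on the day the photo was taken.

    Returns None when the photo date precedes the date of birth, which
    would indicate a wrong timestamp or a wrong date of birth.
    """
    if photo_date < date_of_birth:
        return None
    years = photo_date.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the photo year.
    if (photo_date.month, photo_date.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def is_plausible_age(age: Optional[int], lo: int = 0, hi: int = 100) -> bool:
    """Hypothetical sanity filter; the bounds are an assumption, not from the source."""
    return age is not None and lo <= age <= hi


# Example with made-up dates (not taken from the dataset).
dob = date(1970, 6, 15)
print(age_at_photo(dob, date(2010, 6, 14)))  # day before the 40th birthday
print(age_at_photo(dob, date(2010, 6, 15)))  # on the 40th birthday
print(is_plausible_age(age_at_photo(dob, date(1960, 1, 1))))  # photo before birth
```

A filter of this kind only catches impossible labels. It does not address the noise sources named in the description, such as movie stills taken well before release.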
Provider:
Payititi
Collected summary
Dataset introduction

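Background and challenges
Background overview
The IMDb-WIKI dataset is a large-scale collection of more than 520,000 celebrity face images, accompanied by metadata such as date of birth and gender, and is suited to research on age estimation and face recognition. It was built by automatically crawling celebrity profiles on IMDb and Wikipedia, so its age labels may contain errors.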
The content above was collected and summarized by 遇见数据集.



