five

ObjectNet 用于对象识别的大型真实世界的数据集|对象识别数据集|机器学习偏差控制数据集

收藏
帕依提提2024-03-04 收录
对象识别
机器学习偏差控制
下载链接:
https://www.payititi.com/opendatasets/show-1731.html
下载链接
链接失效反馈
资源简介:
Ready to help develop the next generation of object recognition algorithms that have robustness, bias, and safety in mind. Controls can remove bias from other datasets machine learning, not just vision. ObjectNet is a large real-world test set for object recognition with control where object backgrounds, rotations, and imaging viewpoints are random. Most scientific experiments have controls, confounds which are removed from the data, to ensure that subjects cannot perform a task by exploiting trivial correlations in the data. Historically, large machine learning and computer vision datasets have lacked such controls. This has resulted in models that must be fine-tuned for new datasets and perform better on datasets than in real-world applications. When tested on ObjectNet, object detectors show a 40-45% drop in performance, with respect to their performance on other benchmarks, due to the controls for biases. Controls make ObjectNet robust to fine-tuning showing only small performance increases. We develop a highly automated platform that enables gathering datasets with controls by crowdsourcing image capturing and annotation. ObjectNet is the same size as the ImageNet test set (50,000 images), and by design does not come paired with a training set in order to encourage generalization. The dataset is both easier than ImageNet – objects are largely centered and unoccluded – and harder, due to the controls. Although we focus on object recognition here, data with controls can be gathered at scale using automated tools throughout machine learning to generate datasets that exercise models in new ways thus providing valuable feedback to researchers. This work opens up new avenues for research in generalizable, robust, and more human-like computer vision and in creating datasets where results are predictive of real-world performance. Andrei Barbu, David Mayo, Julian Alverio, William Luo, Christopher Wang, Dan Gutfreund, Josh Tenenbaum, and Boris Katz. Objectnet: A large-scale bias-controlled dataset for pushing the limits of object recognition models. In Advances in Neural Information Processing Systems 32, pages 9448–9458. 2019.
提供机构:
帕依提提
用户留言
有没有相关的论文或文献参考?
这个数据集是基于什么背景创建的?
数据集的作者是谁?
能帮我联系到这个数据集的作者吗?
这个数据集如何下载?
点击留言
数据主题
具身智能
数据集  4098个
机构  8个
大模型
数据集  439个
机构  10个
无人机
数据集  37个
机构  6个
指令微调
数据集  36个
机构  6个
蛋白质结构
数据集  50个
机构  8个
空间智能
数据集  21个
机构  5个
5,000+
优质数据集
54 个
任务类型
进入经典数据集
热门数据集

LFW

人脸数据集;LFW数据集共有13233张人脸图像,每张图像均给出对应的人名,共有5749人,且绝大部分人仅有一张图片。每张图片的尺寸为250X250,绝大部分为彩色图像,但也存在少许黑白人脸图片。 URL: http://vis-www.cs.umass.edu/lfw/index.html#download

AI_Studio 收录

中国农村教育发展报告

该数据集包含了中国农村教育发展的相关数据,涵盖了教育资源分布、教育质量、学生表现等多个方面的信息。

www.moe.gov.cn 收录

TPTP

TPTP(Thousands of Problems for Theorem Provers)是一个包含大量逻辑问题的数据集,主要用于定理证明器的测试和评估。它包含了多种逻辑形式的问题,如一阶逻辑、高阶逻辑、命题逻辑等。

www.tptp.org 收录

Papersnake/people_daily_news

人民日报(1946-2023)数据集是CialloCorpus的一部分。

hugging_face 收录

The Rice Annotation Project Database (RAP-DB)

RAP-DB是一个专注于水稻基因组注释的数据库,提供了水稻基因组的详细注释信息,包括基因结构、功能注释、表达数据等。该数据库旨在为水稻研究者提供一个全面的资源,以促进水稻基因组学和遗传学的研究。

rapdb.dna.affrc.go.jp 收录