
Fashion-MNIST | Image Recognition Dataset | Machine Learning Dataset

OpenDataLab · Updated 2025-04-05 · Indexed 2024-05-09
Image recognition
Machine learning
Download link:
https://opendatalab.org.cn/OpenDataLab/Fashion-MNIST
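Resource description:
Fashion-MNIST is a dataset of Zalando's article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image associated with a label from one of 10 classes. Its creators intend Fashion-MNIST to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares MNIST's image size and the structure of its training and testing splits.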
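To make the layout described above concrete, the following minimal sketch turns one 28x28 grayscale image into a 784-value feature vector and encodes a label from one of the 10 classes. It assumes 8-bit pixel values in the range 0 to 255 and uses a synthetic image in place of real data; the helper names `flatten_and_scale` and `one_hot` are illustrative and do not come from the dataset's tooling.

```python
from typing import List

IMAGE_SIDE = 28   # image height and width stated in the description
N_CLASSES = 10    # number of label classes stated in the description


def flatten_and_scale(image: List[List[int]]) -> List[float]:
    """Flatten a 28x28 image row by row and scale pixels to [0, 1].

    Assumes 8-bit grayscale values (0..255).
    """
    if len(image) != IMAGE_SIDE or any(len(row) != IMAGE_SIDE for row in image):
        raise ValueError("expected a 28x28 image")
    return [pixel / 255.0 for row in image for pixel in row]


def one_hot(label: int) -> List[int]:
    """Encode an integer class label as a length-10 indicator vector."""
    if not 0 <= label < N_CLASSES:
        raise ValueError("label must be in the range 0..9")
    return [1 if i == label else 0 for i in range(N_CLASSES)]


# Synthetic stand-in for a real image: a repeating 0..255 ramp.
image = [[(r * IMAGE_SIDE + c) % 256 for c in range(IMAGE_SIDE)]
         for r in range(IMAGE_SIDE)]

features = flatten_and_scale(image)
target = one_hot(3)
print(len(features), min(features), max(features), target)
```

Flattening is only needed for models that expect vector input; a convolutional network would typically keep the 28x28 grid.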
Provider:
OpenDataLab
Created:
2022-03-17
AI-collected summary
Dataset introduction
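Construction
The Fashion-MNIST dataset is modeled on the classic MNIST dataset but focuses on image recognition of fashion products. It was designed by the Zalando Research team and contains 60,000 training images and 10,000 test images, each a 28x28-pixel grayscale image. The images span 10 categories of fashion items, ranging from T-shirts and trousers to shoes. Pairing each image with its label yields a standardized classification dataset intended to replace the traditional MNIST dataset with a task closer to real-world applications.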
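Usage
Fashion-MNIST is suited to a range of machine learning and deep learning tasks, particularly image classification and feature extraction. Researchers and developers can use it to train and validate models and to evaluate their performance on fashion-item recognition. A common workflow uses the provided training and test sets, trains a deep learning model such as a convolutional neural network (CNN), and tunes hyperparameters with methods such as cross-validation. The dataset can also be used to explore image-processing techniques such as data augmentation and dimensionality reduction.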
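The cross-validation step mentioned above can be sketched as follows. This is a minimal, standard-library-only illustration of k-fold index splitting over the 60,000 training examples, not a recommended pipeline; the function name `k_fold_indices`, the choice of 5 folds, and the seed are assumptions made for the example. In practice a library routine such as scikit-learn's `KFold` would normally be used instead.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the folds are reproducible.
    random.Random(seed).shuffle(indices)
    # Spread any remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


N_TRAIN = 60_000  # training-set size stated in the description

folds = list(k_fold_indices(N_TRAIN, k=5))
for i, (train_idx, val_idx) in enumerate(folds):
    print(f"fold {i}: {len(train_idx)} train / {len(val_idx)} validation")
```

Only the training set is split here. The 10,000-example test set stays untouched until the final evaluation, so that tuning decisions are not influenced by test data.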
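Background and challenges
Background overview
Fashion-MNIST was created in 2017 by the Zalando Research team to replace the traditional MNIST dataset as a new benchmark for image classification. It contains 70,000 28x28-pixel grayscale images across 10 clothing categories, such as T-shirts, trousers, and coats. Fashion-MNIST keeps the simplicity of MNIST while posing a harder classification problem, which makes it well suited to evaluating machine learning models. Its release gave the computer vision community, and deep learning work in particular, more complex and realistic image data for training and evaluating models.
Current challenges
Although Fashion-MNIST is widely used in image classification, building and applying it still poses several challenges. First, the low image resolution may limit a model's ability to capture fine detail. Second, because clothing categories are both varied and visually similar, models can struggle to separate some classes, for example shirts and T-shirts. Third, although the dataset is class-balanced, real-world data is often unevenly distributed, so models trained on it may need further adjustment in practice. Finally, as deep learning advances, Fashion-MNIST is becoming easier to classify, and more complex datasets may be needed to keep the benchmark challenging.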
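One way to check for the shirt versus T-shirt confusion described above is to build a confusion matrix and look for the largest off-diagonal cell. The sketch below does this in plain Python on a handful of hypothetical labels and predictions, which are made up for illustration and are not real model output. The class-name order is the label mapping of the original Fashion-MNIST release as best understood here; verify it against the copy you download.

```python
from typing import List, Sequence, Tuple

# Assumed label order (0..9); check against your copy of the dataset.
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def confusion_matrix(
    y_true: Sequence[int], y_pred: Sequence[int], n_classes: int = 10
) -> List[List[int]]:
    """Count predictions: rows are true classes, columns are predicted."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def most_confused_pair(matrix: List[List[int]]) -> Tuple[int, int, int]:
    """Return (true_class, predicted_class, count) of the largest error cell."""
    best = (0, 0, 0)
    for t, row in enumerate(matrix):
        for p, count in enumerate(row):
            if t != p and count > best[2]:
                best = (t, p, count)
    return best


# Hypothetical labels and predictions, for illustration only.
y_true = [0, 0, 0, 6, 6, 6, 1, 9]
y_pred = [0, 6, 6, 6, 0, 6, 1, 9]

matrix = confusion_matrix(y_true, y_pred)
t, p, count = most_confused_pair(matrix)
print(f"{CLASS_NAMES[t]} predicted as {CLASS_NAMES[p]}: {count} times")
```

On a real model, the same inspection over the 10,000 test predictions shows which class pairs account for most of the remaining error.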
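Development history
Creation and updates
Fashion-MNIST was created in 2017 by the Zalando Research team to replace the traditional MNIST dataset as a new benchmark for image classification tasks. There is no record of an official update since its creation, but its influence has continued to grow.
Key milestones
The release of Fashion-MNIST marked a turning point for image classification benchmarks. It offered more challenging image data and supported improvements in deep learning models. It was first published on GitHub, quickly drew the attention of researchers worldwide, and became a first-choice dataset for many machine learning courses and research projects. Fashion-MNIST also fed the discussion about the diversity of image datasets, prompting more domain experts to look at dataset quality and representativeness.
Current status
Fashion-MNIST is now one of the foundational datasets in computer vision, widely used for image classification, feature extraction, and model evaluation. Its simple structure and range of categories make it suitable for both beginners and experienced researchers. As deep learning advances, it continues to be re-examined against more complex models and tasks. Its success has also inspired the creation of similar datasets.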
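Timeline
  • 2017: Fashion-MNIST is first released. Created by the Zalando Research team, it is intended to replace the traditional MNIST dataset and focuses on image recognition of fashion items.
  • 2018: Fashion-MNIST is widely adopted in machine learning and deep learning and becomes one of the standard benchmarks for evaluating model performance.
  • 2019: Researchers begin exploring Fashion-MNIST in transfer learning and data augmentation work, extending its usefulness for practical problems.
  • 2020: Fashion-MNIST is cited in multiple international conferences and journals and becomes an important reference dataset for image classification research.
  • 2021: As deep learning advances, use of Fashion-MNIST extends to other areas of computer vision, such as object detection and image generation.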
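Common scenarios
Typical use
In computer vision, Fashion-MNIST is widely used as a benchmark for image classification. The dataset consists of 70,000 28x28-pixel grayscale images across 10 clothing categories, such as T-shirts, trousers, and coats. Researchers commonly use it to evaluate and compare machine learning algorithms on image recognition, especially when training and validating deep learning models such as convolutional neural networks (CNNs).
Academic problems addressed
Fashion-MNIST addresses the problem that the traditional MNIST dataset is too easy for image recognition research, giving academia a more challenging benchmark. By introducing more complex images and categories, it helps researchers evaluate and improve the robustness and generalization of their algorithms. Its wide use has supported research into how effective and reliable image classification algorithms are in practical settings.
Practical applications
In practice, Fashion-MNIST is used to develop and test image recognition systems for the retail industry. For example, a clothing retailer could use a model trained on this dataset to automatically identify and classify garments in its inventory, improving the efficiency and accuracy of inventory management. The dataset is also described as supporting the development of personalized recommendation systems, which analyze users' purchase history and preferences to recommend products more precisely.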
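Recent research on the dataset
Latest research directions
In computer vision, Fashion-MNIST has been an active research topic in recent years because of its image data and broad range of uses. Recent work focuses mainly on using deep learning to improve the accuracy and efficiency of image classification. Researchers have improved results on Fashion-MNIST by introducing more complex convolutional architectures, such as ResNet and DenseNet, and by applying transfer learning and data augmentation. Cross-domain studies are also exploring how Fashion-MNIST can be applied in other fields, such as medical image analysis and autonomous driving, to validate and improve image recognition in those fields. This work advances computer vision and opens up new possibilities for practical applications.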
The content above was collected and summarized by AI.