AI Training Acceleration Test Dataset
Source: National Basic Science Data Center (国家基础学科公共科学数据中心), indexed 2026-02-14
Download link:
https://nbsdc.cn/general/dataDetail?id=698a04b8195d2631dc80f024&type=1
Description:
This dataset is built from the standard training split of ImageNet-1K (ILSVRC2012) and contains approximately 1.28 million high-resolution natural images. To suit the data loading and gradient synchronization patterns of distributed training, it uses a sharded storage layout that reorganizes the images into 294 independent data chunks, with the aim of improving I/O throughput in multi-node, multi-GPU environments. The accompanying study documents the benchmarking workflow for this dataset on GPU clusters with NCCL communication, and reports throughput benchmarks for the ResNet-50 and VGG16 models under mixed-precision training, combined with hardware acceleration methods such as gradient negotiation, gradient fusion, and gradient compression. The dataset is released as a standardized benchmarking target for evaluating the communication efficiency of distributed training frameworks, optimizing the allocation of computing resources, and studying the hardware of cloud-controlled infrastructure platforms.
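The 294-chunk layout interacts with the number of training workers. The following is a minimal sketch, assuming a plain round-robin policy: the dataset page does not say how chunks are assigned to workers, and `assign_shards` is a hypothetical helper, not part of any released tooling. The image count 1,281,167 is the size of the ILSVRC2012 training split; treating the chunks as roughly equal in size is also an assumption.

```python
from typing import List

NUM_SHARDS = 294          # number of data chunks stated in the description
NUM_IMAGES = 1_281_167    # size of the ILSVRC2012 training split


def assign_shards(num_shards: int, world_size: int) -> List[List[int]]:
    """Assign shard indices to ranks round-robin (assumed policy)."""
    if world_size <= 0:
        raise ValueError("world_size must be positive")
    # Rank r reads shards r, r + world_size, r + 2 * world_size, ...
    return [list(range(rank, num_shards, world_size))
            for rank in range(world_size)]


# Example: one node with 8 GPUs, one rank per GPU.
plan = assign_shards(NUM_SHARDS, world_size=8)
sizes = [len(shards) for shards in plan]

# Average chunk size if the chunks were equally sized (an assumption).
avg_images_per_shard = NUM_IMAGES / NUM_SHARDS

print("shards per rank:", sizes)
print("average images per shard: %.1f" % avg_images_per_shard)
```

Because 294 = 2 × 3 × 7 × 7, the chunks divide evenly across 6, 7, 14, 21, 42, or 49 workers, but not across 8 or 16. Under this policy, with 8 ranks, six ranks read 37 chunks per epoch and two read 36, so a real loader would need to pad, drop, or rebalance to keep ranks in step.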
Provider:
Alibaba Cloud Computing Co., Ltd.
Collected Summary
Dataset Introduction

Background and Challenges
Background Overview
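This is an AI training acceleration test dataset built on ImageNet-1K. It contains approximately 1.28 million high-resolution natural images, reorganized into 294 data chunks to improve I/O performance in distributed training. It provides benchmark data collected on GPU clusters, including mixed-precision training throughput for the ResNet-50 and VGG16 models. Its main uses are evaluating the communication efficiency of distributed training frameworks and hardware optimization, in support of research on cloud-control platforms.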
The content above was collected and summarized by 遇见数据集.
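For readers who want to reproduce a throughput figure of the kind described here, the following is a minimal sketch of how images per second is commonly measured. It is not the benchmark procedure used for this dataset: `measure_throughput`, `step_fn`, and the warm-up and step counts are hypothetical, and the dummy step only sleeps in place of a real training iteration.

```python
import time
from typing import Callable


def measure_throughput(step_fn: Callable[[], None],
                       global_batch_size: int,
                       warmup_steps: int = 5,
                       timed_steps: int = 20) -> float:
    """Return images per second over `timed_steps` calls to `step_fn`.

    `global_batch_size` is the per-GPU batch size times the number of GPUs.
    On a real GPU, `step_fn` should wait for the device to finish its work
    before returning; otherwise the timer only measures kernel launches.
    """
    if timed_steps <= 0 or global_batch_size <= 0:
        raise ValueError("timed_steps and global_batch_size must be positive")
    # Warm-up iterations are excluded so one-time setup cost is not counted.
    for _ in range(warmup_steps):
        step_fn()
    start = time.perf_counter()
    for _ in range(timed_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return global_batch_size * timed_steps / elapsed


def dummy_step() -> None:
    """Stand-in for one training iteration (assumption: about 1 ms)."""
    time.sleep(0.001)


# Example: 8 GPUs with a per-GPU batch size of 32 (illustrative values).
images_per_second = measure_throughput(dummy_step, global_batch_size=8 * 32)
print("throughput: %.0f images/s" % images_per_second)
```

Reporting the global rather than per-GPU figure makes runs with different worker counts directly comparable, which is what a scaling-efficiency comparison needs.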



