25 Newly Introduced Real-World Evaluation Datasets

Name: 25 Newly Introduced Real-World Evaluation Datasets
Creator: Australian National University
Published: 2021-12-02 02:15:58
License: Not specified

Source: arXiv; updated 2021-12-02; indexed 2024-06-21

Download link:

https://github.com/sxzrt/SemiStructured-Dataset-Representations


Resource description:

In this study, the authors collected 25 real-world evaluation datasets from websites to support a more comprehensive evaluation of the AutoEval task. The datasets, which include CIFAR-Flickr and Digital-Shutterstock, contain images dated from 2001 to 2020 and are intended to test model performance under varied environments and conditions. Each dataset was built by searching specific websites for images matching given keywords and then manually processing the results to meet the dataset requirements. The datasets are mainly used for model performance evaluation, in particular for predicting a model's accuracy on an unlabeled test set, with the goal of understanding model behavior and predicting performance in unseen environments.

Provider:

Australian National University

Created:

2021-12-02