Wilds
OpenDataLab | Updated 2026-05-17 | Indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/Wilds
Resource overview:
WILDS is a curated collection of benchmark datasets that represent distribution shifts encountered in the wild. In each dataset, every data point comes from a domain, which represents a distribution over data that is similar in some way, such as molecules with the same scaffold structure or satellite images from the same region. We study two types of distribution shift across domains. In domain generalization, the training and test distributions comprise disjoint sets of domains, and the goal is to generalize to domains unseen during training, such as molecules with a new scaffold structure. In subpopulation shift, the training and test domains overlap, but their relative proportions differ. We typically evaluate models by their worst performance over the test domains, each of which corresponds to a subpopulation of interest, such as a different geographic region.
The WILDS datasets cover a diverse range of modalities and applications, and reflect a wide range of distribution shifts arising from different demographics, users, hospitals, camera locations, countries, time periods, and molecular scaffolds.
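The worst-case evaluation described above can be illustrated with a minimal sketch. It assumes a classification task where each test example carries a domain label; the function name `worst_domain_accuracy` and the toy labels are hypothetical and are not part of any WILDS API.

```python
from collections import defaultdict


def worst_domain_accuracy(y_true, y_pred, domains):
    """Return (worst accuracy, per-domain accuracy) over the domains present.

    Hypothetical helper for illustration only; not taken from the WILDS package.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, domain in zip(y_true, y_pred, domains):
        total[domain] += 1
        correct[domain] += int(truth == pred)
    per_domain = {d: correct[d] / total[d] for d in total}
    return min(per_domain.values()), per_domain


# Toy test set: each example belongs to one geographic region (made-up data).
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]
domains = ["africa", "africa", "asia", "asia", "asia", "europe"]

worst, per_domain = worst_domain_accuracy(y_true, y_pred, domains)
average = sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)
print(per_domain, worst, average)
```

In this toy case the average accuracy looks moderate while one region performs much worse, which is the gap that worst-domain evaluation is meant to expose.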
Provider:
OpenDataLab
Created:
2022-08-19
Collected Summary
Dataset Introduction

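Background and Challenges
Background Overview
Wilds is a benchmark dataset collection spanning multiple modalities and application scenarios, focused on two kinds of distribution shift: domain generalization and subpopulation shift. It is intended for evaluating how models perform on unseen domains or on different subpopulations, and it reflects a wide range of distribution shifts arising from different factors.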
The above content was collected and summarized by 遇见数据集.
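The difference between the two shift types comes down to how the sets of training and test domains relate. The sketch below is an illustrative assumption, not code from the dataset: the helper names `domain_proportions` and `describe_shift` are hypothetical, and splits that mix shared and new domains are simply grouped with subpopulation shift.

```python
from collections import Counter


def domain_proportions(domains):
    """Map each domain label to its share of the examples."""
    counts = Counter(domains)
    n = len(domains)
    return {d: c / n for d, c in counts.items()}


def describe_shift(train_domains, test_domains):
    """Classify a split by comparing its train and test domain labels.

    Hypothetical helper: a simplification for illustration only.
    """
    train_p = domain_proportions(train_domains)
    test_p = domain_proportions(test_domains)
    if set(train_p).isdisjoint(test_p):
        # No test domain was seen during training.
        return "domain generalization"
    if train_p != test_p:
        # Domains overlap, but their relative proportions differ.
        return "subpopulation shift"
    return "no shift in domain proportions"


# Made-up scaffold labels for a molecule dataset.
print(describe_shift(["s1", "s1", "s2"], ["s3", "s4"]))
print(describe_shift(["s1", "s1", "s1", "s2"], ["s1", "s2", "s2", "s2"]))
```

Comparing proportions with exact equality is adequate for a toy example; on real data a tolerance or a statistical test would be a more sensible choice.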



