PlanktonSet 1.0: Plankton imagery data collected from F.G. Walton Smith in Straits of Florida from 2014-06-03 to 2014-06-06 and used in the 2015 National Data Science Bowl (NCEI Accession 0127422)

Source: DataCite Commons. Updated 2025-04-07; indexed 2025-04-16.
Download link:
https://www.ncei.noaa.gov/archive/accession/0127422
Description:
Data presented here are a subset of a larger plankton imagery data set collected in the subtropical Straits of Florida from 2014-05-28 to 2014-06-14. Imagery data were collected using the In Situ Ichthyoplankton Imaging System (ISIIS-2) as part of an NSF-funded project to assess the biophysical drivers affecting fine-scale interactions between larval fish, their prey, and predators. This subset of images was used in the inaugural National Data Science Bowl (www.datasciencebowl.com), hosted by Kaggle and sponsored by Booz Allen Hamilton. Data were originally collected to examine the biophysical drivers affecting fine-scale (spatial) interactions between larval fish, their prey, and predators in a subtropical pelagic marine ecosystem. Image segments extracted from the raw data were sorted into 121 plankton classes, split 50:50 into train and test data sets, and provided for a machine learning competition (the National Data Science Bowl). No hierarchical relationships were explicit in the 121 plankton classes, though the class naming convention and a tree-like diagram (see file "Plankton Relationships.pdf") indicated relationships between classes, whether taxonomic or structural (size and shape). We intend for this dataset to be available to the machine learning and computer vision community as a standard machine learning benchmark. This “Plankton 1.0” dataset is a medium-size dataset with a fair amount of complexity, where image classification improvements can still be made.
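To make the 50:50 train/test split mentioned above concrete, here is a minimal Python sketch. The description does not say how the split was actually performed, so the per-class shuffling, the function name `split_half_per_class`, and the file and class names below are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def split_half_per_class(samples, seed=0):
    """Split (item, label) pairs roughly 50:50 within each class.

    Assumption: a per-class split is used here for illustration; the
    dataset description only states that the data were split 50:50.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    by_class = defaultdict(list)
    for item, label in samples:
        by_class[label].append(item)

    train, test = [], []
    for label in sorted(by_class):
        items = by_class[label]
        rng.shuffle(items)
        half = len(items) // 2  # odd-sized classes put the extra item in test
        train += [(x, label) for x in items[:half]]
        test += [(x, label) for x in items[half:]]
    return train, test


# Hypothetical file names and class labels, not taken from the dataset.
samples = [(f"img_{i}.jpg", f"class_{i % 3}") for i in range(12)]
train, test = split_half_per_class(samples)
print(len(train), len(test))
```

Splitting within each class keeps rare classes represented on both sides, which may matter for a 121-class problem; a plain shuffle over all images would be a simpler alternative.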

Provider:
NOAA National Centers for Environmental Information
Created:
2015-05-08
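Collected summary
Background and challenges
Background overview
PlanktonSet 1.0 is a plankton image dataset collected aboard the F.G. Walton Smith in the Straits of Florida from June 3 to June 6, 2014, and used in the 2015 National Data Science Bowl. It contains images in 121 plankton classes, split 50:50 into training and test sets. It is a medium-size, fairly complex image classification benchmark intended to support machine learning and computer vision research.
The summary above was compiled and generated by 遇见数据集.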