XintongHe/Stomatal_Images_Datasets
Source: Hugging Face. Updated 2024-03-18; indexed 2024-03-04.
Download link: https://hf-mirror.com/datasets/XintongHe/Stomatal_Images_Datasets
---
license: cc-by-4.0
task_categories:
- image-classification
- image-segmentation
---
# Populus Stomatal Images Datasets
<!-- Provide a quick summary of the dataset. -->
This dataset collects about 11,000 annotated images of leaf stomata for analysis and machine learning applications in stomatal research.
## Dataset Details
### Dataset Description
<!-- Provide a longer summary of what this dataset is. -->
This dataset consists of around 11,000 unique images of hardwood leaf stomata collected from projects conducted between 2015 and 2022. It contains more than 7,000 images of 17 common hardwood species, such as oak, maple, ash, elm, and hickory, and over 3,000 images of 55 genotypes from seven Populus taxa. Each record carries the fields image_id, species, scientific_name, image, magnification, width, height, resolution, and annotations. The annotations hold a category id and a bounding box for each labeled stomatal feature.
- **Curated by:** [Jiaxin Wang, Heidi J. Renninger and Qin Ma]
- **Language(s) (NLP):** [English]
- **License:** [http://creativecommons.org/licenses/by/4.0/]
### Dataset Sources
<!-- Provide the basic links for the dataset. -->
- **Repository:** [https://zenodo.org/records/8271253]
- **Paper:** [https://www.nature.com/articles/s41597-023-02657-3]
## Uses
<!-- Address questions around how the dataset is intended to be used. -->
(1) Employ state-of-the-art machine learning models to identify, count, and quantify leaf stomata; (2) Explore the diverse range of stomatal characteristics across different types of hardwood trees; (3) Develop new indices for measuring stomata.
## Dataset Structure
<!-- This section provides a description of the dataset fields, and additional information about the dataset structure such as criteria used to create the splits, relationships between data points, etc. -->
```
{'image_id': 'STMHD0001',
'species': 'Nuttall oak',
'scientific_name': 'Quercus texana Buckley',
'image': <PIL.JpegImagePlugin.JpegImageFile image mode=RGB size=1024x768>,
'magnification': 100,
'width': 1024,
'height': 768,
'resolution': 118,
'annotations': {'category_id': [1,0,0,1,......,1,0,0],
'bounding_box': [{'x_center_rel': 0.25232601165771484,
'y_center_rel': 0.014441000297665596,
'width_rel': 0.022092999890446663,
'height_rel': 0.02790999971330166},
......,
{'x_center_rel': 0.9088180065155029,
'y_center_rel': 0.9940109848976135,
'width_rel': 0.06590700149536133,
'height_rel': 0.010591999627649784}]
}}
```
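The bounding boxes in the record above are stored as a box center plus a width and height, all relative to the image size. A minimal sketch of converting one such box to pixel corner coordinates follows. The helper name `rel_box_to_pixels` is hypothetical and not part of the dataset, and the code assumes the relative values are fractions of the image width and height.

```python
def rel_box_to_pixels(box, img_width, img_height):
    """Convert a relative center-format box to (left, top, right, bottom) in pixels."""
    # Box center and half-extents in pixels.
    cx = box["x_center_rel"] * img_width
    cy = box["y_center_rel"] * img_height
    half_w = box["width_rel"] * img_width / 2
    half_h = box["height_rel"] * img_height / 2
    # Clamp to the image bounds so boxes touching the border stay valid.
    left = max(0.0, cx - half_w)
    top = max(0.0, cy - half_h)
    right = min(float(img_width), cx + half_w)
    bottom = min(float(img_height), cy + half_h)
    return left, top, right, bottom


# First bounding box of the example record (image size 1024x768).
first_box = {
    "x_center_rel": 0.25232601165771484,
    "y_center_rel": 0.014441000297665596,
    "width_rel": 0.022092999890446663,
    "height_rel": 0.02790999971330166,
}
print(rel_box_to_pixels(first_box, 1024, 768))
```

The returned tuple follows the (left, upper, right, lower) order that `PIL.Image.Image.crop` expects, so it could be used to cut a single stomatal feature out of the `image` field.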
## Dataset Fields
```
"image_id"[string]: Unique identifier for each image, corresponding to the file name without the file extension.
"species"[string]: The common name of the tree species whose stomata appear in the image.
"scientific_name"[string]: The scientific (Latin) name of the tree species.
"image"[PIL]: A PIL.Image.Image object containing the image.
"magnification"[integer]: The magnification level at which the image was captured.
"width"[integer]: The width of the image in pixels.
"height"[integer]: The height of the image in pixels.
"resolution"[integer]: The resolution of the image.
"annotations"[dictionary]: A dictionary with two entries. "category_id" lists the label of each annotated feature: inner_guard_cell_walls is labeled 0 and whole_stomata (stomatal aperture and guard cells) is labeled 1. "bounding_box" lists the box of each feature: x_center_rel and y_center_rel are the normalized coordinates of the box center, and width_rel and height_rel are the box width and height relative to the image dimensions.
```
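Since each annotated feature carries a category id, counting stomata in an image reduces to counting the labels. The sketch below assumes the annotations sit under an `annotations` key with a `category_id` list, as in the example record. The `CATEGORY_NAMES` mapping and the `count_annotations` helper are illustrative names, not part of the dataset, and the toy record contains only the field the function reads.

```python
from collections import Counter

# Label names as given in the field description; the mapping itself is illustrative.
CATEGORY_NAMES = {0: "inner_guard_cell_walls", 1: "whole_stomata"}


def count_annotations(record):
    """Return the number of annotated boxes per label name for one record."""
    counts = Counter(record["annotations"]["category_id"])
    return {name: counts.get(cid, 0) for cid, name in CATEGORY_NAMES.items()}


# Toy record, not real data: three whole stomata, two inner guard cell walls.
toy_record = {"annotations": {"category_id": [1, 0, 0, 1, 1]}}
print(count_annotations(toy_record))
# → {'inner_guard_cell_walls': 2, 'whole_stomata': 3}
```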
### Curation Rationale
<!-- Motivation for the creation of this dataset. -->
Machine learning (ML) algorithms have shown potential in automatically detecting and measuring stomata. However, they require substantial data to train and optimize models efficiently, and their potential is restricted by the limited availability and quality of stomatal images. This dataset was established to overcome that obstacle.
### Source Data
<!-- This section describes the source data (e.g. news text and headlines, social media posts, translated sentences, ...). -->
The study used stomatal images from two datasets, Hardwood and Populus spp., acquired from 2015 to 2022. The Hardwood dataset contained 16 species: American elm (Ulmus americana Planch), cherrybark oak (Quercus pagoda Raf.), Nuttall oak (Quercus texana Buckley), shagbark hickory (Carya ovata (Mill.) K. Koch), Shumard oak (Quercus shumardii Buckley), swamp chestnut oak (Quercus michauxii Nutt.), water oak (Quercus nigra L.), willow oak (Quercus phellos L.), ash (Fraxinus L.), black gum (Nyssa sylvatica Marshall), deerberry (Vaccinium stamineum Linnaeus), leatherwood (Dirca palustris L.), red maple (Acer rubrum L.), post oak (Quercus stellata Wangenh.), willow (Salix spp.), and winged elm (Ulmus alata Michx.). Nuttall oak, water oak, and Shumard oak were sampled as seedlings 1–3 years old; the remaining species were 30–50 years old. Over 10,000 stomatal images were captured using a compound light microscope (Olympus, Tokyo, Japan) equipped with a digital microscope camera (MU300, AmScope, USA) with a 5 mm lens and a fixed microscope adapter (FMA050, AmScope). The Populus dataset consisted of over 3,000 images from 55 genotypes of seven taxa of hybrid poplar and eastern cottonwood (Populus deltoides), which were 4 to 5 years old.
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
This dataset includes only images of stomata from hardwood trees and Populus, which limits its applicability for studying stomata of other tree genera, though it may serve as reference data. The dataset is not divided into training and testing sets; users must split it themselves when necessary. Although rigorous procedures were followed in collecting leaves and micrographs, human and instrumental errors may have introduced inaccuracies in the images and their associated information. The annotation process used pre-trained model labeling complemented by quick checks in LabelImg, so model and computational errors could still lead to incorrect annotations.
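Because no train/test split is provided, one possible approach is a reproducible split stratified by species, so that every species appears in both parts. The following is a minimal sketch using only the standard library. The function name `stratified_split`, the 20% test fraction, the seed, and the toy records are all assumptions made for illustration, not choices documented by the dataset authors.

```python
import random
from collections import defaultdict


def stratified_split(records, test_fraction=0.2, seed=0):
    """Split records into (train, test), drawing test_fraction from each species."""
    by_species = defaultdict(list)
    for rec in records:
        by_species[rec["species"]].append(rec)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for species in sorted(by_species):  # sorted for a deterministic order
        group = by_species[species]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Toy records, not real data: 10 images for each of two species.
toy = [
    {"image_id": f"IMG{i:03d}", "species": "Nuttall oak" if i % 2 else "water oak"}
    for i in range(20)
]
train, test = stratified_split(toy, test_fraction=0.2, seed=42)
print(len(train), len(test))
```

Stratifying by species is only one option. For the Populus images, grouping by genotype or taxon may be more appropriate, depending on what the model is meant to generalize to.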
## Citation
<!-- If there is a paper or blog post introducing the dataset, the APA and Bibtex information for that should go in this section. -->
Wang, J., Renninger, H.J. & Ma, Q. Labeled temperate hardwood tree stomatal image datasets from seven taxa of Populus and 17 hardwood species. Sci Data 11, 1 (2024). https://doi.org/10.1038/s41597-023-02657-3
Provided by:
XintongHe



