Expression-based machine learning models for predicting plant tissue identity
Source: DataCite Commons. Updated: 2026-03-05. Indexed: 2026-04-25.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.4b8gthtn7
Description:
The selection of Arabidopsis as a model organism played a pivotal role in
advancing genomic science. Proposals to adopt an agriculturally or
ecologically important model species were passed over in favor of
building knowledge in a species that would facilitate genome-enabled
research. Here, we examine the ability of models based on Arabidopsis gene
expression data to predict tissue identity in other flowering plants.
Comparing different machine learning algorithms, models trained and tested
on Arabidopsis data achieved near-perfect precision and recall values,
whereas when tissue identity is predicted across the flowering plants
using models trained on Arabidopsis data, precision values range from 0.69
to 0.74 and recall from 0.54 to 0.64. Below-ground tissue is more
predictable than other tissue types, and the ability to predict tissue
identity is not correlated with phylogenetic distance from Arabidopsis.
K-Nearest Neighbors is the most successful algorithm, which suggests that
gene expression signatures are more valuable than marker genes in
creating models for tissue and cell type prediction in plants. Our
data-driven results highlight that the assertion that knowledge from
Arabidopsis is translatable to other plants is not always true.
Considering the current landscape of abundant sequencing data, we should
reevaluate the scientific emphasis on Arabidopsis and prioritize plant
diversity.
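
The evaluation described above (train on one species, predict tissue identity in others, report precision and recall) can be illustrated with a minimal sketch. This is not the authors' pipeline: the expression values are made-up toy numbers, the names `knn_predict` and `macro_precision_recall` are hypothetical, and the choice of Euclidean distance, k = 3, and macro averaging are assumptions not stated in the description.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label profile x by majority vote among its k nearest training profiles."""
    nearest = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def macro_precision_recall(y_true, y_pred):
    """Average per-tissue precision and recall (macro averaging, an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for lab in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == lab and p == lab for t, p in pairs)
        fp = sum(t != lab and p == lab for t, p in pairs)
        fn = sum(t == lab and p != lab for t, p in pairs)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


# Toy "reference species" profiles: three hypothetical genes per sample.
train_X = [
    [9.0, 1.0, 0.5], [8.5, 1.2, 0.7],   # root-like
    [1.0, 9.0, 0.8], [1.2, 8.7, 1.0],   # leaf-like
    [0.9, 1.1, 9.2], [1.1, 0.8, 8.8],   # flower-like
]
train_y = ["root", "root", "leaf", "leaf", "flower", "flower"]

# Toy "other species" profiles with known tissue labels.
test_X = [[8.0, 2.0, 1.0], [2.0, 7.5, 1.5], [1.5, 2.0, 7.0], [1.0, 6.0, 6.5]]
test_y = ["root", "leaf", "flower", "leaf"]

predictions = [knn_predict(train_X, train_y, x) for x in test_X]
precision, recall = macro_precision_recall(test_y, predictions)
print(predictions, round(precision, 2), round(recall, 2))
```

Because each prediction depends on distance across the whole expression vector rather than on a single gene, this kind of classifier reflects the signature-based view the description contrasts with marker genes.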
Provider:
Dryad
Created:
2024-09-25



