
Expression-based machine learning models for predicting plant tissue identity

Source: DataCite Commons. Updated 2026-03-05; indexed 2026-04-25.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.4b8gthtn7
Description:
The selection of Arabidopsis as a model organism played a pivotal role in advancing genomic science. Competing proposals to adopt an agriculturally or ecologically relevant model species were passed over in favor of building knowledge in a species that would facilitate genome-enabled research. Here, we examine the ability of models based on Arabidopsis gene expression data to predict tissue identity in other flowering plants. Across the machine learning algorithms compared, models trained and tested on Arabidopsis data achieved near-perfect precision and recall, whereas models trained on Arabidopsis data and applied across the flowering plants achieved precision of 0.69 to 0.74 and recall of 0.54 to 0.64. Below-ground tissue is more predictable than other tissue types, and the ability to predict tissue identity is not correlated with phylogenetic distance from Arabidopsis. K-Nearest Neighbors is the most successful algorithm, which suggests that gene expression signatures are more valuable than marker genes for building models of tissue and cell type prediction in plants. Our data-driven results show that the assumption that knowledge from Arabidopsis translates to other plants does not always hold. Given the current abundance of sequencing data, we should reevaluate the scientific emphasis on Arabidopsis and prioritize plant diversity.
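The train-on-one-species, test-on-another evaluation described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-gene expression vectors are invented toy values, the tissue labels and the names `knn_predict` and `precision_recall` are hypothetical, and the plain Euclidean K-Nearest Neighbors shown here is not necessarily the configuration the authors used. It is not drawn from the dataset itself.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training profiles."""
    # Sort training samples by Euclidean distance to the query profile.
    dists = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def precision_recall(y_true, y_pred, label):
    """Per-class precision and recall from paired true/predicted labels."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy "reference species" training set (stand-in for Arabidopsis).
# Each vector is a made-up expression level for three hypothetical genes.
train_X = [
    (9.0, 1.0, 0.5), (8.5, 1.2, 0.7), (9.2, 0.8, 0.4),   # root
    (1.0, 8.0, 1.0), (1.2, 8.5, 0.8), (0.8, 7.9, 1.1),   # leaf
    (0.9, 1.1, 9.0), (1.1, 0.9, 8.7), (1.0, 1.3, 9.3),   # flower
]
train_y = ["root"] * 3 + ["leaf"] * 3 + ["flower"] * 3

# Toy "other species" test set; the last leaf sample has a shifted profile.
test_X = [(8.0, 2.0, 1.0), (2.0, 7.0, 1.5), (1.5, 2.0, 7.5), (3.0, 3.5, 6.0)]
y_true = ["root", "leaf", "flower", "leaf"]

y_pred = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
for tissue in ("root", "leaf", "flower"):
    p, r = precision_recall(y_true, y_pred, tissue)
    print(f"{tissue}: precision={p:.2f} recall={r:.2f}")
```

In this toy setup the shifted leaf sample is assigned to the flower class, which lowers recall for leaf and precision for flower. That is the kind of cross-species degradation the precision and recall ranges in the description summarize, though the numbers here carry no biological meaning.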
Provider:
Dryad
Created:
2024-09-25