Machine learning suggests that small size helps broaden plasmid host range
Source: DataONE. Updated: 2023-11-03. Indexed: 2024-06-08.
Download link:
https://search.dataone.org/view/sha256:61605995944f7fae9a244e8ce3a5970b5b7a305364b722828e4b79cae0a5ba81
Resource description:
Plasmids mediate gene exchange across taxonomic barriers through conjugation, shaping bacterial evolution for billions of years. While plasmid mobility can be harnessed for genetic engineering and drug-delivery applications, rapid plasmid-mediated spread of resistance genes has rendered most clinical antibiotics useless. To solve this urgent and growing problem, we must understand how plasmids spread across bacterial communities. Here, we applied machine-learning models to identify features that are important for extending plasmid host range. We assembled an up-to-date dataset of more than thirty thousand bacterial plasmids, separated them into 1125 clusters, and assigned each cluster a distribution possibility score, taking into account host distribution of each taxonomic rank and the sampling bias of the existing sequencing data. Using this score and an optimized plasmid feature pool, we built a model stack consisting of DecisionTreeRegressor, EvoTreeRegressor, and LGBMRegressor as ba...

# Machine learning suggests small size is a key determinant of plasmid host range
[https://doi.org/10.5061/dryad.1g1jwsv31](https://doi.org/10.5061/dryad.1g1jwsv31)
## Description of the data and file structure
There are four files in this dataset:
(1) fastANI_edgeweights: the edge weights used for plasmid clustering analysis with Leidenalg ([leidenalg documentation, version 0.10.2.dev15+g56e7241.d20231013](https://leidenalg.readthedocs.io/en/latest/index.html)) using CPMVertexPartition. The raw values are obtained by running FastANI ([ParBLiSS/FastANI: Fast Whole-Genome Similarity (ANI) Estimation (github.com)](https://github.com/ParBLiSS/FastANI)) and are then transformed into the final edge weights as follows: ANI/100 if ANI >= 95; 1/(1 + 20*(1 - ANI/100)) otherwise. Columns: field 1 is the query sequence; field 2 is the target sequence; field 3 is the final edge weight for that pair of sequences.
(2) plasmid_net_connected.cys: the original calculated pl...
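
The edge-weight transform described for fastANI_edgeweights in item (1) can be sketched in a few lines of Python. This is a minimal illustration of the stated formula only, not the authors' script: the function name `ani_to_edge_weight` and the range check are assumptions added here, and ANI is assumed to be a percentage between 0 and 100, as FastANI reports it.

```python
def ani_to_edge_weight(ani: float) -> float:
    """Map an ANI value (percent, 0-100) to a clustering edge weight.

    Hypothetical helper implementing the rule given in the dataset
    description: ANI/100 if ANI >= 95, else 1/(1 + 20*(1 - ANI/100)).
    """
    # Assumption: reject values outside the percentage range.
    if not 0.0 <= ani <= 100.0:
        raise ValueError(f"ANI must be in [0, 100], got {ani}")
    if ani >= 95.0:
        # Highly similar pairs keep their identity as the weight.
        return ani / 100.0
    # Less similar pairs are down-weighted.
    return 1.0 / (1.0 + 20.0 * (1.0 - ani / 100.0))


if __name__ == "__main__":
    for ani in (100.0, 95.0, 90.0, 80.0):
        print(ani, ani_to_edge_weight(ani))
```

As written, the formula is discontinuous at the threshold: a pair at exactly 95% ANI receives a weight of 0.95, while a pair just below 95% receives a weight close to 0.5, so pairs above the threshold are weighted much more strongly during clustering.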
Created:
2023-11-29



