neuralbioinfo/PhaStyle-BACPHLIP

Source: Hugging Face | Updated: 2025-01-09 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/neuralbioinfo/PhaStyle-BACPHLIP
Description:
The PhaStyle-BACPHLIP dataset supports phage lifestyle prediction, a binary classification of each phage as either **virulent** or **temperate**. It is split into a training set and a validation set: the training set contains phages from various species but excludes Escherichia coli, while the validation set consists exclusively of **Escherichia coli** phages. Because Escherichia coli phages never appear in training, validation performance reflects how well a model generalizes across species and environments rather than how well it fits familiar ones.

Sequences are segmented into 512 bp and 1022 bp fragments to simulate real-world conditions, where metagenomic and viromic assemblies are often fragmented, so models can be trained and validated at both fragment lengths. The dataset was used to train the ProkBERT PhaStyle model.
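The fragmentation and the held-out-species split described above can be sketched in a few lines of Python. This is a minimal illustration, not the dataset authors' pipeline. The description does not say how fragments were cut, so the sketch assumes contiguous, non-overlapping windows and drops any trailing remainder. The function names, the `host` field, and the toy records are all hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple


def fragment_sequence(seq: str, length: int, step: Optional[int] = None) -> Iterator[str]:
    """Yield contiguous fragments of exactly `length` bases.

    Assumption: windows do not overlap by default (step == length), and a
    trailing remainder shorter than `length` is dropped. The real dataset
    may have been built differently.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    step = length if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")
    for start in range(0, len(seq) - length + 1, step):
        yield seq[start:start + length]


def split_by_host(
    records: List[Dict[str, str]],
    held_out_host: str = "Escherichia coli",
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Return (train, validation), holding out every record of one host.

    Assumption: each record carries a hypothetical "host" field.
    """
    train: List[Dict[str, str]] = []
    validation: List[Dict[str, str]] = []
    for record in records:
        target = validation if record["host"] == held_out_host else train
        target.append(record)
    return train, validation


if __name__ == "__main__":
    # Toy records; IDs, hosts, labels, and sequences are made up.
    records = [
        {"id": "phage_A", "host": "Escherichia coli",
         "label": "virulent", "sequence": "ACGT" * 600},
        {"id": "phage_B", "host": "Salmonella enterica",
         "label": "temperate", "sequence": "GGCA" * 300},
    ]
    train, validation = split_by_host(records)
    for fragment_length in (512, 1022):
        for name, subset in (("train", train), ("validation", validation)):
            count = sum(
                len(list(fragment_sequence(r["sequence"], fragment_length)))
                for r in subset
            )
            print(f"{name}, {fragment_length} bp: {count} fragments")
```

Splitting by host before fragmenting keeps every fragment of a given phage on the same side of the split, which avoids leaking fragments of one genome into both sets.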
Provider:
neuralbioinfo