jirvine/doric_from_plsdb_replicon_split
Source: Hugging Face. Updated 2024-06-27; indexed 2024-06-29.
Download link:
https://hf-mirror.com/datasets/jirvine/doric_from_plsdb_replicon_split
Summary:
This dataset contains sequence information related to plasmid replication. Its columns are OriC sequence, ori_id, plasmid_id, species, source, pfamid_fast, rep_id, Rep_type_fast, rep_seq, rep_dna_seq, full_replicon_seq, split, and __index_level_0__. The data is divided into a training set of 925 samples and a validation set of 284 samples.
Provider:
jirvine
Original information summary
Dataset overview
Dataset information
Features
- OriC sequence: string
- ori_id: string
- plasmid_id: string
- species: string
- source: string
- pfamid_fast: string
- rep_id: string
- Rep_type_fast: string
- rep_seq: string
- rep_dna_seq: string
- full_replicon_seq: string
- split: string
- __index_level_0__: int64
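The feature list above can be written down as a small schema check. This is a minimal sketch, not part of the dataset itself: the helper name `schema_problems` is hypothetical, the index column is spelled `__index_level_0__` as in the summary, and the check assumes every field is populated (whether the dataset contains null values is not stated in the listing).

```python
# Column names and Python types corresponding to the listed features.
# "string" maps to str and "int64" maps to int.
EXPECTED_SCHEMA = {
    "OriC sequence": str,
    "ori_id": str,
    "plasmid_id": str,
    "species": str,
    "source": str,
    "pfamid_fast": str,
    "rep_id": str,
    "Rep_type_fast": str,
    "rep_seq": str,
    "rep_dna_seq": str,
    "full_replicon_seq": str,
    "split": str,
    "__index_level_0__": int,
}


def schema_problems(record):
    """Return a list of mismatches between a record (dict) and the schema.

    An empty list means the record has exactly the listed columns with the
    listed types. Assumption: None values are reported as type mismatches.
    """
    problems = []
    for name, expected in EXPECTED_SCHEMA.items():
        if name not in record:
            problems.append(f"missing column: {name}")
        elif not isinstance(record[name], expected):
            actual = type(record[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    for name in record:
        if name not in EXPECTED_SCHEMA:
            problems.append(f"unexpected column: {name}")
    return problems
```

Running `schema_problems` on each row after loading is a cheap way to catch a renamed or dropped column before training on the sequences.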
Data splits
- train: 925 samples, 4983259 bytes
- validation: 284 samples, 1457890 bytes
Dataset size
- Download size: 2501767 bytes
- Total size: 6441149 bytes
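The split and size figures above are internally consistent, which the short calculation below confirms. It is only a sketch using the numbers from the listing; the variable names are illustrative, and reading the download size as the size of the stored files (versus the total size of the loaded data) is an assumption based on how Hugging Face dataset cards usually report these fields.

```python
# Figures copied from the listing.
SPLITS = {
    "train": {"samples": 925, "bytes": 4983259},
    "validation": {"samples": 284, "bytes": 1457890},
}
REPORTED_TOTAL_BYTES = 6441149
REPORTED_DOWNLOAD_BYTES = 2501767

# Totals across both splits.
total_samples = sum(s["samples"] for s in SPLITS.values())
total_bytes = sum(s["bytes"] for s in SPLITS.values())

# Share of samples in each split.
fractions = {name: s["samples"] / total_samples for name, s in SPLITS.items()}

# Ratio of total size to download size.
size_ratio = REPORTED_TOTAL_BYTES / REPORTED_DOWNLOAD_BYTES

print(total_samples)                        # 1209
print(total_bytes == REPORTED_TOTAL_BYTES)  # True
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
print(f"total/download ratio: {size_ratio:.2f}")
```

The two splits sum to 1209 samples, roughly a 76.5% / 23.5% train/validation division, and the per-split byte counts add up exactly to the reported total size.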
Configuration
- config_name: default
- Data files:
  - train: path data/train-*
  - validation: path data/validation-*
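The configuration maps each split to a glob pattern over the repository's data files. The sketch below shows how such patterns assign a file to a split and how the dataset could be loaded with the Hugging Face `datasets` library. The helper names `split_for` and `load_splits` are hypothetical, the shard file name in the usage note is made up for illustration, and loading requires the `datasets` package and network access.

```python
from fnmatch import fnmatchcase

REPO_ID = "jirvine/doric_from_plsdb_replicon_split"

# Split-to-pattern mapping from the default configuration.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
}


def split_for(path):
    """Return the split whose glob pattern matches path, or None."""
    for split, pattern in DATA_FILES.items():
        if fnmatchcase(path, pattern):
            return split
    return None


def load_splits(repo_id=REPO_ID):
    """Load every split of the default configuration from the Hub.

    The import is deferred so the pattern helper above works even when
    the `datasets` package is not installed.
    """
    from datasets import load_dataset

    return load_dataset(repo_id)
```

For example, `split_for("data/train-00000-of-00001.parquet")` returns `"train"` (the file name here is hypothetical), and `load_splits()["validation"]` would give the validation split once the download completes.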



