mahdibaghbanzadeh/BERTax_non_similar_dataset_phylum
Hugging Face · updated 2024-04-08 · indexed 2024-06-11
Download link:
https://hf-mirror.com/datasets/mahdibaghbanzadeh/BERTax_non_similar_dataset_phylum
Resource description:
---
dataset_info:
  features:
  - name: sequence
    dtype: string
  - name: phylum
    dtype:
      class_label:
        names:
          '0': Actinomycetota
          '1': Apicomplexa
          '2': Arthropoda
          '3': Artverviricota
          '4': Ascomycota
          '5': Bacillariophyta
          '6': Bacillota
          '7': Bacteroidota
          '8': Basidiomycota
          '9': Bdellovibrionota
          '10': Campylobacterota
          '11': Candidatus Thermoplasmatota
          '12': Chloroflexota
          '13': Chordata
          '14': Cyanobacteriota
          '15': Deinococcota
          '16': Euryarchaeota
          '17': Kitrinoviricota
          '18': Mollusca
          '19': Mycoplasmatota
          '20': Myxococcota
          '21': Negarnaviricota
          '22': Nitrososphaerota
          '23': Peploviricota
          '24': Pisuviricota
          '25': Planctomycetota
          '26': Pseudomonadota
          '27': Rhodothermota
          '28': Spirochaetota
          '29': Streptophyta
          '30': Thermodesulfobacteriota
          '31': Thermodesulfobiota
          '32': Thermoproteota
          '33': Thermotogota
          '34': Uroviricota
          '35': Verrucomicrobiota
  splits:
  - name: train
    num_bytes: 3386883024
    num_examples: 2240002
  - name: test
    num_bytes: 80740800
    num_examples: 53400
  download_size: 1704006951
  dataset_size: 3467623824
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---
Provided by:
mahdibaghbanzadeh
Summary of the original information
Dataset overview
Features
- sequence: string.
- phylum: class label with the following 36 classes (labels 0 to 35, in this order):
  - Actinomycetota
  - Apicomplexa
  - Arthropoda
  - Artverviricota
  - Ascomycota
  - Bacillariophyta
  - Bacillota
  - Bacteroidota
  - Basidiomycota
  - Bdellovibrionota
  - Campylobacterota
  - Candidatus Thermoplasmatota
  - Chloroflexota
  - Chordata
  - Cyanobacteriota
  - Deinococcota
  - Euryarchaeota
  - Kitrinoviricota
  - Mollusca
  - Mycoplasmatota
  - Myxococcota
  - Negarnaviricota
  - Nitrososphaerota
  - Peploviricota
  - Pisuviricota
  - Planctomycetota
  - Pseudomonadota
  - Rhodothermota
  - Spirochaetota
  - Streptophyta
  - Thermodesulfobacteriota
  - Thermodesulfobiota
  - Thermoproteota
  - Thermotogota
  - Uroviricota
  - Verrucomicrobiota
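The `phylum` column stores integers rather than names, so a lookup table is handy when inspecting predictions. A minimal sketch, assuming you want the mapping without loading the dataset: the names are copied from the card in label order, and `PHYLA`, `ID2LABEL`, `LABEL2ID`, `decode` and `encode` are hypothetical helper names, not part of the dataset or of any library.

```python
# Phylum names in class-label order, copied from the card's `class_label.names`.
# Position i in this list is integer label i (hypothetical helper, not an API).
PHYLA = [
    "Actinomycetota", "Apicomplexa", "Arthropoda", "Artverviricota",
    "Ascomycota", "Bacillariophyta", "Bacillota", "Bacteroidota",
    "Basidiomycota", "Bdellovibrionota", "Campylobacterota",
    "Candidatus Thermoplasmatota", "Chloroflexota", "Chordata",
    "Cyanobacteriota", "Deinococcota", "Euryarchaeota", "Kitrinoviricota",
    "Mollusca", "Mycoplasmatota", "Myxococcota", "Negarnaviricota",
    "Nitrososphaerota", "Peploviricota", "Pisuviricota", "Planctomycetota",
    "Pseudomonadota", "Rhodothermota", "Spirochaetota", "Streptophyta",
    "Thermodesulfobacteriota", "Thermodesulfobiota", "Thermoproteota",
    "Thermotogota", "Uroviricota", "Verrucomicrobiota",
]

# Integer label -> phylum name, and the reverse lookup.
ID2LABEL = dict(enumerate(PHYLA))
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode(label_id: int) -> str:
    """Return the phylum name for an integer `phylum` value."""
    return ID2LABEL[label_id]


def encode(name: str) -> int:
    """Return the integer label for a phylum name (raises KeyError if unknown)."""
    return LABEL2ID[name]
```

If the dataset is loaded with the Hugging Face `datasets` library, the `ClassLabel` feature's `int2str` and `str2int` methods should give the same mapping; the two dicts are also in the shape `transformers` model configs accept as `id2label` and `label2id`.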
Splits
- train: 2,240,002 examples, 3,386,883,024 bytes in total.
- test: 53,400 examples, 80,740,800 bytes in total.
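The split figures above can be turned into a quick summary of how the data is divided. A small sketch, assuming only the numbers on the card; `SPLITS` and `split_summary` are hypothetical names.

```python
# Split metadata copied from the card (`splits` in the YAML header).
SPLITS = {
    "train": {"num_examples": 2_240_002, "num_bytes": 3_386_883_024},
    "test": {"num_examples": 53_400, "num_bytes": 80_740_800},
}


def split_summary(splits: dict) -> dict:
    """Per-split share of all examples and average bytes per example."""
    total = sum(s["num_examples"] for s in splits.values())
    return {
        name: {
            "fraction": s["num_examples"] / total,
            "bytes_per_example": s["num_bytes"] / s["num_examples"],
        }
        for name, s in splits.items()
    }


for name, stats in split_summary(SPLITS).items():
    print(f"{name}: {stats['fraction']:.2%} of examples, "
          f"{stats['bytes_per_example']:.0f} bytes/example")
```

The test split holds roughly 2.3% of all examples. Both splits average exactly 1,512 bytes per example, which is consistent with (though not proof of) uniformly sized records; the card itself does not state a sequence length.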
Size
- Download size: 1,704,006,951 bytes
- Total dataset size: 3,467,623,824 bytes
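The total dataset size equals the sum of the two splits' byte counts, while the download is roughly half that, presumably because the data files are stored in a more compact on-disk form. A sketch that checks the arithmetic, using only the card's numbers; the constant names and `check_sizes` are hypothetical.

```python
TRAIN_BYTES = 3_386_883_024    # train num_bytes on the card
TEST_BYTES = 80_740_800        # test num_bytes on the card
DATASET_SIZE = 3_467_623_824   # "dataset_size" on the card
DOWNLOAD_SIZE = 1_704_006_951  # "download_size" on the card


def check_sizes() -> float:
    """Verify the total equals the split sum; return download/dataset ratio."""
    if TRAIN_BYTES + TEST_BYTES != DATASET_SIZE:
        raise ValueError("split sizes do not add up to dataset_size")
    return DOWNLOAD_SIZE / DATASET_SIZE


ratio = check_sizes()
print(f"download is {ratio:.1%} of the dataset size "
      f"({DOWNLOAD_SIZE / 2**30:.2f} GiB vs {DATASET_SIZE / 2**30:.2f} GiB)")
```

In practice this means budgeting about 1.6 GiB for the download and about 3.2 GiB once the data is prepared.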
Configurations
- default:
  - Training data path: data/train-*
  - Test data path: data/test-*
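The `default` configuration assigns files to splits purely by glob pattern. A minimal sketch of that matching with the standard library, assuming shell-style wildcards; `DATA_FILES` and `assign_split` are hypothetical names, and the shard file names are invented for illustration since the card lists only the patterns.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Split -> glob pattern, mirroring the card's `configs.default.data_files`.
DATA_FILES = {"train": "data/train-*", "test": "data/test-*"}


def assign_split(path: str) -> Optional[str]:
    """Return the split whose pattern matches `path`, or None if none does."""
    for split, pattern in DATA_FILES.items():
        # Case-sensitive match so results are the same on every platform.
        if fnmatchcase(path, pattern):
            return split
    return None


# Hypothetical shard names; the real file names are not given on the card.
example_files = [
    "data/train-00000-of-00002.parquet",
    "data/train-00001-of-00002.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]
for f in example_files:
    print(f, "->", assign_split(f))
```

A split-to-pattern dict of this shape is also what the `data_files` argument of `datasets.load_dataset` accepts, although loading by the repository name should pick up the `default` configuration on its own.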



