tydymy/150bp_multi_species_dataset
Source: Hugging Face. Updated 2023-11-15; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/tydymy/150bp_multi_species_dataset
Resource description:
---
dataset_info:
  features:
  - name: '#genome'
    dtype: string
  - name: asm_name
    dtype: string
  - name: assembly_accession
    dtype: string
  - name: bioproject
    dtype: string
  - name: biosample
    dtype: string
  - name: wgs_master
    dtype: float64
  - name: seq_rel_date
    dtype: string
  - name: submitter
    dtype: string
  - name: ftp_path
    dtype: string
  - name: img_id
    dtype: float64
  - name: gtdb_id
    dtype: string
  - name: scope
    dtype: string
  - name: assembly_level
    dtype: string
  - name: genome_rep
    dtype: string
  - name: refseq_category
    dtype: string
  - name: release_type
    dtype: string
  - name: taxid
    dtype: float64
  - name: species_taxid
    dtype: float64
  - name: organism_name
    dtype: string
  - name: infraspecific_name
    dtype: string
  - name: isolate
    dtype: string
  - name: superkingdom
    dtype: string
  - name: phylum
    dtype: string
  - name: class
    dtype: string
  - name: order
    dtype: string
  - name: family
    dtype: string
  - name: genus
    dtype: string
  - name: species
    dtype: string
  - name: classified
    dtype: bool
  - name: lv1_group
    dtype: string
  - name: lv2_group
    dtype: string
  - name: score_faa
    dtype: float64
  - name: score_fna
    dtype: float64
  - name: score_rrna
    dtype: float64
  - name: score_trna
    dtype: float64
  - name: total_length
    dtype: float64
  - name: contigs
    dtype: float64
  - name: gc
    dtype: float64
  - name: n50
    dtype: float64
  - name: l50
    dtype: float64
  - name: proteins
    dtype: float64
  - name: protein_length
    dtype: float64
  - name: coding_density
    dtype: float64
  - name: completeness
    dtype: float64
  - name: contamination
    dtype: float64
  - name: strain_heterogeneity
    dtype: float64
  - name: markers
    dtype: float64
  - name: 5s_rrna
    dtype: string
  - name: 16s_rrna
    dtype: string
  - name: 23s_rrna
    dtype: string
  - name: trnas
    dtype: float64
  - name: draft_quality
    dtype: string
  - name: start_position
    dtype: int64
  - name: human_label
    dtype: int64
  - name: autotrain_text
    dtype: string
  - name: autotrain_label
    dtype:
      class_label:
        names:
          '0': Acetobacter pasteurianus IFO 3283-01 IFO 3283 substr. IFO 3283-01
          '1': Alcanivorax borkumensis SK2
          '2': Aquifex aeolicus VF5
          '3': Archaeoglobus fulgidus DSM 4304
          '4': Azorhizobium caulinodans ORS 571
          '5': Bacillus anthracis str. Ames
          '6': Bacillus anthracis str. Sterne ASM816v1
          '7': Bacillus cereus ATCC 14579
          '8': Bacillus clausii KSM-K16
          '9': Bacillus pseudofirmus OF4
          '10': Bacteroides fragilis YCH46
          '11': Bacteroides thetaiotaomicron VPI-5482
          '12': Bifidobacterium adolescentis ATCC 15703
          '13': Bifidobacterium longum NCC2705
          '14': Borrelia burgdorferi B31
          '15': Brevibacillus brevis NBRC 100599
          '16': Buchnera aphidicola str. Bp (Baizongia pistaciae)
          '17': Buchnera aphidicola str. Sg (Schizaphis graminum) Sg
          '18': Caldanaerobacter subterraneus subsp. tengcongensis MB4
          '19': Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
          '20': Candidatus Vesicomyosocius okutanii HA
          '21': Chlamydia felis Fe/C-56
          '22': Chlamydia trachomatis D/UW-3/CX
          '23': Chlamydophila caviae GPIC
          '24': Chlamydophila pneumoniae CWL029
          '25': Chlamydophila pneumoniae TW-183
          '26': Chlorobium tepidum TLS
          '27': Chromobacterium violaceum ATCC 12472
          '28': Clostridioides difficile 630 ASM920v1
          '29': Clostridium acetobutylicum ATCC 824
          '30': Clostridium tetani E88 Massachusetts substr. E88
          '31': Corynebacterium jeikeium K411 K411 = NCTC 11915
          '32': Coxiella burnetii RSA 493 ASM776v1
          '33': Deferribacter desulfuricans SSM1
          '34': Dehalococcoides mccartyi CBDB1
          '35': Deinococcus radiodurans R1 ASM856v1
          '36': Desulfovibrio magneticus RS-1
          '37': Enterococcus faecalis V583 ASM778v1
          '38': Escherichia coli O157:H7 str. Sakai Sakai substr. RIMD 0509952
          '39': Finegoldia magna ATCC 29328
          '40': Francisella tularensis subsp. holarctica LVS ASM924v1
          '41': Fusobacterium nucleatum subsp. nucleatum ATCC 25586
          '42': Gemmatimonas aurantiaca T-27
          '43': Geobacter sulfurreducens PCA
          '44': Haemophilus ducreyi 35000HP
          '45': Haloquadratum walsbyi DSM 16790 DSM 16790 = HBSQ001
          '46': Helicobacter acinonychis str. Sheeba
          '47': Helicobacter hepaticus ATCC 51449
          '48': Helicobacter pylori 26695 ASM852v1
          '49': Hydrogenobacter thermophilus TK-6 ASM1078v1
          '50': Idiomarina loihiensis L2TR
          '51': Kocuria rhizophila DC2201
          '52': Lactobacillus fermentum IFO 3956
          '53': Lactobacillus salivarius UCC118
          '54': Lactococcus lactis subsp. lactis Il1403 IL1403
          '55': Macrococcus caseolyticus JCSC5402
          '56': Magnetospirillum magneticum AMB-1
          '57': Mannheimia succiniciproducens MBEL55E
          '58': Methanocella paludicola SANAE
          '59': Methanococcus voltae A3
          '60': Methanopyrus kandleri AV19
          '61': Methanosarcina acetivorans C2A
          '62': Methanothermobacter thermautotrophicus str. Delta H
          '63': Methylococcus capsulatus str. Bath
          '64': Microcystis aeruginosa NIES-843
          '65': Mycobacterium avium subsp. paratuberculosis K-10
          '66': Neisseria gonorrhoeae FA 1090
          '67': Neisseria meningitidis MC58
          '68': Nitratiruptor sp. SB155-2 ASM1032v1
          '69': Nitrosomonas europaea ATCC 19718
          '70': Nostoc sp. PCC 7120 ASM970v1
          '71': Onion yellows phytoplasma OY-M onion yellows
          '72': Orientia tsutsugamushi str. Ikeda
          '73': Pelotomaculum thermopropionicum SI
          '74': Picrophilus torridus DSM 9790
          '75': Porphyromonas gingivalis ATCC 33277
          '76': Prochlorococcus marinus subsp. marinus str. CCMP1375
          '77': Propionibacterium acnes KPA171202
          '78': Pseudomonas putida KT2440
          '79': Pyrobaculum aerophilum str. IM2
          '80': Pyrococcus furiosus DSM 3638
          '81': Ralstonia solanacearum GMI1000
          '82': Rickettsia conorii str. Malish 7
          '83': Rickettsia typhi str. Wilmington
          '84': Rothia mucilaginosa DY-18
          '85': Shigella flexneri 2a str. 301
          '86': Sinorhizobium meliloti 1021
          '87': Sodalis glossinidius str. 'morsitans' morsitans
          '88': Staphylococcus epidermidis ATCC 12228 ASM764v1
          '89': Staphylococcus haemolyticus JCSC1435
          '90': Staphylococcus saprophyticus subsp. saprophyticus ATCC 15305 ASM1012v1
          '91': Streptococcus agalactiae 2603V/R
          '92': Streptococcus mutans UA159
          '93': Streptococcus pyogenes M1 GAS SF370
          '94': Streptococcus uberis 0140J
          '95': Streptomyces avermitilis MA-4680 = NBRC 14893 MA-4680 ASM976v2
          '96': Streptomyces griseus subsp. griseus NBRC 13350
          '97': Sulfolobus solfataricus P2
          '98': Sulfurovum sp. NBC37-1 ASM1034v1
          '99': Symbiobacterium thermophilum IAM 14863 IAM14863
          '100': Synechococcus elongatus PCC 6301
          '101': Synechocystis sp. PCC 6803 ASM972v1
          '102': Thermococcus kodakarensis KOD1
          '103': Thermotoga maritima MSB8 ASM854v1
          '104': Treponema denticola ATCC 35405
          '105': Treponema pallidum subsp. pallidum str. Nichols ASM860v1
          '106': Tropheryma whipplei str. Twist
          '107': Vibrio cholerae O1 biovar El Tor str. N16961
          '108': Vibrio vulnificus YJ016
          '109': Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis
          '110': Wolbachia endosymbiont of Drosophila melanogaster wMel
          '111': Wolbachia endosymbiont strain TRS of Brugia malayi
          '112': Xanthomonas campestris pv. campestris str. ATCC 33913
          '113': Xanthomonas oryzae pv. oryzae KACC 10331
          '114': Xylella fastidiosa 9a5c
          '115': Yersinia enterocolitica subsp. enterocolitica 8081
          '116': Yersinia pestis CO92 ASM906v1
          '117': Zymomonas mobilis subsp. mobilis ZM4 = ATCC 31821 ZM4
          '118': '[Bacillus thuringiensis] serovar konkukian str. 97-27'
          '119': '[Pseudomonas syringae] pv. tomato str. DC3000'
          '120': homo sapiens
  splits:
  - name: train
    num_bytes: 683959051
    num_examples: 1000000
  - name: validation
    num_bytes: 68390921
    num_examples: 100000
  download_size: 158127793
  dataset_size: 752349972
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
---
# Dataset Card for "autotrain-data-species_classify"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
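This dataset contains a range of bioinformatics features, such as genome information, biosample, sequence release date, and submitter, and covers the classification of many microorganisms. It is split into a training set and a validation set for species classification. Each sample carries a class label naming its source organism (mostly microbial strains, plus homo sapiens), which makes the dataset suitable for bioinformatics research and for training species classification models.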
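As a rough illustration of how the two splits described above might be inspected, here is a minimal sketch using the Hugging Face `datasets` library. The helper name `preview` is hypothetical, the code assumes the repository is still reachable under the ID in the page title, and it has not been run against the live dataset.

```python
REPO_ID = "tydymy/150bp_multi_species_dataset"  # repository ID taken from the page title


def preview(split="validation", n=3):
    """Stream the first n rows of a split and return (text, label) pairs.

    Hypothetical helper: the column names come from the feature list in the
    dataset card; network access and the `datasets` package are assumed.
    """
    # Imported lazily so that defining this helper does not require the package.
    from datasets import load_dataset

    # streaming=True avoids downloading the full split before iterating.
    ds = load_dataset(REPO_ID, split=split, streaming=True)
    return [(row["autotrain_text"], row["autotrain_label"]) for row in ds.take(n)]


# Example call (requires network access):
#     for text, label in preview():
#         print(label, str(text)[:60])
```

Streaming is used here only to keep the preview cheap; loading the splits fully into memory would work just as well for training.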
Provider:
tydymy
Original information summary
Dataset overview
Dataset information
Feature list
- #genome: string
- asm_name: string
- assembly_accession: string
- bioproject: string
- biosample: string
- wgs_master: float
- seq_rel_date: string
- submitter: string
- ftp_path: string
- img_id: float
- gtdb_id: string
- scope: string
- assembly_level: string
- genome_rep: string
- refseq_category: string
- release_type: string
- taxid: float
- species_taxid: float
- organism_name: string
- infraspecific_name: string
- isolate: string
- superkingdom: string
- phylum: string
- class: string
- order: string
- family: string
- genus: string
- species: string
- classified: boolean
- lv1_group: string
- lv2_group: string
- score_faa: float
- score_fna: float
- score_rrna: float
- score_trna: float
- total_length: float
- contigs: float
- gc: float
- n50: float
- l50: float
- proteins: float
- protein_length: float
- coding_density: float
- completeness: float
- contamination: float
- strain_heterogeneity: float
- markers: float
- 5s_rrna: string
- 16s_rrna: string
- 23s_rrna: string
- trnas: float
- draft_quality: string
- start_position: integer
- human_label: integer
- autotrain_text: string
- autotrain_label: class label, with the following class names:
- 0: Acetobacter pasteurianus IFO 3283-01 IFO 3283 substr. IFO 3283-01
- 1: Alcanivorax borkumensis SK2
- 2: Aquifex aeolicus VF5
- 3: Archaeoglobus fulgidus DSM 4304
- 4: Azorhizobium caulinodans ORS 571
- 5: Bacillus anthracis str. Ames
- 6: Bacillus anthracis str. Sterne ASM816v1
- 7: Bacillus cereus ATCC 14579
- 8: Bacillus clausii KSM-K16
- 9: Bacillus pseudofirmus OF4
- 10: Bacteroides fragilis YCH46
- 11: Bacteroides thetaiotaomicron VPI-5482
- 12: Bifidobacterium adolescentis ATCC 15703
- 13: Bifidobacterium longum NCC2705
- 14: Borrelia burgdorferi B31
- 15: Brevibacillus brevis NBRC 100599
- 16: Buchnera aphidicola str. Bp (Baizongia pistaciae)
- 17: Buchnera aphidicola str. Sg (Schizaphis graminum) Sg
- 18: Caldanaerobacter subterraneus subsp. tengcongensis MB4
- 19: Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
- 20: Candidatus Vesicomyosocius okutanii HA
- 21: Chlamydia felis Fe/C-56
- 22: Chlamydia trachomatis D/UW-3/CX
- 23: Chlamydophila caviae GPIC
- 24: Chlamydophila pneumoniae CWL029
- 25: Chlamydophila pneumoniae TW-183
- 26: Chlorobium tepidum TLS
- 27: Chromobacterium violaceum ATCC 12472
- 28: Clostridioides difficile 630 ASM920v1
- 29: Clostridium acetobutylicum ATCC 824
- 30: Clostridium tetani E88 Massachusetts substr. E88
- 31: Corynebacterium jeikeium K411 K411 = NCTC 11915
- 32: Coxiella burnetii RSA 493 ASM776v1
- 33: Deferribacter desulfuricans SSM1
- 34: Dehalococcoides mccartyi CBDB1
- 35: Deinococcus radiodurans R1 ASM856v1
- 36: Desulfovibrio magneticus RS-1
- 37: Enterococcus faecalis V583 ASM778v1
- 38: Escherichia coli O157:H7 str. Sakai Sakai substr. RIMD 0509952
- 39: Finegoldia magna ATCC 29328
- 40: Francisella tularensis subsp. holarctica LVS ASM924v1
- 41: Fusobacterium nucleatum subsp. nucleatum ATCC 25586
- 42: Gemmatimonas aurantiaca T-27
- 43: Geobacter sulfurreducens PCA
- 44: Haemophilus ducreyi 35000HP
- 45: Haloquadratum walsbyi DSM 16790 DSM 16790 = HBSQ001
- 46: Helicobacter acinonychis str. Sheeba
- 47: Helicobacter hepaticus ATCC 51449
- 48: Helicobacter pylori 26695 ASM852v1
- 49: Hydrogenobacter thermophilus TK-6 ASM1078v1
- 50: Idiomarina loihiensis L2TR
- 51: Kocuria rhizophila DC2201
- 52: Lactobacillus fermentum IFO 3956
- 53: Lactobacillus salivarius UCC118
- 54: Lactococcus lactis subsp. lactis Il1403 IL1403
- 55: Macrococcus caseolyticus JCSC5402
- 56: Magnetospirillum magneticum AMB-1
- 57: Mannheimia succiniciproducens MBEL55E
- 58: Methanocella paludicola SANAE
- 59: Methanococcus voltae A3
- 60: Methanopyrus kandleri AV19
- 61: Methanosarcina acetivorans C2A
- 62: Methanothermobacter thermautotrophicus str. Delta H
- 63: Methylococcus capsulatus str. Bath
- 64: Microcystis aeruginosa NIES-843
- 65: Mycobacterium avium subsp. paratuberculosis K-10
- 66: Neisseria gonorrhoeae FA 1090
- 67: Neisseria meningitidis MC58
- 68: Nitratiruptor sp. SB155-2 ASM1032v1
- 69: Nitrosomonas europaea ATCC 19718
- 70: Nostoc sp. PCC 7120 ASM970v1
- 71: Onion yellows phytoplasma OY-M onion yellows
- 72: Orientia tsutsugamushi str. Ikeda
- 73: Pelotomaculum thermopropionicum SI
- 74: Picrophilus torridus DSM 9790
- 75: Porphyromonas gingivalis ATCC 33277
- 76: Prochlorococcus marinus subsp. marinus str. CCMP1375
- 77: Propionibacterium acnes KPA171202
- 78: Pseudomonas putida KT2440
- 79: Pyrobaculum aerophilum str. IM2
- 80: Pyrococcus furiosus DSM 3638
- 81: Ralstonia solanacearum GMI1000
- 82: Rickettsia conorii str. Malish 7
- 83: Rickettsia typhi str. Wilmington
- 84: Rothia mucilaginosa DY-18
- 85: Shigella flexneri 2a str. 301
- 86: Sinorhizobium meliloti 1021
- 87: Sodalis glossinidius str. 'morsitans' morsitans
- 88: Staphylococcus epidermidis ATCC 12228 ASM764v1
- 89: Staphylococcus haemolyticus JCSC1435
- 90: Staphylococcus saprophyticus subsp. saprophyticus ATCC 15305 ASM1012v1
- 91: Streptococcus agalactiae 2603V/R
- 92: Streptococcus mutans UA159
- 93: Streptococcus pyogenes M1 GAS SF370
- 94: Streptococcus uberis 0140J
- 95: Streptomyces avermitilis MA-4680 = NBRC 14893 MA-4680 ASM976v2
- 96: Streptomyces griseus subsp. griseus NBRC 13350
- 97: Sulfolobus solfataricus P2
- 98: Sulfurovum sp. NBC37-1 ASM1034v1
- 99: Symbiobacterium thermophilum IAM 14863 IAM14863
- 100: Synechococcus elongatus PCC 6301
- 101: Synechocystis sp. PCC 6803 ASM972v1
- 102: Thermococcus kodakarensis KOD1
- 103: Thermotoga maritima MSB8 ASM854v1
- 104: Treponema denticola ATCC 35405
- 105: Treponema pallidum subsp. pallidum str. Nichols ASM860v1
- 106: Tropheryma whipplei str. Twist
- 107: Vibrio cholerae O1 biovar El Tor str. N16961
- 108: Vibrio vulnificus YJ016
- 109: Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis
- 110: Wolbachia endosymbiont of Drosophila melanogaster wMel
- 111: Wolbachia endosymbiont strain TRS of Brugia malayi
- 112: Xanthomonas campestris pv. campestris str. ATCC 33913
- 113: Xanthomonas oryzae pv. oryzae KACC 10331
- 114: Xylella fastidiosa 9a5c
- 115: Yersinia enterocolitica subsp. enterocolitica 8081
- 116: Yersinia pestis CO92 ASM906v1
- 117: Zymomonas mobilis subsp. mobilis ZM4 = ATCC 31821 ZM4
- 118: [Bacillus thuringiensis] serovar konkukian str. 97-27
- 119: [Pseudomonas syringae] pv. tomato str. DC3000
- 120: homo sapiens
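To make the id-to-name table above concrete, the following minimal sketch decodes and encodes labels with a plain dictionary. It includes only a four-entry excerpt of the 121 classes, and the helper names `decode_label` and `encode_label` are illustrative rather than part of the dataset.

```python
# Excerpt of the id -> name table above; the full table runs from 0 to 120.
LABEL_NAMES = {
    0: "Acetobacter pasteurianus IFO 3283-01 IFO 3283 substr. IFO 3283-01",
    38: "Escherichia coli O157:H7 str. Sakai Sakai substr. RIMD 0509952",
    116: "Yersinia pestis CO92 ASM906v1",
    120: "homo sapiens",
}

# Reverse lookup, built once so that encoding is a single dictionary access.
LABEL_IDS = {name: label_id for label_id, name in LABEL_NAMES.items()}


def decode_label(label_id):
    """Map an integer autotrain_label value to its class name."""
    if label_id not in LABEL_NAMES:
        raise KeyError(f"label id {label_id} is not in this excerpt")
    return LABEL_NAMES[label_id]


def encode_label(name):
    """Map a class name back to its integer id."""
    if name not in LABEL_IDS:
        raise KeyError(f"class name {name!r} is not in this excerpt")
    return LABEL_IDS[name]


print(decode_label(120))
print(encode_label("Yersinia pestis CO92 ASM906v1"))
```

When the dataset is loaded through a library that understands class-label features, the library's own mapping would normally be preferable to a hand-copied table like this one.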
Dataset splits
- train: 683959051 bytes, 1000000 examples
- validation: 68390921 bytes, 100000 examples
Dataset size
- Download size: 158127793 bytes
- Dataset size: 752349972 bytes
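The split and size figures above can be cross-checked with a little arithmetic. The sketch below uses only the numbers listed in the dataset card; the function name `summarize` and the derived metric names are illustrative.

```python
# Split metadata copied from the dataset card.
SPLITS = {
    "train": {"num_bytes": 683_959_051, "num_examples": 1_000_000},
    "validation": {"num_bytes": 68_390_921, "num_examples": 100_000},
}
DOWNLOAD_SIZE = 158_127_793
DATASET_SIZE = 752_349_972


def summarize(splits, download_size, dataset_size):
    """Derive total bytes, per-example averages, and the download/dataset ratio."""
    total_bytes = sum(s["num_bytes"] for s in splits.values())
    return {
        "total_bytes": total_bytes,
        # True when the per-split byte counts add up to the reported dataset size.
        "matches_dataset_size": total_bytes == dataset_size,
        "avg_bytes_per_example": {
            name: s["num_bytes"] / s["num_examples"] for name, s in splits.items()
        },
        # Fraction of the in-memory size that has to be downloaded.
        "download_ratio": download_size / dataset_size,
    }


summary = summarize(SPLITS, DOWNLOAD_SIZE, DATASET_SIZE)
print(summary)
```

The two split sizes sum exactly to the reported dataset size, each example averages roughly 684 bytes across all columns, and the download is about 21% of the dataset size.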
Configuration
- Config name: default
- Data file paths:
  - train: data/train-*
  - validation: data/validation-*
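The data file paths above are glob patterns. As a minimal sketch of how such patterns sort files into splits, the example below uses Python's standard `fnmatch` module. The file names are hypothetical, since the actual shard names are not listed on this page, and the hosting platform's own pattern resolution may differ in detail.

```python
from fnmatch import fnmatch

# Patterns copied from the configuration above.
DATA_FILES = {"train": "data/train-*", "validation": "data/validation-*"}


def assign_splits(paths, patterns=DATA_FILES):
    """Group file paths by the split whose glob pattern they match."""
    assigned = {split: [] for split in patterns}
    for path in paths:
        for split, pattern in patterns.items():
            if fnmatch(path, pattern):
                assigned[split].append(path)
    return assigned


# Hypothetical shard names, for illustration only.
example_paths = [
    "data/train-00000-of-00002.parquet",
    "data/train-00001-of-00002.parquet",
    "data/validation-00000-of-00001.parquet",
    "README.md",
]
result = assign_splits(example_paths)
print(result)
```

Files that match no pattern, such as the README in this example, are simply left out of every split.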