tydymy/150bp_multi_species_dataset

Source: Hugging Face. Updated: 2023-11-15. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/tydymy/150bp_multi_species_dataset
Resource description:
# Dataset Card for "autotrain-data-species_classify"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

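This dataset provides a range of bioinformatics features, including genome assembly information, biosample identifiers, sequence release dates, and submitters, together with taxonomic classification for many microorganisms. It is split into a training set and a validation set for a species classification task. Each example carries one of 121 class labels (120 microbial strains plus homo sapiens), which makes the dataset suitable for bioinformatics research and for training species classification models.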
Provider:
tydymy
Summary of original information

Dataset overview

Dataset information

Feature list

  • #genome: string
  • asm_name: string
  • assembly_accession: string
  • bioproject: string
  • biosample: string
  • wgs_master: float64
  • seq_rel_date: string
  • submitter: string
  • ftp_path: string
  • img_id: float64
  • gtdb_id: string
  • scope: string
  • assembly_level: string
  • genome_rep: string
  • refseq_category: string
  • release_type: string
  • taxid: float64
  • species_taxid: float64
  • organism_name: string
  • infraspecific_name: string
  • isolate: string
  • superkingdom: string
  • phylum: string
  • class: string
  • order: string
  • family: string
  • genus: string
  • species: string
  • classified: bool
  • lv1_group: string
  • lv2_group: string
  • score_faa: float64
  • score_fna: float64
  • score_rrna: float64
  • score_trna: float64
  • total_length: float64
  • contigs: float64
  • gc: float64
  • n50: float64
  • l50: float64
  • proteins: float64
  • protein_length: float64
  • coding_density: float64
  • completeness: float64
  • contamination: float64
  • strain_heterogeneity: float64
  • markers: float64
  • 5s_rrna: string
  • 16s_rrna: string
  • 23s_rrna: string
  • trnas: float64
  • draft_quality: string
  • start_position: int64
  • human_label: int64
  • autotrain_text: string
  • autotrain_label: class_label
    • Class names:
      • 0: Acetobacter pasteurianus IFO 3283-01 IFO 3283 substr. IFO 3283-01
      • 1: Alcanivorax borkumensis SK2
      • 2: Aquifex aeolicus VF5
      • 3: Archaeoglobus fulgidus DSM 4304
      • 4: Azorhizobium caulinodans ORS 571
      • 5: Bacillus anthracis str. Ames
      • 6: Bacillus anthracis str. Sterne ASM816v1
      • 7: Bacillus cereus ATCC 14579
      • 8: Bacillus clausii KSM-K16
      • 9: Bacillus pseudofirmus OF4
      • 10: Bacteroides fragilis YCH46
      • 11: Bacteroides thetaiotaomicron VPI-5482
      • 12: Bifidobacterium adolescentis ATCC 15703
      • 13: Bifidobacterium longum NCC2705
      • 14: Borrelia burgdorferi B31
      • 15: Brevibacillus brevis NBRC 100599
      • 16: Buchnera aphidicola str. Bp (Baizongia pistaciae)
      • 17: Buchnera aphidicola str. Sg (Schizaphis graminum) Sg
      • 18: Caldanaerobacter subterraneus subsp. tengcongensis MB4
      • 19: Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2
      • 20: Candidatus Vesicomyosocius okutanii HA
      • 21: Chlamydia felis Fe/C-56
      • 22: Chlamydia trachomatis D/UW-3/CX
      • 23: Chlamydophila caviae GPIC
      • 24: Chlamydophila pneumoniae CWL029
      • 25: Chlamydophila pneumoniae TW-183
      • 26: Chlorobium tepidum TLS
      • 27: Chromobacterium violaceum ATCC 12472
      • 28: Clostridioides difficile 630 ASM920v1
      • 29: Clostridium acetobutylicum ATCC 824
      • 30: Clostridium tetani E88 Massachusetts substr. E88
      • 31: Corynebacterium jeikeium K411 K411 = NCTC 11915
      • 32: Coxiella burnetii RSA 493 ASM776v1
      • 33: Deferribacter desulfuricans SSM1
      • 34: Dehalococcoides mccartyi CBDB1
      • 35: Deinococcus radiodurans R1 ASM856v1
      • 36: Desulfovibrio magneticus RS-1
      • 37: Enterococcus faecalis V583 ASM778v1
      • 38: Escherichia coli O157:H7 str. Sakai Sakai substr. RIMD 0509952
      • 39: Finegoldia magna ATCC 29328
      • 40: Francisella tularensis subsp. holarctica LVS ASM924v1
      • 41: Fusobacterium nucleatum subsp. nucleatum ATCC 25586
      • 42: Gemmatimonas aurantiaca T-27
      • 43: Geobacter sulfurreducens PCA
      • 44: Haemophilus ducreyi 35000HP
      • 45: Haloquadratum walsbyi DSM 16790 DSM 16790 = HBSQ001
      • 46: Helicobacter acinonychis str. Sheeba
      • 47: Helicobacter hepaticus ATCC 51449
      • 48: Helicobacter pylori 26695 ASM852v1
      • 49: Hydrogenobacter thermophilus TK-6 ASM1078v1
      • 50: Idiomarina loihiensis L2TR
      • 51: Kocuria rhizophila DC2201
      • 52: Lactobacillus fermentum IFO 3956
      • 53: Lactobacillus salivarius UCC118
      • 54: Lactococcus lactis subsp. lactis Il1403 IL1403
      • 55: Macrococcus caseolyticus JCSC5402
      • 56: Magnetospirillum magneticum AMB-1
      • 57: Mannheimia succiniciproducens MBEL55E
      • 58: Methanocella paludicola SANAE
      • 59: Methanococcus voltae A3
      • 60: Methanopyrus kandleri AV19
      • 61: Methanosarcina acetivorans C2A
      • 62: Methanothermobacter thermautotrophicus str. Delta H
      • 63: Methylococcus capsulatus str. Bath
      • 64: Microcystis aeruginosa NIES-843
      • 65: Mycobacterium avium subsp. paratuberculosis K-10
      • 66: Neisseria gonorrhoeae FA 1090
      • 67: Neisseria meningitidis MC58
      • 68: Nitratiruptor sp. SB155-2 ASM1032v1
      • 69: Nitrosomonas europaea ATCC 19718
      • 70: Nostoc sp. PCC 7120 ASM970v1
      • 71: Onion yellows phytoplasma OY-M onion yellows
      • 72: Orientia tsutsugamushi str. Ikeda
      • 73: Pelotomaculum thermopropionicum SI
      • 74: Picrophilus torridus DSM 9790
      • 75: Porphyromonas gingivalis ATCC 33277
      • 76: Prochlorococcus marinus subsp. marinus str. CCMP1375
      • 77: Propionibacterium acnes KPA171202
      • 78: Pseudomonas putida KT2440
      • 79: Pyrobaculum aerophilum str. IM2
      • 80: Pyrococcus furiosus DSM 3638
      • 81: Ralstonia solanacearum GMI1000
      • 82: Rickettsia conorii str. Malish 7
      • 83: Rickettsia typhi str. Wilmington
      • 84: Rothia mucilaginosa DY-18
      • 85: Shigella flexneri 2a str. 301
      • 86: Sinorhizobium meliloti 1021
      • 87: Sodalis glossinidius str. 'morsitans' morsitans
      • 88: Staphylococcus epidermidis ATCC 12228 ASM764v1
      • 89: Staphylococcus haemolyticus JCSC1435
      • 90: Staphylococcus saprophyticus subsp. saprophyticus ATCC 15305 ASM1012v1
      • 91: Streptococcus agalactiae 2603V/R
      • 92: Streptococcus mutans UA159
      • 93: Streptococcus pyogenes M1 GAS SF370
      • 94: Streptococcus uberis 0140J
      • 95: Streptomyces avermitilis MA-4680 = NBRC 14893 MA-4680 ASM976v2
      • 96: Streptomyces griseus subsp. griseus NBRC 13350
      • 97: Sulfolobus solfataricus P2
      • 98: Sulfurovum sp. NBC37-1 ASM1034v1
      • 99: Symbiobacterium thermophilum IAM 14863 IAM14863
      • 100: Synechococcus elongatus PCC 6301
      • 101: Synechocystis sp. PCC 6803 ASM972v1
      • 102: Thermococcus kodakarensis KOD1
      • 103: Thermotoga maritima MSB8 ASM854v1
      • 104: Treponema denticola ATCC 35405
      • 105: Treponema pallidum subsp. pallidum str. Nichols ASM860v1
      • 106: Tropheryma whipplei str. Twist
      • 107: Vibrio cholerae O1 biovar El Tor str. N16961
      • 108: Vibrio vulnificus YJ016
      • 109: Wigglesworthia glossinidia endosymbiont of Glossina brevipalpis
      • 110: Wolbachia endosymbiont of Drosophila melanogaster wMel
      • 111: Wolbachia endosymbiont strain TRS of Brugia malayi
      • 112: Xanthomonas campestris pv. campestris str. ATCC 33913
      • 113: Xanthomonas oryzae pv. oryzae KACC 10331
      • 114: Xylella fastidiosa 9a5c
      • 115: Yersinia enterocolitica subsp. enterocolitica 8081
      • 116: Yersinia pestis CO92 ASM906v1
      • 117: Zymomonas mobilis subsp. mobilis ZM4 = ATCC 31821 ZM4
      • 118: [Bacillus thuringiensis] serovar konkukian str. 97-27
      • 119: [Pseudomonas syringae] pv. tomato str. DC3000
      • 120: homo sapiens
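
The class list above is an integer-to-name mapping for `autotrain_label`. As a minimal sketch of how such a mapping can be used, the following pure-Python example decodes label ids and validates their range. The names `ID_TO_NAME`, `NAME_TO_ID`, and `decode_label` are hypothetical helpers introduced here for illustration, and the table holds only a four-entry excerpt of the 121 classes. When the dataset is loaded with the Hugging Face `datasets` library, the `ClassLabel` feature offers `int2str` and `str2int` for the same purpose.

```python
NUM_CLASSES = 121  # label ids 0..120, as enumerated in the list above

# Excerpt of the id-to-name mapping; a full table would carry all 121 entries.
ID_TO_NAME = {
    0: "Acetobacter pasteurianus IFO 3283-01 IFO 3283 substr. IFO 3283-01",
    38: "Escherichia coli O157:H7 str. Sakai Sakai substr. RIMD 0509952",
    116: "Yersinia pestis CO92 ASM906v1",
    120: "homo sapiens",
}
NAME_TO_ID = {name: label_id for label_id, name in ID_TO_NAME.items()}


def decode_label(label_id):
    """Return the class name for an integer label, validating the id range."""
    if not 0 <= label_id < NUM_CLASSES:
        raise ValueError(f"label id {label_id} is outside 0..{NUM_CLASSES - 1}")
    # Ids missing from this excerpt fall back to a placeholder string.
    return ID_TO_NAME.get(label_id, f"<class {label_id}>")


print(decode_label(120))           # homo sapiens
print(NAME_TO_ID["homo sapiens"])  # 120
```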

Dataset splits

  • train: 683959051 bytes, 1000000 examples
  • validation: 68390921 bytes, 100000 examples

Dataset size

  • Download size: 158127793 bytes
  • Dataset size: 752349972 bytes

Configuration

  • Config name: default
  • Data file paths:
    • train: data/train-*
    • validation: data/validation-*
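
The configuration maps each split to a glob pattern over files in the repository. As a rough illustration, the sketch below resolves a file path to its split with the standard-library `fnmatch` module and shows one way to load both splits with the Hugging Face `datasets` library. The helper names `split_for_path` and `load_splits` are hypothetical, and the shard file name in the usage lines is an assumed example, since the source does not list the actual file names. `load_splits` needs the `datasets` package and network access, so it is defined here but not called.

```python
from fnmatch import fnmatch

# Split-to-glob mapping copied from the "default" configuration above.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
}


def split_for_path(path):
    """Return the split whose glob pattern matches `path`, or None."""
    for split, pattern in DATA_FILES.items():
        if fnmatch(path, pattern):
            return split
    return None


def load_splits(repo_id="tydymy/150bp_multi_species_dataset"):
    """Load the default configuration from the Hugging Face Hub.

    Requires the third-party `datasets` package and network access.
    """
    from datasets import load_dataset  # imported lazily; optional dependency

    return load_dataset(repo_id)


# Hypothetical shard name, used only to demonstrate the pattern matching.
print(split_for_path("data/train-00000-of-00002.parquet"))  # train
print(split_for_path("README.md"))                          # None
```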