
tydymy/150bp_human_vs_microbial_dna

Source: Hugging Face. Updated 2023-11-15; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/tydymy/150bp_human_vs_microbial_dna
Resource description:
---
dataset_info:
  features:
  - name: '#genome'
    dtype: string
  - name: asm_name
    dtype: string
  - name: assembly_accession
    dtype: string
  - name: bioproject
    dtype: string
  - name: biosample
    dtype: string
  - name: wgs_master
    dtype: float64
  - name: seq_rel_date
    dtype: string
  - name: submitter
    dtype: string
  - name: ftp_path
    dtype: string
  - name: img_id
    dtype: float64
  - name: gtdb_id
    dtype: string
  - name: scope
    dtype: string
  - name: assembly_level
    dtype: string
  - name: genome_rep
    dtype: string
  - name: refseq_category
    dtype: string
  - name: release_type
    dtype: string
  - name: taxid
    dtype: float64
  - name: species_taxid
    dtype: float64
  - name: organism_name
    dtype: string
  - name: infraspecific_name
    dtype: string
  - name: isolate
    dtype: string
  - name: superkingdom
    dtype: string
  - name: phylum
    dtype: string
  - name: class
    dtype: string
  - name: order
    dtype: string
  - name: family
    dtype: string
  - name: genus
    dtype: string
  - name: species
    dtype: string
  - name: classified
    dtype: bool
  - name: unique_name
    dtype: string
  - name: lv1_group
    dtype: string
  - name: lv2_group
    dtype: string
  - name: score_faa
    dtype: float64
  - name: score_fna
    dtype: float64
  - name: score_rrna
    dtype: float64
  - name: score_trna
    dtype: float64
  - name: total_length
    dtype: float64
  - name: contigs
    dtype: float64
  - name: gc
    dtype: float64
  - name: n50
    dtype: float64
  - name: l50
    dtype: float64
  - name: proteins
    dtype: float64
  - name: protein_length
    dtype: float64
  - name: coding_density
    dtype: float64
  - name: completeness
    dtype: float64
  - name: contamination
    dtype: float64
  - name: strain_heterogeneity
    dtype: float64
  - name: markers
    dtype: float64
  - name: 5s_rrna
    dtype: string
  - name: 16s_rrna
    dtype: string
  - name: 23s_rrna
    dtype: string
  - name: trnas
    dtype: float64
  - name: draft_quality
    dtype: string
  - name: start_position
    dtype: int64
  - name: autotrain_text
    dtype: string
  - name: autotrain_label
    dtype:
      class_label:
        names:
          '0': 0
          '1': 1
  splits:
  - name: train
    num_bytes: 70411052
    num_examples: 100000
  - name: validation
    num_bytes: 3528945
    num_examples: 5000
  download_size: 15423840
  dataset_size: 73939997
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
---

# Dataset Card for "autotrain-data-human_dna_classify_150bp"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
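For readers who want to work with the data directly, the following is a minimal sketch of loading both splits with the Hugging Face `datasets` library. It assumes the library is installed, that the repository is still publicly available under the id shown in this listing, and that network access is possible; `load_splits` is a hypothetical helper name, not part of the card.

```python
REPO_ID = "tydymy/150bp_human_vs_microbial_dna"  # repository id from this listing


def load_splits(repo_id: str = REPO_ID):
    """Return the train and validation splits declared in the card."""
    # Imported lazily so this module can be read without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset(repo_id)  # DatasetDict keyed by split name
    return dataset["train"], dataset["validation"]


if __name__ == "__main__":
    train, validation = load_splits()
    # Row counts can be compared with the num_examples values in the card.
    print(train.num_rows, validation.num_rows)
```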
Provided by:
tydymy
Summary of original information

Dataset information

Features

  • #genome: string
  • asm_name: string
  • assembly_accession: string
  • bioproject: string
  • biosample: string
  • wgs_master: float
  • seq_rel_date: string
  • submitter: string
  • ftp_path: string
  • img_id: float
  • gtdb_id: string
  • scope: string
  • assembly_level: string
  • genome_rep: string
  • refseq_category: string
  • release_type: string
  • taxid: float
  • species_taxid: float
  • organism_name: string
  • infraspecific_name: string
  • isolate: string
  • superkingdom: string
  • phylum: string
  • class: string
  • order: string
  • family: string
  • genus: string
  • species: string
  • classified: boolean
  • unique_name: string
  • lv1_group: string
  • lv2_group: string
  • score_faa: float
  • score_fna: float
  • score_rrna: float
  • score_trna: float
  • total_length: float
  • contigs: float
  • gc: float
  • n50: float
  • l50: float
  • proteins: float
  • protein_length: float
  • coding_density: float
  • completeness: float
  • contamination: float
  • strain_heterogeneity: float
  • markers: float
  • 5s_rrna: string
  • 16s_rrna: string
  • 23s_rrna: string
  • trnas: float
  • draft_quality: string
  • start_position: integer
  • autotrain_text: string
  • autotrain_label:
    • Class labels:
      • 0: 0
      • 1: 1
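The feature list can be turned into a simple per-row sanity check. The sketch below makes several assumptions that the card does not confirm: that `autotrain_text` holds the DNA sequence, that each sequence is 150 characters long (inferred only from "150bp" in the dataset name), and that the alphabet is A, C, G, T plus N. The card does not say which class each label id stands for, so the check only verifies that the label is 0 or 1. `check_record` is a hypothetical helper.

```python
import re

SEQUENCE_LENGTH = 150                   # assumption: inferred from "150bp" in the dataset name
DNA_PATTERN = re.compile(r"[ACGTN]+")   # assumption: nucleotides plus N, compared in upper case
VALID_LABELS = {0, 1}                   # class-label ids listed in the card


def check_record(record: dict) -> list[str]:
    """Return a list of problems found in one row (empty if none)."""
    problems = []
    text = record.get("autotrain_text")
    if not isinstance(text, str):
        problems.append("autotrain_text is missing or not a string")
    else:
        if len(text) != SEQUENCE_LENGTH:
            problems.append(f"expected {SEQUENCE_LENGTH} characters, got {len(text)}")
        if not DNA_PATTERN.fullmatch(text.upper()):
            problems.append("autotrain_text contains non-nucleotide characters")
    if record.get("autotrain_label") not in VALID_LABELS:
        problems.append("autotrain_label is not 0 or 1")
    return problems
```

Calling `check_record` on each row and collecting the non-empty results gives a quick report of rows that break these assumptions, which is also a way to find out whether the assumptions themselves hold for this dataset.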

Data splits

  • train:
    • Bytes: 70411052
    • Examples: 100000
  • validation:
    • Bytes: 3528945
    • Examples: 5000
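The split figures above imply a share of examples per split and an average row size. A small sketch of that arithmetic, using only the numbers reported in the card:

```python
# Per-split statistics as reported in the card.
SPLITS = {
    "train": {"num_bytes": 70_411_052, "num_examples": 100_000},
    "validation": {"num_bytes": 3_528_945, "num_examples": 5_000},
}

total_examples = sum(stats["num_examples"] for stats in SPLITS.values())

for name, stats in SPLITS.items():
    share = stats["num_examples"] / total_examples
    bytes_per_example = stats["num_bytes"] / stats["num_examples"]
    print(f"{name}: {share:.1%} of examples, {bytes_per_example:.1f} bytes per example")
```

This works out to roughly 95.2% of examples in train and 4.8% in validation, with an average of about 704 and 706 bytes per example respectively.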

Data size

  • Download size: 15423840 bytes
  • Dataset size: 73939997 bytes
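The size figures can be cross-checked against the per-split byte counts. The sketch below assumes all four numbers are in bytes, as is conventional for Hugging Face dataset metadata.

```python
# Values as reported in the card (assumed to be bytes).
TRAIN_BYTES = 70_411_052
VALIDATION_BYTES = 3_528_945
DATASET_SIZE = 73_939_997
DOWNLOAD_SIZE = 15_423_840

# The reported dataset size should equal the sum of the per-split byte counts.
splits_total = TRAIN_BYTES + VALIDATION_BYTES
sizes_agree = splits_total == DATASET_SIZE

# Ratio of the download size to the reported dataset size.
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(f"sum of splits: {splits_total} (matches dataset size: {sizes_agree})")
print(f"download is {download_ratio:.1%} of the dataset size")
```

The two split sizes sum exactly to the reported dataset size, and the download is about 20.9% of that size.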

Configurations

  • default:
    • Data files:
      • train: data/train-*
      • validation: data/validation-*
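The `default` configuration assigns files to splits by path pattern. The sketch below approximates that matching with Python's `fnmatch` module; the Hub's own pattern resolution may differ in details (for example, `fnmatch` lets `*` match `/`). The file names and the `split_for` helper are hypothetical and serve only to exercise the patterns.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Split name -> path pattern, as listed in the default configuration.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
}


def split_for(path: str) -> Optional[str]:
    """Return the split whose pattern matches `path`, or None if none does."""
    for split, pattern in DATA_FILES.items():
        if fnmatchcase(path, pattern):
            return split
    return None


# Hypothetical file names, not taken from the repository.
examples = [
    "data/train-00000-of-00001.parquet",
    "data/validation-00000-of-00001.parquet",
    "README.md",
]
for path in examples:
    print(path, "->", split_for(path))
```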