tydymy/150bp_human_vs_microbial_dna
Source: Hugging Face. Updated 2023-11-15. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/tydymy/150bp_human_vs_microbial_dna
Resource description:
---
dataset_info:
  features:
  - name: '#genome'
    dtype: string
  - name: asm_name
    dtype: string
  - name: assembly_accession
    dtype: string
  - name: bioproject
    dtype: string
  - name: biosample
    dtype: string
  - name: wgs_master
    dtype: float64
  - name: seq_rel_date
    dtype: string
  - name: submitter
    dtype: string
  - name: ftp_path
    dtype: string
  - name: img_id
    dtype: float64
  - name: gtdb_id
    dtype: string
  - name: scope
    dtype: string
  - name: assembly_level
    dtype: string
  - name: genome_rep
    dtype: string
  - name: refseq_category
    dtype: string
  - name: release_type
    dtype: string
  - name: taxid
    dtype: float64
  - name: species_taxid
    dtype: float64
  - name: organism_name
    dtype: string
  - name: infraspecific_name
    dtype: string
  - name: isolate
    dtype: string
  - name: superkingdom
    dtype: string
  - name: phylum
    dtype: string
  - name: class
    dtype: string
  - name: order
    dtype: string
  - name: family
    dtype: string
  - name: genus
    dtype: string
  - name: species
    dtype: string
  - name: classified
    dtype: bool
  - name: unique_name
    dtype: string
  - name: lv1_group
    dtype: string
  - name: lv2_group
    dtype: string
  - name: score_faa
    dtype: float64
  - name: score_fna
    dtype: float64
  - name: score_rrna
    dtype: float64
  - name: score_trna
    dtype: float64
  - name: total_length
    dtype: float64
  - name: contigs
    dtype: float64
  - name: gc
    dtype: float64
  - name: n50
    dtype: float64
  - name: l50
    dtype: float64
  - name: proteins
    dtype: float64
  - name: protein_length
    dtype: float64
  - name: coding_density
    dtype: float64
  - name: completeness
    dtype: float64
  - name: contamination
    dtype: float64
  - name: strain_heterogeneity
    dtype: float64
  - name: markers
    dtype: float64
  - name: 5s_rrna
    dtype: string
  - name: 16s_rrna
    dtype: string
  - name: 23s_rrna
    dtype: string
  - name: trnas
    dtype: float64
  - name: draft_quality
    dtype: string
  - name: start_position
    dtype: int64
  - name: autotrain_text
    dtype: string
  - name: autotrain_label
    dtype:
      class_label:
        names:
          '0': 0
          '1': 1
  splits:
  - name: train
    num_bytes: 70411052
    num_examples: 100000
  - name: validation
    num_bytes: 3528945
    num_examples: 5000
  download_size: 15423840
  dataset_size: 73939997
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
---
# Dataset Card for "autotrain-data-human_dna_classify_150bp"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
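
The card itself gives no usage instructions. As a minimal sketch, the two splits declared in the metadata could be loaded with the Hugging Face `datasets` library. This assumes the package is installed and the repository is reachable; the helper name `load_splits` is made up for illustration and is not part of the dataset.

```python
# Repository name taken from the page title; split names from the card metadata.
REPO_ID = "tydymy/150bp_human_vs_microbial_dna"
SPLITS = ("train", "validation")


def load_splits(repo_id: str = REPO_ID) -> dict:
    """Return a dict mapping each split name to a `datasets.Dataset`.

    Needs network access and the third-party `datasets` package, so the
    import is deferred until the function is actually called.
    """
    from datasets import load_dataset

    return {split: load_dataset(repo_id, split=split) for split in SPLITS}
```

Calling `load_splits()["train"].num_rows` should then report the number of training examples, which the metadata lists as 100000.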
Provider:
tydymy
Summary of original information
Dataset information
Features
- #genome: string
- asm_name: string
- assembly_accession: string
- bioproject: string
- biosample: string
- wgs_master: float
- seq_rel_date: string
- submitter: string
- ftp_path: string
- img_id: float
- gtdb_id: string
- scope: string
- assembly_level: string
- genome_rep: string
- refseq_category: string
- release_type: string
- taxid: float
- species_taxid: float
- organism_name: string
- infraspecific_name: string
- isolate: string
- superkingdom: string
- phylum: string
- class: string
- order: string
- family: string
- genus: string
- species: string
- classified: boolean
- unique_name: string
- lv1_group: string
- lv2_group: string
- score_faa: float
- score_fna: float
- score_rrna: float
- score_trna: float
- total_length: float
- contigs: float
- gc: float
- n50: float
- l50: float
- proteins: float
- protein_length: float
- coding_density: float
- completeness: float
- contamination: float
- strain_heterogeneity: float
- markers: float
- 5s_rrna: string
- 16s_rrna: string
- 23s_rrna: string
- trnas: float
- draft_quality: string
- start_position: integer
- autotrain_text: string
- autotrain_label: class label
  - 0: 0
  - 1: 1
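
The feature list names the columns but does not describe their contents. The sketch below shows one way to sanity-check a single record, under two assumptions that the card does not confirm: that `autotrain_text` holds the DNA read itself, and that reads are 150 bases long, as the "150bp" in the dataset name suggests. The card also does not say which of the labels 0 and 1 means human and which means microbial, so the check only verifies that the label is one of the two. `check_record` is an illustrative helper, not part of the dataset.

```python
VALID_BASES = set("ACGTN")  # assumption: standard DNA alphabet plus N for unknown bases
READ_LENGTH = 150           # assumption: taken from "150bp" in the dataset name


def check_record(record: dict) -> list[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    seq = record.get("autotrain_text")
    label = record.get("autotrain_label")

    if not isinstance(seq, str):
        problems.append("autotrain_text is not a string")
    else:
        if len(seq) != READ_LENGTH:
            problems.append(f"length {len(seq)} != {READ_LENGTH}")
        bad = set(seq.upper()) - VALID_BASES
        if bad:
            problems.append(f"unexpected characters: {sorted(bad)}")

    # The class_label feature declares exactly two classes, 0 and 1.
    if label not in (0, 1):
        problems.append(f"label {label!r} not in (0, 1)")
    return problems
```

A record that passes returns an empty list, which makes the helper easy to use as a filter over a loaded split.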
Data splits
train:
- Bytes: 70411052
- Examples: 100000
validation:
- Bytes: 3528945
- Examples: 5000
Data size
- Download size: 15423840 bytes
- Dataset size: 73939997 bytes
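
The size figures above can be cross-checked with a little arithmetic. The sketch below uses only the numbers stated in the metadata. It confirms that the two split sizes add up to the reported dataset size, then derives the average bytes per example and the ratio of download size to dataset size. Reading that ratio as compression is an assumption about how the files are stored, not something the card states.

```python
# Figures copied from the split and size metadata.
SPLIT_STATS = {
    "train": {"num_bytes": 70_411_052, "num_examples": 100_000},
    "validation": {"num_bytes": 3_528_945, "num_examples": 5_000},
}
DOWNLOAD_SIZE = 15_423_840
DATASET_SIZE = 73_939_997

total_bytes = sum(s["num_bytes"] for s in SPLIT_STATS.values())
total_examples = sum(s["num_examples"] for s in SPLIT_STATS.values())

# The per-split byte counts should account for the whole dataset.
assert total_bytes == DATASET_SIZE

# Average record size per split, including all metadata columns.
bytes_per_example = {
    name: s["num_bytes"] / s["num_examples"] for name, s in SPLIT_STATS.items()
}

# Fraction of the dataset size that actually has to be downloaded.
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

# Share of all examples held out for validation.
validation_share = SPLIT_STATS["validation"]["num_examples"] / total_examples

print(bytes_per_example, round(download_ratio, 3), round(validation_share, 3))
```

Each example averages roughly 700 bytes, far more than a 150-character read, which is consistent with every row carrying the full set of genome metadata columns alongside the sequence.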
Configuration
default:
- Data files:
  - train: data/train-*
  - validation: data/validation-*
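
The `default` configuration assigns files to splits with glob patterns. The sketch below illustrates how such patterns partition file paths, using Python's standard `fnmatch` module. The shard file names are hypothetical, since the listing does not show the actual files, and the Hub resolves patterns with its own logic, so this is an illustration of the idea only.

```python
from fnmatch import fnmatchcase

# Patterns copied from the default configuration.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
}


def split_for(path: str):
    """Return the split whose pattern matches `path`, or None if none does."""
    for split, pattern in DATA_FILES.items():
        if fnmatchcase(path, pattern):
            return split
    return None


# Hypothetical file names, for illustration only.
example_paths = [
    "data/train-00000-of-00001.parquet",
    "data/validation-00000-of-00001.parquet",
    "README.md",
]
assignments = {path: split_for(path) for path in example_paths}
```

Files that match neither pattern, such as the README, are left out of both splits.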



