tydymy/150bp_human_vs_microbial_dna

Name: tydymy/150bp_human_vs_microbial_dna
Creator: tydymy
Published: 2023-11-15 16:58:35
License: No description available

Source: Hugging Face. Updated 2023-11-15. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/tydymy/150bp_human_vs_microbial_dna


Resource description:

---
dataset_info:
  features:
  - name: '#genome'
    dtype: string
  - name: asm_name
    dtype: string
  - name: assembly_accession
    dtype: string
  - name: bioproject
    dtype: string
  - name: biosample
    dtype: string
  - name: wgs_master
    dtype: float64
  - name: seq_rel_date
    dtype: string
  - name: submitter
    dtype: string
  - name: ftp_path
    dtype: string
  - name: img_id
    dtype: float64
  - name: gtdb_id
    dtype: string
  - name: scope
    dtype: string
  - name: assembly_level
    dtype: string
  - name: genome_rep
    dtype: string
  - name: refseq_category
    dtype: string
  - name: release_type
    dtype: string
  - name: taxid
    dtype: float64
  - name: species_taxid
    dtype: float64
  - name: organism_name
    dtype: string
  - name: infraspecific_name
    dtype: string
  - name: isolate
    dtype: string
  - name: superkingdom
    dtype: string
  - name: phylum
    dtype: string
  - name: class
    dtype: string
  - name: order
    dtype: string
  - name: family
    dtype: string
  - name: genus
    dtype: string
  - name: species
    dtype: string
  - name: classified
    dtype: bool
  - name: unique_name
    dtype: string
  - name: lv1_group
    dtype: string
  - name: lv2_group
    dtype: string
  - name: score_faa
    dtype: float64
  - name: score_fna
    dtype: float64
  - name: score_rrna
    dtype: float64
  - name: score_trna
    dtype: float64
  - name: total_length
    dtype: float64
  - name: contigs
    dtype: float64
  - name: gc
    dtype: float64
  - name: n50
    dtype: float64
  - name: l50
    dtype: float64
  - name: proteins
    dtype: float64
  - name: protein_length
    dtype: float64
  - name: coding_density
    dtype: float64
  - name: completeness
    dtype: float64
  - name: contamination
    dtype: float64
  - name: strain_heterogeneity
    dtype: float64
  - name: markers
    dtype: float64
  - name: 5s_rrna
    dtype: string
  - name: 16s_rrna
    dtype: string
  - name: 23s_rrna
    dtype: string
  - name: trnas
    dtype: float64
  - name: draft_quality
    dtype: string
  - name: start_position
    dtype: int64
  - name: autotrain_text
    dtype: string
  - name: autotrain_label
    dtype:
      class_label:
        names:
          '0': 0
          '1': 1
  splits:
  - name: train
    num_bytes: 70411052
    num_examples: 100000
  - name: validation
    num_bytes: 3528945
    num_examples: 5000
  download_size: 15423840
  dataset_size: 73939997
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
---

# Dataset Card for "autotrain-data-human_dna_classify_150bp"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

Provider:

tydymy

Summary of original information

Dataset information

Features

#genome: string
asm_name: string
assembly_accession: string
bioproject: string
biosample: string
wgs_master: float
seq_rel_date: string
submitter: string
ftp_path: string
img_id: float
gtdb_id: string
scope: string
assembly_level: string
genome_rep: string
refseq_category: string
release_type: string
taxid: float
species_taxid: float
organism_name: string
infraspecific_name: string
isolate: string
superkingdom: string
phylum: string
class: string
order: string
family: string
genus: string
species: string
classified: boolean
unique_name: string
lv1_group: string
lv2_group: string
score_faa: float
score_fna: float
score_rrna: float
score_trna: float
total_length: float
contigs: float
gc: float
n50: float
l50: float
proteins: float
protein_length: float
coding_density: float
completeness: float
contamination: float
strain_heterogeneity: float
markers: float
5s_rrna: string
16s_rrna: string
23s_rrna: string
trnas: float
draft_quality: string
start_position: integer
autotrain_text: string
autotrain_label:
- class label:
  - 0: 0
  - 1: 1
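
Of the columns listed above, `autotrain_text`, `autotrain_label`, and `start_position` are the ones a classifier would most likely consume. The following is a minimal sketch of a per-row sanity check, not the author's method. It assumes that rows arrive as plain dicts, that `autotrain_text` holds a nucleotide sequence over the alphabet `ACGTN`, and that `start_position` is non-negative. None of this is stated on the card, which also does not say which label value corresponds to human and which to microbial DNA.

```python
# Minimal row check for the three columns a classifier would likely consume.
# Assumptions (not stated on the dataset card): rows are plain dicts,
# autotrain_text is a nucleotide string, start_position is non-negative.
ALLOWED_BASES = set("ACGTN")  # assumed alphabet


def check_row(row: dict) -> list:
    """Return a list of problems found in one row (empty list means it passed)."""
    problems = []

    text = row.get("autotrain_text")
    if not isinstance(text, str) or not text:
        problems.append("autotrain_text must be a non-empty string")
    elif set(text.upper()) - ALLOWED_BASES:
        problems.append("autotrain_text contains characters outside ACGTN")

    label = row.get("autotrain_label")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(label, bool) or label not in (0, 1):
        problems.append("autotrain_label must be 0 or 1")

    pos = row.get("start_position")
    if not isinstance(pos, int) or isinstance(pos, bool) or pos < 0:
        problems.append("start_position must be a non-negative integer")

    return problems
```

A row passes when `check_row(row)` returns an empty list. The alphabet check is the first thing to relax if the real sequences turn out to use IUPAC ambiguity codes.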

Data splits

train:
- bytes: 70411052
- examples: 100000
validation:
- bytes: 3528945
- examples: 5000
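
The split figures above are easier to read as proportions. The sketch below derives each split's share of the rows and its mean size per row, using only the numbers on the card. The names `SPLITS` and `split_summary` are illustrative.

```python
# Split statistics copied from the card; names here are illustrative.
SPLITS = {
    "train": {"num_bytes": 70_411_052, "num_examples": 100_000},
    "validation": {"num_bytes": 3_528_945, "num_examples": 5_000},
}


def split_summary(splits: dict) -> dict:
    """Compute each split's share of all rows and its mean bytes per row."""
    total = sum(s["num_examples"] for s in splits.values())
    return {
        name: {
            "fraction": s["num_examples"] / total,
            "bytes_per_example": s["num_bytes"] / s["num_examples"],
        }
        for name, s in splits.items()
    }


summary = split_summary(SPLITS)
for name, stats in summary.items():
    print(f"{name}: {stats['fraction']:.1%} of rows, "
          f"~{stats['bytes_per_example']:.0f} bytes/row")
```

This works out to roughly a 95/5 train/validation split at about 700 bytes per row. Most of that is probably genome metadata rather than the sequence itself, given the number of string columns.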

Data size

Download size: 15423840 bytes
Dataset size: 73939997 bytes
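
As a quick consistency check, the dataset size should equal the sum of the per-split byte counts, and the ratio of download size to dataset size shows how much the stored files are compressed. This is a small sketch using only the figures on the card, with illustrative variable names.

```python
# Figures copied from the card (all in bytes); names are illustrative.
SPLIT_BYTES = {"train": 70_411_052, "validation": 3_528_945}
DATASET_SIZE = 73_939_997
DOWNLOAD_SIZE = 15_423_840


def sizes_consistent(split_bytes: dict, dataset_size: int) -> bool:
    """True when the per-split byte counts add up to the reported total."""
    return sum(split_bytes.values()) == dataset_size


# Fraction of the decoded dataset size that must actually be downloaded.
ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(sizes_consistent(SPLIT_BYTES, DATASET_SIZE), f"{ratio:.1%}")
```

The totals agree, and the download is about a fifth of the decoded size.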

Configuration

default:
- data files:
  - train: data/train-*
  - validation: data/validation-*
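
With a single `default` configuration and two splits, loading is straightforward. The sketch below uses `load_dataset` from the Hugging Face `datasets` library. The helper name `load_split` and its `endpoint` parameter are hypothetical. Routing through a mirror by setting the `HF_ENDPOINT` environment variable is an assumption about how the mirror is meant to be used, not something the card documents.

```python
import os
from typing import Optional

REPO_ID = "tydymy/150bp_human_vs_microbial_dna"
# Split name -> shard pattern, as listed in the default configuration.
SPLIT_FILES = {"train": "data/train-*", "validation": "data/validation-*"}


def load_split(split: str, endpoint: Optional[str] = None):
    """Load one split from the Hub (needs the `datasets` package and network access)."""
    if split not in SPLIT_FILES:
        raise ValueError(
            f"unknown split {split!r}; expected one of {sorted(SPLIT_FILES)}"
        )
    if endpoint is not None:
        # Assumption: the endpoint is read when huggingface_hub is first
        # imported, so this must run before any Hub library is loaded.
        os.environ["HF_ENDPOINT"] = endpoint
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)
```

Calling `load_split("validation")` fetches the smaller split first, which is a cheap way to inspect the columns before downloading the 100,000 training rows.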
