
AllTheBacteria/All-representative-ids

Source: Hugging Face. Updated 2026-04-09; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/AllTheBacteria/All-representative-ids
Description:
License: MIT

List of sample identifiers from the [ATB](https://allthebacteria.org), [GTDB](https://gtdb.ecogenomic.org), [MGnify](https://www.ebi.ac.uk/metagenomics), [SPIRE](https://spire.embl.de), [mOTUs](https://motus-db.org) and [HRGM](https://www.decodebiome.org/HRGM2/) datasets, deduplicated at varying levels of nucleotide sequence identity using [sketchlib](https://docs.rs/sketchlib/latest/sketchlib/) v0.2.4.

Sketching and distance calculation were run with the following commands:

```
sketchlib inverted build -o ${outpref} -k 21 -s 10 -f ${infile} --threads 47 --write-skq
sketchlib sketch -f ${infile} -o ${outpref} --k-vals 21 -s 1000 --threads 47
sketchlib inverted precluster --knn 50 --skd ${outpref} --ani --threads 47 -o ${outpref}_dists.tsv ${outpref}.ski
```

Here `infile` is a list of genome inputs, one per line, in the format:

```
GENOME000001 /path/to/GENOME000001.fasta
GENOME000002 /path/to/GENOME000002.fasta
GENOME000003 /path/to/GENOME000003.fasta
```
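The input list described above can be generated from a directory of FASTA files. The following is a minimal sketch, not part of the dataset's own tooling: the function name `make_infile` is hypothetical, the sample name is assumed to be the file name without its `.fasta` extension, and the two columns are assumed to be tab-separated (the description shows only whitespace, so adjust the separator if sketchlib expects something else).

```shell
# Hypothetical helper: write one "<name><TAB><path>" line per *.fasta file.
# Assumptions: sample name = file name minus ".fasta"; columns separated by a tab.
make_infile() {
  fasta_dir="$1"   # directory containing the FASTA files
  out_file="$2"    # path of the list file to write

  : > "$out_file"  # create or truncate the output file
  for f in "$fasta_dir"/*.fasta; do
    [ -e "$f" ] || continue   # skip the unexpanded pattern when nothing matches
    printf '%s\t%s\n' "$(basename "$f" .fasta)" "$f" >> "$out_file"
  done
}
```

Calling `make_infile /path/to/genomes genomes.txt` would then produce a file that could be passed as `${infile}` to the commands above.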
Provider:
AllTheBacteria