AllTheBacteria/All-representative-ids
Hugging Face | Updated 2026-04-09 | Added 2026-03-29
Download link:
https://hf-mirror.com/datasets/AllTheBacteria/All-representative-ids
Description:
---
license: mit
---
Lists of representative sample identifiers drawn from the [ATB](https://allthebacteria.org), [GTDB](https://gtdb.ecogenomic.org), [MGnify](https://www.ebi.ac.uk/metagenomics), [SPIRE](https://spire.embl.de), [mOTUs](https://motus-db.org) and [HRGM](https://www.decodebiome.org/HRGM2/) datasets, deduplicated at several levels of nucleotide sequence identity using [sketchlib](https://docs.rs/sketchlib/latest/sketchlib/) v0.2.4.
Sketching and distance calculation were performed with the following commands:
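The card does not say how a representative is chosen within each group of near-identical genomes. Purely as an illustration, the sketch below applies a common greedy rule (a swapped-in technique, not necessarily the one used here): walk the samples in a chosen priority order and keep a sample unless it already meets the identity threshold with a previously kept one. It assumes a hypothetical tab-separated table of pairwise ANI values with columns `id_a`, `id_b`, `ani`, plus a sample list with the ID in the first column; `pick_representatives` is a made-up helper, and the real sketchlib output layout may differ. Running it once per threshold yields one representative list per identity level.

```bash
#!/usr/bin/env bash
# Illustrative greedy dereplication at a single identity threshold.
# Hypothetical inputs (NOT the documented sketchlib output format):
#   $2: tab-separated pairs "id_a<TAB>id_b<TAB>ani", ani on the same scale as $1
#   $3: samples in priority order, ID in the first whitespace-separated column
set -euo pipefail

pick_representatives() {
  local threshold=$1 dists=$2 samples=$3
  awk -F'\t' -v t="$threshold" '
    BEGIN { t += 0 }
    # Pass 1 (pairs table): remember every pair at or above the threshold, both ways
    FILENAME == ARGV[1] {
      if ($3 + 0 >= t) { close_pair[$1, $2] = 1; close_pair[$2, $1] = 1 }
      next
    }
    # Pass 2 (sample list): keep a sample unless it is close to a kept representative
    {
      split($0, field, /[ \t]+/)
      id = field[1]
      if (id == "") next
      for (i = 1; i <= nrep; i++)
        if ((id, reps[i]) in close_pair) next
      reps[++nrep] = id
      print id
    }
  ' "$dists" "$samples"
}
```

Example usage: `pick_representatives 0.99 dists.tsv samples.txt > reps_0.99.txt`. Because greedy selection depends on input order, sorting the sample list by some quality measure first (a hypothetical choice) biases the kept representatives towards better genomes.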
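```bash
# Build an inverted index from the genomes listed in ${infile} (read below as ${outpref}.ski)
sketchlib inverted build -o ${outpref} -k 21 -s 10 -f ${infile} --threads 47 --write-skq
# Sketch the same genomes at k = 21; this sketch database is passed to --skd below
sketchlib sketch -f ${infile} -o ${outpref} --k-vals 21 -s 1000 --threads 47
# Precluster with the inverted index (50 nearest neighbours) and write ANI distances
sketchlib inverted precluster --knn 50 --skd ${outpref} --ani --threads 47 -o ${outpref}_dists.tsv ${outpref}.ski
```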
Here, `infile` lists the input genomes one per line, each as a sample identifier followed by the path to its FASTA file:
```
GENOME000001 /path/to/GENOME000001.fasta
GENOME000002 /path/to/GENOME000002.fasta
GENOME000003 /path/to/GENOME000003.fasta
```
Provider:
AllTheBacteria