gapseq reference sequence databases for Bacteria and Archaea
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/10047603
Description:
The repository contains the protein sequences used by gapseq to predict the presence of metabolic reactions and to construct metabolic models.
The following workflow uses gapseq to generate this set of reference protein sequences:
```sh
# delete all "old" data
rm dat/seq/Bacteria/rev/*.fasta
rm dat/seq/Bacteria/unrev/*.fasta
rm dat/seq/Bacteria/rxn/*.fasta
rm dat/seq/Archaea/rev/*.fasta
rm dat/seq/Archaea/unrev/*.fasta
rm dat/seq/Archaea/rxn/*.fasta

# run gapseq find to re-download everything
## the genome is irrelevant as no blasting is performed ('-x')
gapseq find -p all -t Bacteria -n -x -U toy/ecoli.faa.gz > bac_update.log 2>&1
gapseq find -p all -t Archaea -n -x -U toy/ecoli.faa.gz > ar_update.log 2>&1

# create all sequence .tar.gz archives (rev/unrev/rxn)
cd dat/seq/Bacteria/rev/ && tar -czvf sequences.tar.gz ./*.fasta && cd ../../../../
cd dat/seq/Bacteria/unrev/ && tar -czvf sequences.tar.gz ./*.fasta && cd ../../../../
cd dat/seq/Bacteria/rxn/ && tar -czvf sequences.tar.gz ./*.fasta && cd ../../../../
cd dat/seq/Archaea/rev/ && tar -czvf sequences.tar.gz ./*.fasta && cd ../../../../
cd dat/seq/Archaea/unrev/ && tar -czvf sequences.tar.gz ./*.fasta && cd ../../../../
cd dat/seq/Archaea/rxn/ && tar -czvf sequences.tar.gz ./*.fasta && cd ../../../../

# create md5sum table for all tar.gz archives
cd dat/seq/
find -mindepth 2 -type f -name "*.tar.gz" -exec md5sum {} \; > md5sums.txt

# create taxon-specific final archive for Zenodo upload
tar -czvf Bacteria.tar.gz Bacteria/*/*.tar.gz
tar -czvf Archaea.tar.gz Archaea/*/*.tar.gz

# Upload Bacteria.tar.gz, Archaea.tar.gz, and md5sums.txt to Zenodo via the web-interface
```
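On the download side, the checksum table produced by the workflow above can be used to check that the archives arrived intact. The following is a minimal sketch, assuming GNU `md5sum` and `tar` are available and that `Bacteria.tar.gz`, `Archaea.tar.gz`, and `md5sums.txt` were saved to the same directory. `verify_gapseq_archives` is a hypothetical helper written for illustration; it is not part of gapseq.

```sh
# Hypothetical helper (not part of gapseq): unpack the taxon archives found in
# a download directory and check the inner sequences.tar.gz files against
# md5sums.txt. Returns non-zero if any checksum does not match.
verify_gapseq_archives() {
    dir="${1:-.}"
    (
        cd "$dir" || exit 1
        for taxon in Bacteria Archaea; do
            # Only unpack taxon archives that were actually downloaded.
            if [ -f "$taxon.tar.gz" ]; then
                tar -xzf "$taxon.tar.gz" || exit 1
            fi
        done
        # md5sums.txt lists paths relative to dat/seq/ in the workflow above,
        # e.g. ./Bacteria/rev/sequences.tar.gz, which matches the unpacked layout.
        md5sum -c md5sums.txt
    )
}
```

Calling `verify_gapseq_archives /path/to/download` (the path is a placeholder) prints one `OK` or `FAILED` line per inner archive. The work runs in a subshell so the caller's working directory is left unchanged.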
Created:
2024-11-06



