The North Pacific Eukaryotic Gene Catalog: KOfam protein function annotations
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/13743266
Resource description:
KEGG functional annotation using KofamScan v1.3.0
These tables are larger alternative versions of the KOfam tables included in the North Pacific Eukaryotic Gene Catalog protein data repository here: https://zenodo.org/records/12630398

A full description of this data is published in Scientific Data, available here: The North Pacific Eukaryotic Gene Catalog of metatranscriptome assemblies and annotations. Please cite this publication if your research uses this data:

Groussman, R. D., Coesel, S. N., Durham, B. P., Schatz, M. J., & Armbrust, E. V. (2024). The North Pacific Eukaryotic Gene Catalog of metatranscriptome assemblies and annotations. Scientific Data, 11(1), 1161.

Clustered protein sequences were annotated against the KEGG collection (release 104.0) of 20,819 protein family Hidden Markov Models (HMMs) using KofamScan and KofamKOALA. Kofam annotation code is documented in the project GitHub repository here: NPEGC.kofamscan_function.sh
Excerpt of core NPEGC_kofam function:
# Define input FASTA
local INPUT_FASTA="NPac.${STUDY}.bf100.id99.aa.fasta"
# KofamScan call
${KOFAM_DIR}/kofam_scan-1.3.0/exec_annotation -f detail-tsv -E ${EVALUE} -o ${ANNOTATION_DIR}/NPac.${STUDY}.bf100.id99.aa.tsv ${FASTA_DIR}/${INPUT_FASTA}
Unprocessed annotation results were filtered with a minimum score of 30 to remove low-scoring matches:
zcat NPac.NPacID.kofam.tsv.gz | awk -F'\t' '{ gsub(/"/, "", $5); $5 = $5 + 0; if ($5 >= 30) print }' | gzip > NPac.NPacID.UW.bf100.id99.aa.incT30.tsv.gz
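
For readers who want to apply a different score cutoff to the unprocessed tables, the score filter above can be sketched as a small reusable function. This is a minimal illustration, not code from the NPEGC repository: the function name `filter_kofam_min_score` is hypothetical, and it assumes the score sits in column 5 of tab-separated KofamScan detail-tsv output, as in the command above. Unlike that command, it tests a copy of the score field and prints each record unmodified, so the tab delimiters of the input are kept, and it skips header lines (those beginning with `#`) explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the NPEGC repository).
# Reads tab-separated KofamScan detail-tsv rows on stdin and prints the rows
# whose score (assumed to be column 5) is >= the given minimum (default 30).
filter_kofam_min_score() {
  local min_score="${1:-30}"
  awk -F'\t' -v min="$min_score" '
    /^#/ { next }                        # skip header lines
    {
      score = $5                         # work on a copy of the score field
      gsub(/"/, "", score)               # strip any quotes around the value
      if (score + 0 >= min + 0) print    # record is printed unmodified
    }'
}
```

A possible invocation, with placeholder file names: `zcat input.kofam.tsv.gz | filter_kofam_min_score 30 | gzip > output.incT30.tsv.gz`.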
Created:
2025-01-22



