Benchmark dataset for CATH hierarchical clustering tools (GeMMA/FunFHMMer, MARC, FRAN and eMMA)
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/8425747
Description:
Benchmark dataset for CATH superfamily 3.40.50.620 (HUPs).
Contains Functional Family (FunFam) alignments and hidden Markov models generated by GeMMA/FunFHMMer, MARC, FRAN and CATH-eMMA, together with the Python code used to assess their quality (EC purity, DOPS, Neff) and to run the intermediate steps of the MARC and FRAN pipelines (pooling, randomisation, renaming).
3.4.50.620_full_superfamily_sequences.fasta contains all HUPs superfamily sequences; the FunFams are a subset of these.
all_starting_clusters_sequences.fasta contains the sequences included in the starting clusters used in the analyses.
3.40.50.620_embedded.pt includes embeddings for the HUPs superfamily generated using the ESM2 Protein Language Model.
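The FASTA files listed above can be inspected without extra dependencies. The following is a minimal sketch, not code from the dataset: the helper names `read_fasta` and `missing_ids` are hypothetical, the example records are made up, and it assumes that sequence headers are directly comparable between the two files, which the description does not confirm.

```python
from typing import Dict, Iterable, Set


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {header: sequence} mapping."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading '>'
            records[header] = ""
        elif header is not None:
            records[header] += line  # join wrapped sequence lines
    return records


def missing_ids(subset: Dict[str, str], full: Dict[str, str]) -> Set[str]:
    """Return headers present in `subset` but absent from `full`."""
    return set(subset) - set(full)


# Tiny made-up records standing in for the real files (assumption).
full = read_fasta([">seqA", "MKT", "LLV", ">seqB", "GAV"])
starting = read_fasta([">seqA", "MKTLLV"])
print(missing_ids(starting, full))
```

With the real files, an open file handle can be passed directly, for example `read_fasta(open("all_starting_clusters_sequences.fasta"))`. An empty result from `missing_ids` would be consistent with the starting-cluster sequences being drawn from the full superfamily set. The `.pt` file is presumably a PyTorch serialisation that `torch.load` can read, but its internal layout is not described here.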
Created:
2024-06-06



