Data (part 6) from Illuminating the functional landscape of the dark proteome across the Animal Tree of Life through natural language processing models
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10717909
Resource description:
Part 6 contains:
model_organisms_semsim_go_cats.txt : semantic similarity, per GO category, between the GO annotations from GOPredSim-ProtT5 obtained before and after removing a given model organism from the lookup dataset.
model_organisms_removed_confidence_min.txt : minimum reliability index ("confidence") per gene of GO annotations from GOPredSim-ProtT5 after removing a given model organism from the lookup dataset.
model_organisms_prott5.tar.gz : GO annotations from GOPredSim-ProtT5 after removing a given model organism from the lookup dataset.
model_organisms_orig_confidence_min.txt : minimum reliability index ("confidence") per gene of GO annotations from GOPredSim-ProtT5 before removing a given model organism from the lookup dataset.
all_isoforms_cdhit_clustr.tar.gz : CD-HIT cluster results for all isoforms of a subset of 102 species.
all_isoforms_larger_5k.tar.gz : headers of sequences longer than 5000 amino acids after CD-HIT, for all isoforms of a subset of 102 species.
longest_cdhit_clustr.tar.gz : CD-HIT cluster results for the longest isoforms of all species.
longest_larger_5k.tar.gz : headers of sequences longer than 5000 amino acids after CD-HIT, for the longest isoforms of all species.
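The CD-HIT cluster archives can be inspected with a short parser. The sketch below assumes that the archives contain standard CD-HIT `.clstr` files. In that format, each `>Cluster N` header is followed by member lines such as `0	6120aa, >id... *`, where `*` marks the cluster representative. The dataset does not confirm the file names or extensions inside the archives. The sequence IDs, the helper names, and the `representatives_only` switch are hypothetical. The sketch also assumes, without confirmation, that the `*_larger_5k` lists correspond to representatives longer than 5000 residues.

```python
import io
import re
import tarfile
from collections import defaultdict

# Member line of a CD-HIT .clstr file, e.g.
#   "0\t6120aa, >geneA_iso1... *"             (representative)
#   "1\t5980aa, >geneA_iso2... at 97.71%"     (member, identity to representative)
MEMBER_RE = re.compile(r"^\d+\s+(\d+)aa,\s+>(.+?)\.\.\.\s+(\*|at\s.*)$")


def parse_clstr(lines):
    """Return {cluster_id: [(seq_id, length_aa, is_representative), ...]}."""
    clusters = defaultdict(list)
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = int(line.split()[1])
            continue
        match = MEMBER_RE.match(line)
        if match is None or current is None:
            raise ValueError(f"unrecognised .clstr line: {line!r}")
        length, seq_id, tail = match.groups()
        clusters[current].append((seq_id, int(length), tail == "*"))
    return dict(clusters)


def long_sequences(clusters, min_len=5000, representatives_only=True):
    """Sorted IDs of sequences strictly longer than min_len amino acids."""
    return sorted(
        seq_id
        for members in clusters.values()
        for seq_id, length, is_rep in members
        if length > min_len and (is_rep or not representatives_only)
    )


def iter_clstr_from_tar(path):
    """Yield (member_name, clusters) for every *.clstr file in a .tar.gz archive.

    Assumption: cluster files inside the archive end in ".clstr".
    """
    with tarfile.open(path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".clstr"):
                handle = io.TextIOWrapper(tar.extractfile(member), encoding="utf-8")
                yield member.name, parse_clstr(handle)


if __name__ == "__main__":
    # Hypothetical toy input, not taken from the dataset.
    sample = [
        ">Cluster 0",
        "0\t6120aa, >geneA_iso1... *",
        "1\t5980aa, >geneA_iso2... at 97.71%",
        ">Cluster 1",
        "0\t312aa, >geneB_iso1... *",
    ]
    clusters = parse_clstr(sample)
    print(long_sequences(clusters))  # prints ['geneA_iso1']
```

The parser fails loudly on unexpected lines rather than skipping them silently. A format mismatch with the real files then surfaces immediately and is not hidden as missing clusters.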
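A similar sketch shows how a per-gene minimum reliability index, like the one in the `*_confidence_min.txt` files, could be derived and compared before and after removing a model organism. It assumes a hypothetical tab-separated input with one annotation per line: `gene_id`, GO term, reliability index. This is not necessarily the layout of the GOPredSim-ProtT5 output in `model_organisms_prott5.tar.gz`, which is not described here. The function names and toy values are also hypothetical.

```python
import csv


def min_confidence_per_gene(lines):
    """Return {gene_id: minimum reliability index over that gene's GO annotations}.

    Assumed input: tab-separated rows of gene_id, GO term, reliability index.
    """
    lowest = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        gene_id, ri = row[0], float(row[2])
        if gene_id not in lowest or ri < lowest[gene_id]:
            lowest[gene_id] = ri
    return lowest


def confidence_shift(original, removed):
    """Change in per-gene minimum reliability (removed - original) for shared genes."""
    shared = original.keys() & removed.keys()
    return {gene: removed[gene] - original[gene] for gene in sorted(shared)}


if __name__ == "__main__":
    # Hypothetical toy annotations, not taken from the dataset.
    before = ["geneA\tGO:0005634\t0.75", "geneA\tGO:0003677\t0.5", "geneB\tGO:0016020\t0.25"]
    after = ["geneA\tGO:0005634\t0.25", "geneB\tGO:0016020\t0.25"]
    print(confidence_shift(min_confidence_per_gene(before), min_confidence_per_gene(after)))
    # prints {'geneA': -0.25, 'geneB': 0.0}
```

A negative shift means that the least reliable annotation of a gene became less reliable once the model organism was dropped from the lookup set.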
Created: 2024-02-28



