10-fold Cross-Validation Area-Under-the-Curve scores (mean ± st.dev.) of ProtT5 vs Histogram-8000 (Hist-8000) protein representation methods in the inference tasks. Hist-8000 matches ProtT5 in four of the seven tasks compared (tasks 1, 3-8). Hist-8000 uses the conceptually simpler BoW approach [13], whereas ProtT5 requires SSL pre-training of a T5 transformer model with three billion parameters on a dataset of 45 million sequences [35]. Best-performing methods are shown in bold. See section ‘Protein inference problems’ for task data sources. Hist-8000: Histogram-8000, BoW: Bag-of-Words, SoT: Sum-of-learnt-Trigrams, SSL: Self-Supervised Learning, VFs: Virulence Factors, Gram-pos: Gram-positive, Gram-neg: Gram-negative, st.dev.: standard deviation.
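
As a rough illustration of how a "mean ± st.dev." cross-validation AUC entry of this kind could be computed, the sketch below takes per-fold labels and classifier scores and summarizes the fold AUCs. It is not the authors' code: the function names `roc_auc` and `summarize_folds` are hypothetical, the toy folds are made-up values, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the caption does not say which variant was used.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("each fold needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def summarize_folds(folds):
    """Return (mean, sample st.dev.) of the AUC over held-out folds.

    `folds` is a list of (labels, scores) pairs, one per held-out fold;
    10-fold cross-validation would supply ten such pairs.
    """
    aucs = [roc_auc(labels, scores) for labels, scores in folds]
    return mean(aucs), stdev(aucs)


# Toy data (hypothetical): two folds only, to keep the example short.
toy_folds = [
    ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
    ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.7]),
]
fold_mean, fold_sd = summarize_folds(toy_folds)
print(f"AUC = {fold_mean:.3f} ± {fold_sd:.3f}")
```

In practice the per-fold scores would come from a classifier trained on the other nine folds, using either the ProtT5 or the Hist-8000 representation as input features.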

Indexed from NIAID Data Ecosystem on 2026-05-02

Download link:

https://figshare.com/articles/dataset/10-fold_Cross-Validation_Area-Under-the-Curve_scores_mean_st_dev_of_ProtT5_vs_Histogram-8000_protein_representation_methods_in_the_inference_tasks_Hist-8000_matches_ProtT5_in_four_out_of_seven_tasks_compared_tasks_1_3-8_Hist-8000_consists_o/29845902



Created:

2025-08-06