
Consolidated Dataset from the RNS Study: Protein Embedding RNS Scores, Jensen–Shannon Divergence, and Sequence Alignment Matches

Source: DataCite Commons. Updated: 2026-02-11. Indexed: 2026-04-25.
Download link:
https://figshare.com/articles/dataset/RNS_scores_of_protein_embeddings_along_with_the_computed_JS_divergence_and_Alignment_matches_of_their_sequences_/29080301/2
Resource description:
Embeddings produced by language models (LMs) are widely used as numerical representations of natural language sentences and structured data. However, using embeddings without accounting for model confidence is a critical limitation. The Random Neighbor Score (RNS) provides a model- and task-agnostic measure of embedding uncertainty.

Associated preprint: https://www.biorxiv.org/content/10.1101/2025.04.30.651545v1

Files included:
STable_consolidated_RNSscores_Rev1_v0.tsv.gz: Consolidated sheet containing RNS scores, sequences, alignment results, and Jensen–Shannon divergence values. Columns labeled RNS_IS_* and RNS_ISb_* hold RNS values computed at different k settings, using Astral40R and Proteome4R as the random sets, respectively.
Astral40.fasta: Sequences of the selected Astral40 domains (Astral40).
Astral40_Rshuffled.fasta: Sequences of the "synthetic" / "random" set (Astral40R), which replicates the amino acid (AA) composition of Astral40.
STable_Perf_esm2_t36_3B_UR50D_Astral40_Rev1_v0.tsv: Embedding RNS and contact prediction accuracy of ESM for Astral40 domains.
STable_Perf_esm2_t36_3B_UR50D_PDB23to24_Rev1_v0.tsv: Embedding RNS and contact prediction accuracy of ESM for PDB23to24 structures.
STable_Perf_prot_t5_xl_u50_Astral40_Rev1_v0.tsv: Embedding RNS and secondary structure prediction accuracy of ProtT5 for Astral40 domains.
STable_Perf_prot_t5_xl_u50_PDB23to24_Rev1_v0.tsv: Embedding RNS and secondary structure prediction accuracy of ProtT5 for PDB23to24 structures.

Sequence sources (see manuscript for details):
ASTRAL 40: https://scop.berkeley.edu/astral/
Novel Meta set / Orphan set: https://doi.org/10.6084/m9.figshare.c.6737127
Novel Hallucination set: https://www.nature.com/articles/s41586-021-04184-w#data-availability → https://files.ipd.uw.edu/pub/trRosetta/hallucinations2K.tar.gz
Intrinsically Disordered Proteins (IDP) & Intrinsically Disordered Regions (IDR): https://disprot.org/download (version 2024_12)

Related repos:
Sample sequence embeddings: https://doi.org/10.6084/m9.figshare.30179413.v1
Code to compute RNS scores: https://bitbucket.org/bromberglab/rns/src/main/
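
The consolidated table described above is a gzipped, tab-separated file whose RNS columns are distinguished by prefix. The following is a minimal sketch, not part of the dataset or its official code, of how such a file might be read with the Python standard library and how its columns could be grouped by random set. The function names `read_rns_table` and `split_rns_columns` are hypothetical, and the sketch assumes only what the description states: the `RNS_IS_` prefix marks Astral40R and the `RNS_ISb_` prefix marks Proteome4R. The exact column suffixes and any other column names are not given here, so the code does not rely on them.

```python
import csv
import gzip


def split_rns_columns(header):
    """Group column names by the random set their prefix refers to.

    Assumption (from the dataset description): RNS_IS_* columns use
    Astral40R and RNS_ISb_* columns use Proteome4R as the random set.
    """
    groups = {"Astral40R": [], "Proteome4R": []}
    for name in header:
        if name.startswith("RNS_ISb_"):
            groups["Proteome4R"].append(name)
        elif name.startswith("RNS_IS_"):
            groups["Astral40R"].append(name)
    return groups


def read_rns_table(path):
    """Read a gzipped TSV file into (header, rows).

    Each row is a dict mapping column name to the raw string value;
    no type conversion is attempted because the column types are not
    documented in the description.
    """
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        rows = list(reader)
        header = reader.fieldnames or []
    return header, rows
```

Under these assumptions, calling `read_rns_table("STable_consolidated_RNSscores_Rev1_v0.tsv.gz")` on a downloaded copy and passing the returned header to `split_rns_columns` would list which RNS columns belong to each random set. Values are left as strings, so they need converting to floats before any numerical analysis.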
Provider:
figshare
Created:
2025-11-26