RNS scores of protein embeddings along with the computed JS divergence and Alignment matches of their sequences.
Source: Figshare. Updated: 2025-11-26. Indexed: 2026-04-08.
Download link:
https://figshare.com/articles/dataset/RNS_scores_of_protein_embeddings_along_with_the_computed_JS_divergence_and_Alignment_matches_of_their_sequences_/29080301/1
Description:
Embeddings produced by language models (LMs) are widely used as numerical representations of natural language sentences and structured data. However, using embeddings without accounting for model confidence is a critical limitation. The Random Neighbor Score (RNS) provides a model- and task-agnostic measure of embedding uncertainty.

Associated Preprint: https://www.biorxiv.org/content/10.1101/2025.04.30.651545v1

Files included:
File 1: Consolidated sheet containing RNS scores, sequences, alignment results, and Jensen–Shannon divergence values. Columns labeled RNS_IS_* and RNS_ISb_* correspond to RNS values computed at different k settings using Astral40R and Proteome4R as random sets, respectively.
File 2: FASTA file of the random sequence set (Astral40R).

Sequence sources (see manuscript for details):
ASTRAL 40: https://scop.berkeley.edu/astral/
Novel Meta set / Orphan set: https://doi.org/10.6084/m9.figshare.c.6737127
Novel Hallucination set: https://www.nature.com/articles/s41586-021-04184-w#data-availability → https://files.ipd.uw.edu/pub/trRosetta/hallucinations2K.tar.gz
Intrinsically Disordered Proteins (IDP) & Intrinsically Disordered Regions (IDR): https://disprot.org/download (version 2024_12)
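Because the two RNS column families differ only by a single letter in their prefix, it is easy to mix them up when reading File 1. The following is a minimal sketch of how the columns might be grouped by random set. The prefixes RNS_IS_ and RNS_ISb_ come from the description above; the helper name, the example header, and the form of the column suffixes are hypothetical and should be checked against the actual sheet.

```python
from collections import defaultdict

# Prefixes as given in the dataset description. The trailing underscore
# matters: without it, "RNS_ISb_..." would also match the "RNS_IS" prefix.
RANDOM_SET_BY_PREFIX = {
    "RNS_IS_": "Astral40R",
    "RNS_ISb_": "Proteome4R",
}


def group_rns_columns(header):
    """Map each random set name to the RNS column names that belong to it."""
    groups = defaultdict(list)
    for name in header:
        for prefix, random_set in RANDOM_SET_BY_PREFIX.items():
            if name.startswith(prefix):
                groups[random_set].append(name)
    return dict(groups)


# Hypothetical header row; the real column names and suffixes may differ.
example_header = ["sequence", "RNS_IS_10", "RNS_IS_100", "RNS_ISb_10", "JS_divergence"]
groups = group_rns_columns(example_header)
```

In practice the header would be read from File 1 itself (for instance, the first row returned by a CSV or spreadsheet reader) rather than typed by hand.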
Provided by:
Prabakaran, R; Bromberg, Yana
Created:
2025-09-22



