
RNS scores of protein embeddings along with the computed JS divergence and Alignment matches of their sequences.

Source: Figshare | Updated: 2025-11-26 | Indexed: 2026-04-08
Download link:
https://figshare.com/articles/dataset/RNS_scores_of_protein_embeddings_along_with_the_computed_JS_divergence_and_Alignment_matches_of_their_sequences_/29080301/1
Description:
Embeddings produced by language models (LMs) are widely used as numerical representations of natural language sentences and structured data. However, using embeddings without accounting for model confidence is a critical limitation. The Random Neighbor Score (RNS) provides a model- and task-agnostic measure of embedding uncertainty.

Associated preprint: https://www.biorxiv.org/content/10.1101/2025.04.30.651545v1

Files included:
File 1: Consolidated sheet containing RNS scores, sequences, alignment results, and Jensen–Shannon divergence values. Columns labeled RNS_IS_* and RNS_ISb_* correspond to RNS values computed at different k settings using Astral40R and Proteome4R as random sets, respectively.
File 2: FASTA file of the random sequence set (Astral40R).

Sequence sources (see manuscript for details):
ASTRAL 40: https://scop.berkeley.edu/astral/
Novel Meta set / Orphan set: https://doi.org/10.6084/m9.figshare.c.6737127
Novel Hallucination set: https://www.nature.com/articles/s41586-021-04184-w#data-availability → https://files.ipd.uw.edu/pub/trRosetta/hallucinations2K.tar.gz
Intrinsically Disordered Proteins (IDP) & Intrinsically Disordered Regions (IDR): https://disprot.org/download (version 2024_12)
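
The column naming described for File 1 can be illustrated with a minimal sketch that groups RNS columns by the random set they were computed against. It relies only on the RNS_IS_* and RNS_ISb_* prefixes given in the description. The function name, the example header, and the numeric suffixes are hypothetical, and the actual column names in the sheet may differ.

```python
import re
from collections import defaultdict

# Prefix-to-random-set mapping taken from the dataset description:
# RNS_IS_* uses Astral40R, RNS_ISb_* uses Proteome4R.
RANDOM_SET_BY_PREFIX = {"RNS_IS": "Astral40R", "RNS_ISb": "Proteome4R"}

# The longer prefix is listed first so "RNS_ISb_..." is never read as "RNS_IS".
COLUMN_PATTERN = re.compile(r"^(RNS_ISb|RNS_IS)_(.+)$")


def group_rns_columns(header):
    """Map each random set name to the RNS column names found in `header`."""
    groups = defaultdict(list)
    for name in header:
        match = COLUMN_PATTERN.match(name)
        if match:
            groups[RANDOM_SET_BY_PREFIX[match.group(1)]].append(name)
    return dict(groups)


# Hypothetical header row; the suffixes are assumed to encode the k setting.
example_header = ["sequence", "RNS_IS_10", "RNS_IS_100", "RNS_ISb_10", "JS_divergence"]
print(group_rns_columns(example_header))
```

In practice the header would come from the first row of the consolidated sheet, read with whatever tool matches its file format; columns that carry neither prefix are simply ignored by the grouping.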
Provided by:
Prabakaran, R; Bromberg, Yana
Created:
2025-09-22