Datasets and Jupyter notebook for the structural analysis of protein-RNA interface evolution
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/11126925
Resource description:
The present repository contains data and code related to our manuscript "Structural comparison of protein-RNA homologous interfaces reveals widespread overall conservation contrasted with versatility in polar contacts". In the manuscript, we analyze the evolution of protein-RNA interfaces by building a dataset of protein-RNA interologs (homologous interfaces) and exploring how interface contacts are conserved between homologous interfaces, as well as possible explanations for non-conserved contacts.
This repository contains the following files:
DataAnalysisNotebook.ipynb is a Jupyter notebook to reproduce the contact conservation analysis and all figures from our manuscript, and to explore the data
env.yaml is an environment file for building the Conda/Mamba environment needed to run the Jupyter notebook
2022-02-21-PDB.csv contains data from the PDB about 3D structures of complexes containing interacting protein and RNA chains (PDB structure identifier, chain identifiers, experimental technique and resolution)
2022-02-21-PDB_proteinchainscontactingRNAchains.groupbp.tsv contains more detailed information about interacting protein and RNA chains from these complexes (PDB and chain identifiers, protein and RNA size, interface size and number of contacts)
2022-02-21-PDB_proteinchainscontactingRNAchains.groupbp.txt.selectXE_2.50_p30_r10_pi5_ri5_rep_bc-100.out_RNAcl_0.99.tsv contains the same detailed information, restricted to the filtered dataset used as a starting point in our interolog search pipeline
PDBinterfaceAlign.csv contains information about the structural alignment of pairs of protein-RNA interactions (structural alignment TM-scores, sequence identity and coverage)
DataInterologsParam.tsv contains information about a pre-filtered set of 2587 potential interologs (including interface RMSD, sequence identity and coverage and interface size)
DataInterologsContactsFixedSASA.tsv contains detailed information about conserved and non-conserved contacts in the final set of 2022 interologs (atomic contacts, apolar contacts, hydrogen bonds, salt bridges and stacking information for amino acid-nucleotide pairs, as well as whether each belongs to the interface, secondary structures, and amino acid surface accessibility and evolutionary conservation metrics). Compared to version 1, the calculation of solvent accessibility was fixed for a number of interolog pairs
DataCons.csv contains precomputed contact conservation metrics for each of the 2022 interolog pairs, for fast reproduction of manuscript figures
DataInterologsContactsResampledMaintainStructSeqId.tsv, DataInterologsContactsShuffled.tsv and DataInterologsShuffled.tsv relate to baselines computed for contact conservation assessment
clan.txt, clan_membership.txt, ecod.latest.domains.uniq.txt, rfam_interfaces_977.txt, DataGroupsECOD.tsv, DataGroupesRFAM.tsv, DataGroupsRFAMClan.tsv, DataInterfaceGroupsECOD.tsv and DataInterfaceGroupsRFAM.tsv relate to the ECOD classification of protein domains and the Rfam classification of RNAs in protein-RNA interfaces from our dataset
ListeIntraHbonds.pkl and ListeIntraSaltBridges.pkl are pickle-format data files containing intra-molecular hydrogen bonds and salt bridges (respectively) that are used to analyse scenarios of compensation for non-conserved polar contacts.
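For readers who want to inspect the files listed above outside the notebook, the following is a minimal sketch using only the Python standard library. It is not taken from DataAnalysisNotebook.ipynb: the helper names (load_table, column_names, load_pickle) are hypothetical, and it assumes that the .tsv files are tab-separated and the .csv files comma-separated, each with a single header row. The column names are not documented here, so the sketch reads them from the header rather than assuming them.

```python
import csv
import pickle
from pathlib import Path


def load_table(path):
    """Read a .csv or .tsv file into a list of dicts, one dict per row."""
    path = Path(path)
    # Assumption: .tsv means tab-separated, anything else comma-separated,
    # and the first line of the file is a header row.
    delimiter = "\t" if path.suffix == ".tsv" else ","
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


def column_names(rows):
    """Return the column names of a loaded table (empty list if no rows)."""
    return list(rows[0].keys()) if rows else []


def load_pickle(path):
    """Load a pickle file such as ListeIntraHbonds.pkl.

    Unpickling can execute arbitrary code, so only load pickle files
    obtained from a source you trust.
    """
    with Path(path).open("rb") as handle:
        return pickle.load(handle)
```

With the repository files downloaded into the working directory, column_names(load_table("DataCons.csv")) would list the available metrics without assuming them in advance. All values are returned as strings, so numeric columns need explicit conversion. If the pickle files hold objects from third-party packages, loading them may require the environment built from env.yaml.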
Created:
2024-09-08



