
Datasets and Jupyter notebook for the structural analysis of protein-RNA interface evolution

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/11126925
Description:
This repository contains the data and code for our manuscript "Structural comparison of protein-RNA homologous interfaces reveals widespread overall conservation contrasted with versatility in polar contacts". In the manuscript, we analyze the evolution of protein-RNA interfaces by building a dataset of protein-RNA interologs (homologous interfaces) and exploring how interface contacts are conserved between homologous interfaces, as well as possible explanations for non-conserved contacts.

The repository contains the following files:

DataAnalysisNotebook.ipynb: Jupyter notebook to reproduce the contact conservation analysis and all figures from our manuscript, and to explore the data.

env.yaml: environment file for building a Conda/Mamba environment in which to run the Jupyter notebook.

2022-02-21-PDB.csv: data from the PDB about 3D structures of complexes containing interacting protein and RNA chains (PDB structure identifier, chain identifiers, experimental technique and resolution).

2022-02-21-PDB_proteinchainscontactingRNAchains.groupbp.tsv: more detailed information about interacting protein and RNA chains from these complexes (PDB and chain identifiers, protein and RNA size, interface size and number of contacts).

2022-02-21-PDB_proteinchainscontactingRNAchains.groupbp.txt.selectXE_2.50_p30_r10_pi5_ri5_rep_bc-100.out_RNAcl_0.99.tsv: the same detailed information, restricted to the filtered dataset used as the starting point of our interolog search pipeline.

PDBinterfaceAlign.csv: information about the structural alignment of pairs of protein-RNA interactions (structural alignment TM-scores, sequence identity and coverage).

DataInterologsParam.tsv: information about a pre-filtered set of 2587 potential interologs (including interface RMSD, sequence identity and coverage, and interface size).

DataInterologsContactsFixedSASA.tsv: detailed information about conserved and non-conserved contacts in the final set of 2022 interologs (atomic contacts, apolar contacts, hydrogen bonds, salt bridges and stacking information for amino acid-nucleotide pairs, as well as whether each belongs to the interface, secondary structures, and amino acid surface accessibility and evolutionary conservation metrics). Compared to version 1, the calculation of solvent accessibility was fixed for a number of interolog pairs.

DataCons.csv: precomputed contact conservation metrics for each of the 2022 interolog pairs, for fast reproduction of the manuscript figures.

DataInterologsContactsResampledMaintainStructSeqId.tsv, DataInterologsContactsShuffled.tsv and DataInterologsShuffled.tsv: baselines computed for the assessment of contact conservation.

clan.txt, clan_membership.txt, ecod.latest.domains.uniq.txt, rfam_interfaces_977.txt, DataGroupsECOD.tsv, DataGroupesRFAM.tsv, DataGroupsRFAMClan.tsv, DataInterfaceGroupsECOD.tsv and DataInterfaceGroupsRFAM.tsv: ECOD classification of the protein domains and Rfam classification of the RNAs in the protein-RNA interfaces of our dataset.

ListeIntraHbonds.pkl and ListeIntraSaltBridges.pkl: pickle-format data files containing intra-molecular hydrogen bonds and salt bridges, respectively, used to analyse scenarios of compensation for non-conserved polar contacts.
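For a quick look at the tabular files before opening the notebook, something like the following sketch may help. It is not taken from the repository: the helper names `read_table` and `preview` are hypothetical, and it assumes that each `.csv` file is comma-separated, each `.tsv` file is tab-separated, and that the first row holds column names. None of this is confirmed by the description, so check the files themselves.

```python
import csv
from pathlib import Path

# Assumption: delimiter follows the file extension (.csv -> comma, .tsv -> tab).
DELIMITERS = {".csv": ",", ".tsv": "\t"}


def read_table(path):
    """Read a delimited text file into a list of dicts keyed by the header row.

    Assumes the first row contains column names.
    """
    path = Path(path)
    delimiter = DELIMITERS[path.suffix.lower()]
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


def preview(path, n=3):
    """Return the column names and the first n rows of a table."""
    rows = read_table(path)
    columns = list(rows[0].keys()) if rows else []
    return columns, rows[:n]


if __name__ == "__main__":
    # DataCons.csv is one of the files listed above; it is skipped if absent.
    target = Path("DataCons.csv")
    if target.exists():
        columns, first_rows = preview(target)
        print(columns)
        for row in first_rows:
            print(row)
```

All values are returned as strings; the notebook and env.yaml remain the reference for how the authors actually load and type these files.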
Created:
2024-09-08