Supplementary data to the paper: Transcription factor prediction using protein 3D secondary structures

Source: DataCite Commons. Updated: 2024-06-18. Indexed: 2024-08-19.
Download link:
https://figshare.com/articles/dataset/Supplementary_data_to_the_paper_Transcription_factor_prediction_using_protein_3D_structures/25398247
Description:
Motivation: Transcription factors (TFs) are DNA-binding proteins that regulate gene expression. Traditional methods predict a protein to be a TF if it contains any DNA-binding domain (DBD) of a known TF. However, this approach fails to identify novel TFs that contain no known DBD. Recently proposed TF prediction methods do not rely on DBDs. Instead, they use features of protein sequences to train a machine learning model and then use the trained model to predict whether a protein is a TF. Because the 3-dimensional (3D) structure of a protein captures more information than its sequence, using 3D protein structures will likely allow for more accurate prediction of novel TFs.

Results: We propose a deep learning-based TF prediction method (StrucTFactor), which is the first method to utilize 3D secondary structural information of proteins. We compare StrucTFactor with recent state-of-the-art TF prediction methods based on ∼525 000 proteins across 12 datasets, capturing different aspects of data bias (including sequence redundancy) that may influence a method's performance. We find that StrucTFactor significantly (p-value < 0.001) outperforms the existing TF prediction methods, improving the performance over its closest competitor by up to 17% based on Matthews correlation coefficient.
Provider:
figshare
Created:
2024-03-13