Discriminating between HuR and TTP binding sites using the k-spectrum kernel method

Figshare · Updated 2017-03-24 · Indexed 2026-04-29

Download link:

https://figshare.com/articles/dataset/Discriminating_between_HuR_and_TTP_binding_sites_using_the_i_k_i_-spectrum_kernel_method/4782388


Resource description:

Background: The RNA-binding proteins (RBPs) human antigen R (HuR) and tristetraprolin (TTP) are known to bind competitively but have opposing effects on the bound messenger RNA (mRNA). How cells discriminate between the two proteins is an interesting problem. Machine learning approaches, such as support vector machines (SVMs), may be useful for identifying discriminative features. However, this method has yet to be applied to studies of RNA-binding protein motifs.

Results: Applying the k-spectrum kernel to an SVM, we first verified the published binding sites of both HuR and TTP. Additional feature engineering highlighted the U-rich binding preference of HuR and the AU-rich binding preference of TTP. Domain adaptation along with multi-task learning was used to predict the common binding sites.

Conclusion: The distinction between HuR and TTP binding appears to lie in subtle sequence-content features. HuR prefers strongly U-rich sequences, whereas TTP prefers AU-rich sequences: as A content increases, sequences are more likely to be bound only by TTP. Our model is consistent with competitive binding of the two proteins, particularly at intermediate, AU-balanced sequences. This suggests that fine changes in the A/U balance within an untranslated region (UTR) can alter binding and the subsequent stability of the message. Both feature engineering and domain adaptation emphasized the extent to which these proteins recognize similar general sequence features. This work suggests that the k-spectrum kernel method could be useful for studying RNA-binding proteins, and that domain adaptation techniques such as feature augmentation could be employed, particularly when examining RBPs with similar binding preferences.
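
To make the two techniques named in the abstract concrete, here is a minimal sketch in Python. It is not the authors' code. The k-spectrum kernel is implemented in its standard form, as the inner product of overlapping k-mer count vectors. The abstract does not say which feature augmentation scheme was used, so the sketch assumes the common "one shared copy plus one copy per domain" construction, under which the kernel value doubles for two sequences from the same domain. The function names, the domain labels, and the two toy sequences are all hypothetical.

```python
from collections import Counter


def spectrum_counts(seq, k):
    """Count every overlapping k-mer in seq (the k-spectrum feature map)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k):
    """Inner product of the k-mer count vectors of sequences a and b."""
    ca, cb = spectrum_counts(a, k), spectrum_counts(b, k)
    # A Counter returns 0 for k-mers it has not seen, so unshared k-mers drop out.
    return sum(n * cb[kmer] for kmer, n in ca.items())


def augmented_kernel(a, dom_a, b, dom_b, k):
    """Kernel after feature augmentation (assumed scheme, see lead-in).

    Each sequence gets a shared copy of its features plus a copy tied to its
    own domain. The shared copies always match; the domain copies match only
    when both sequences come from the same domain.
    """
    base = spectrum_kernel(a, b, k)
    return 2 * base if dom_a == dom_b else base


# Hypothetical toy sequences, not taken from the dataset.
u_rich = "UUUUUUUU"
au_rich = "UAUUUAUU"

print(spectrum_kernel(u_rich, u_rich, 3))                   # 36
print(spectrum_kernel(u_rich, au_rich, 3))                  # 6
print(augmented_kernel(u_rich, "HuR", au_rich, "TTP", 3))   # 6
```

A Gram matrix built from either kernel function could then be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.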


Created:

2017-03-24
