ghazikhanihamed/TooT-PLM-ionCT_DB
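Dataset Card
Dataset Overview
This dataset is used by the TooT-PLM-ionCT tool, a composite framework consisting of three distinct systems, each with a different architecture and trained on its own dataset. Each system is dedicated to a specific task: separating ion channels (ICs) and ion transporters (ITs) from other membrane proteins, and distinguishing ICs from ITs.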
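The three tasks described above can be viewed as three binary classification problems drawn from the same pool of labelled sequences. The following is a minimal sketch of that idea, not the authors' code: the label strings (`IC`, `IT`, `MP`), the task names, the `build_task_subset` helper, and the toy sequences are all hypothetical, and it assumes that "other membrane proteins" excludes both ICs and ITs.

```python
from typing import Dict, List, Tuple

# Hypothetical class labels: ion channel, ion transporter,
# and other membrane protein (assumed to exclude ICs and ITs).
IC, IT, MP = "IC", "IT", "MP"

# Each task name maps to a (positive class, negative class) pair.
TASKS: Dict[str, Tuple[str, str]] = {
    "IC-MP": (IC, MP),  # ion channels vs. other membrane proteins
    "IT-MP": (IT, MP),  # ion transporters vs. other membrane proteins
    "IC-IT": (IC, IT),  # ion channels vs. ion transporters
}


def build_task_subset(
    records: List[Tuple[str, str]], task: str
) -> List[Tuple[str, int]]:
    """Keep only the two classes relevant to `task` and map them to 1/0."""
    positive, negative = TASKS[task]
    subset = []
    for sequence, label in records:
        if label == positive:
            subset.append((sequence, 1))
        elif label == negative:
            subset.append((sequence, 0))
        # Records of the third class are skipped for this task.
    return subset


# Toy records with made-up sequences, for illustration only.
toy_records = [("MKTAYIAK", IC), ("MSLLPVVA", IT), ("MGDVEKGK", MP)]

for task_name in TASKS:
    print(task_name, build_task_subset(toy_records, task_name))
```

Each task keeps only its two relevant classes, which matches the statement that each system is trained on its own dataset.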
Dataset Sources
- Repository: UniProt/SwissProt
Citation
BibTeX:

@misc{ghazikhani_exploiting_2023,
  title = {Exploiting protein language models for the precise classification of ion channels and ion transporters},
  copyright = {© 2023, Posted by Cold Spring Harbor Laboratory. This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at http://creativecommons.org/licenses/by/4.0/},
  url = {https://www.biorxiv.org/content/10.1101/2023.07.11.548644v1},
  doi = {10.1101/2023.07.11.548644},
  abstract = {This study presents TooT-PLM-ionCT, a composite framework consisting of three distinct systems, each with different architectures and trained on unique datasets. Each system within TooT-PLM-ionCT is dedicated to a specific task: segregating ion channels (ICs) and ion transporters (ITs) from other membrane proteins and differentiating ICs from ITs. These systems exploit the capabilities of six diverse Protein Language Models (PLMs) - ProtBERT, ProtBERT-BFD, ESM-1b, ESM-2 (650M parameters), and ESM-2 (15B parameters). As these proteins play a pivotal role in the regulation of ion movement across cellular membranes, they are integral to numerous biological processes and overall cellular vitality. To circumvent the costly and time-consuming nature of wet lab experiments, we harness the predictive prowess of PLMs, drawing parallels with techniques in natural language processing. Our strategy engages six classifiers, embracing both conventional methodologies and a deep learning model, for each of our defined tasks. Furthermore, we delve into critical factors influencing our tasks, including the implications of dataset balancing, the effect of frozen versus fine-tuned PLM representations, and the potential variance between half and full precision floating-point computations. Our empirical results showcase superior performance in distinguishing ITs from other membrane proteins and differentiating ICs from ITs, while the task of discriminating ICs from other membrane proteins exhibits results commensurate with the current state-of-the-art.},
  language = {en},
  urldate = {2023-07-31},
  publisher = {bioRxiv},
  author = {Ghazikhani, Hamed and Butler, Gregory},
  month = jul,
  year = {2023},
  note = {Pages: 2023.07.11.548644 Section: New Results},
}



