
ghazikhanihamed/TooT-PLM-ionCT_DB

Source: Hugging Face. Updated 2024-02-16; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/ghazikhanihamed/TooT-PLM-ionCT_DB
Resource summary:
These datasets were used to build TooT-PLM-ionCT, a composite framework of three distinct systems, each with its own architecture and trained on its own dataset. Each system handles one task: separating ion channels (ICs) from other membrane proteins, separating ion transporters (ITs) from other membrane proteins, or distinguishing ICs from ITs. The datasets were curated by Hamed Ghazikhani from UniProt/SwissProt. According to the dataset description, the work draws on six Protein Language Models (PLMs) and applies both conventional and deep learning classifiers. The data are intended for classification tasks on proteins that regulate ion movement across cellular membranes and are therefore integral to many biological processes and to cellular vitality.
Provider:
ghazikhanihamed
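Summary of original information

Dataset card

Dataset overview

This dataset is used by the TooT-PLM-ionCT tool, a composite framework of three distinct systems, each with a different architecture and trained on its own dataset. Each system is dedicated to a specific task: separating ion channels (ICs) and ion transporters (ITs) from other membrane proteins, and distinguishing ICs from ITs.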
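The three tasks described above can be pictured as three binary views over one labeled protein collection. The Python sketch below is only an illustration under assumptions: the label strings, the `Protein` record, the task names, and the `build_task` helper are all hypothetical and are not taken from the dataset files or from the TooT-PLM-ionCT code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical class labels; the real dataset files may name them differently.
IC = "ion_channel"
IT = "ion_transporter"
MP = "other_membrane_protein"


@dataclass(frozen=True)
class Protein:
    accession: str  # illustrative identifier, e.g. a UniProt accession
    sequence: str   # amino-acid sequence
    label: str      # one of IC, IT, MP


# Each task maps to (positive class, negative class). Task names are assumptions.
TASKS = {
    "IC-MP": (IC, MP),  # ion channels vs. other membrane proteins
    "IT-MP": (IT, MP),  # ion transporters vs. other membrane proteins
    "IC-IT": (IC, IT),  # ion channels vs. ion transporters
}


def build_task(records: Iterable[Protein], task: str) -> List[Tuple[str, int]]:
    """Return (sequence, binary_label) pairs for one task.

    Records whose label belongs to neither class of the task are dropped.
    """
    positive, negative = TASKS[task]
    pairs: List[Tuple[str, int]] = []
    for rec in records:
        if rec.label == positive:
            pairs.append((rec.sequence, 1))
        elif rec.label == negative:
            pairs.append((rec.sequence, 0))
    return pairs


if __name__ == "__main__":
    # Toy records with made-up sequences, for demonstration only.
    toy = [
        Protein("X0001", "MKTLL", IC),
        Protein("X0002", "MALWV", IT),
        Protein("X0003", "MGSSA", MP),
    ]
    for name in TASKS:
        print(name, build_task(toy, name))
```

Keeping each task as a separate binary view mirrors the description of one system per task; whether the published datasets are stored as one table or as separate per-task files is not stated here.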

Dataset sources

  • Repository: UniProt/SwissProt

Citation

BibTeX:

@misc{ghazikhani_exploiting_2023,
  title = {Exploiting protein language models for the precise classification of ion channels and ion transporters},
  copyright = {© 2023, Posted by Cold Spring Harbor Laboratory. This pre-print is available under a Creative Commons License (Attribution 4.0 International), CC BY 4.0, as described at http://creativecommons.org/licenses/by/4.0/},
  url = {https://www.biorxiv.org/content/10.1101/2023.07.11.548644v1},
  doi = {10.1101/2023.07.11.548644},
  abstract = {This study presents TooT-PLM-ionCT, a composite framework consisting of three distinct systems, each with different architectures and trained on unique datasets. Each system within TooT-PLM-ionCT is dedicated to a specific task: segregating ion channels (ICs) and ion transporters (ITs) from other membrane proteins and differentiating ICs from ITs. These systems exploit the capabilities of six diverse Protein Language Models (PLMs) - ProtBERT, ProtBERT-BFD, ESM-1b, ESM-2 (650M parameters), and ESM-2 (15B parameters). As these proteins play a pivotal role in the regulation of ion movement across cellular membranes, they are integral to numerous biological processes and overall cellular vitality. To circumvent the costly and time-consuming nature of wet lab experiments, we harness the predictive prowess of PLMs, drawing parallels with techniques in natural language processing. Our strategy engages six classifiers, embracing both conventional methodologies and a deep learning model, for each of our defined tasks. Furthermore, we delve into critical factors influencing our tasks, including the implications of dataset balancing, the effect of frozen versus fine-tuned PLM representations, and the potential variance between half and full precision floating-point computations. Our empirical results showcase superior performance in distinguishing ITs from other membrane proteins and differentiating ICs from ITs, while the task of discriminating ICs from other membrane proteins exhibits results commensurate with the current state-of-the-art.},
  language = {en},
  urldate = {2023-07-31},
  publisher = {bioRxiv},
  author = {Ghazikhani, Hamed and Butler, Gregory},
  month = jul,
  year = {2023},
  note = {Pages: 2023.07.11.548644 Section: New Results},
}
