five

DataSheet_1_Comparative Study of Repertoire Classification Methods Reveals Data Efficiency of k-mer Feature Extraction.pdf

收藏
NIAID Data Ecosystem2026-03-13 收录
下载链接:
https://figshare.com/articles/dataset/DataSheet_1_Comparative_Study_of_Repertoire_Classification_Methods_Reveals_Data_Efficiency_of_k-mer_Feature_Extraction_pdf/20341902
下载链接
链接失效反馈
官方服务:
资源简介:
The repertoire of T cell receptors encodes various types of immunological information. Machine learning is indispensable for decoding such information from repertoire datasets measured by next-generation sequencing (NGS). In particular, the classification of repertoires is the most basic task, which is relevant for a variety of scientific and clinical problems. Supported by the recent appearance of large datasets, efficient but data-expensive methods have been proposed. However, it is unclear whether they can work efficiently when the available sample size is severely restricted as in practical situations. In this study, we demonstrate that their performances can be impaired substantially below critical sample sizes. To complement this drawback, we propose MotifBoost, which exploits the information of short k-mer motifs of TCRs. MotifBoost can perform the classification as efficiently as a deep learning method on large datasets while providing more stable and reliable results on small datasets. We tested MotifBoost on the four small datasets which consist of various conditions such as Cytomegalovirus (CMV), HIV, α-chain, β-chain and it consistently preserved the stability. We also clarify that the robustness of MotifBoost can be attributed to the efficiency of k-mer motifs as representation features of repertoires. Finally, by comparing the predictions of these methods, we show that the whole sequence identity and sequence motifs encode partially different information and that a combination of such complementary information is necessary for further development of repertoire analysis.
创建时间:
2022-07-20
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作