Random forest algorithm reveals novel sites in HA protein that shift receptor binding preference of the H9N2 avian influenza virus
Science Data Bank · Updated 2024-12-25 · Indexed 2026-04-23
Download link:
https://www.scidb.cn/detail?dataSetId=2455ffceffa04afca6c72c51f435fd1b
Description:
After filtering out redundant and incomplete sequences, sequences from environmental sources, and sequences with unclear information, a total of 5,656 H9N2 HA gene sequences were obtained from the National Center for Biotechnology Information (NCBI), Global Initiative on Sharing All Influenza Data (GISAID), and Influenza Research Database (IRD) databases. The HA sequences were divided by host into two labeled datasets: avian-derived sequences (5,588) and mammal-derived sequences (68). Each amino acid type was replaced with a numerical value to transform the amino acid sequences into machine-readable vectors (Supplementary Table S1). We trained the random forest classifier with RandomForestClassifier in Sklearn (1.0.22) (Abraham et al., 2014). Because avian sequences far outnumber non-avian ones, we selected a balanced number of samples for training by applying random under-sampling to the avian-derived data. The training parameters were: random_state=0, n_estimators=1500, oob_score=True, n_jobs=-1, class_weight='balanced'. Model performance was evaluated by five-fold cross-validation, with the area under the ROC curve (AUC) as the evaluation metric. For feature selection, the random forest classifier was trained on all data, and the weight of each site was then extracted to represent its importance.
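The encoding and under-sampling steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the integer mapping (alphabetical one-letter codes mapped to 1..20, with 0 for anything else) is hypothetical, since the actual mapping is given in Supplementary Table S1, and the helper names `encode_sequence` and `undersample` and the toy sequences are invented for the example.

```python
import random

# Assumed encoding: the 20 standard amino acids map to 1..20 in alphabetical
# order of their one-letter codes; anything else (gaps, X, ...) maps to 0.
# The dataset's actual mapping is in Supplementary Table S1 and may differ.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_TO_INT = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(seq):
    """Turn an aligned amino acid sequence into a list of integers."""
    return [AA_TO_INT.get(aa, 0) for aa in seq.upper()]


def undersample(avian, mammal, seed=0):
    """Randomly down-sample the avian class to the size of the mammal class.

    Returns (X, y) with label 0 for avian and 1 for mammal.
    """
    rng = random.Random(seed)
    kept_avian = rng.sample(avian, k=min(len(avian), len(mammal)))
    X = [encode_sequence(s) for s in kept_avian + mammal]
    y = [0] * len(kept_avian) + [1] * len(mammal)
    return X, y


# Toy aligned fragments (made up for illustration, not real HA sequences).
avian_seqs = ["MKTQL", "MKTQI", "MKAQL", "MRTQL", "MKTHL", "MKTQV"]
mammal_seqs = ["MKTLL", "MKTLI"]

X, y = undersample(avian_seqs, mammal_seqs)
print(X)
print(y)
```

The resulting `X` and `y` have the shape that scikit-learn's `RandomForestClassifier(...).fit(X, y)` accepts, so the parameters listed in the description could be applied to them directly.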
Provided by:
Xinyuan Cui; Yiting Chen; Xuejuan Shen; Yuncong Yin; Guangdong Laboratory for Lingnan Modern Agriculture, State Key Laboratory for Animal Disease Control and Prevention, Center for Emerging and Zoonotic Diseases, College of Veterinary Medicine, South China Agricultural University, Guangzhou 510642, China; Yongyi Shen; David Irwin; Xingbang Lu; Rujian Chen
Created:
2024-12-23



