feature-representation-for-LLMs
Source: DataCite Commons. Updated: 2024-03-28. Indexed: 2024-08-19.
Download link:
https://figshare.com/articles/dataset/feature-representation-for-LLMs/24312292/5
Description:
This is a database of ESM2 feature representations. It includes Swiss data, Swiss normalized data, original TrEMBL data, original TrEMBL normalized data, non-homologous TrEMBL data, and Table S10. Non-homologous TrEMBL normalized data can be created by extracting the Entry IDs from the non-homologous TrEMBL data and then extracting the corresponding feature representations from the original TrEMBL normalized data.

Figure S4 (eos) and Figure S5 (eos) supplement the histogram and scatter plots of the feature eos in the corresponding Figure S4 and Figure S5. Figure S6 and Figure S8 are the results of GO annotation enrichment. The GO gene set is a grouped protein dataset used for GO annotation enrichment. Figure S7 is a silhouette score plot. For specific usage of the dataset, please refer to GitHub.

The RF_model files are pickle files for the different RF+RF_filter models, which can be used for dataset inference and interpretability analysis. Among these models, the AA_count model and the feature_all model have more complex feature inputs, so the Swiss training dataset is provided as a reference for feature arrangement. For the other models, the feature order is simply 0 to 1279.
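The Entry ID extraction described above can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes the tables are CSV files with an ID column named `Entry` and feature columns named `"0"`, `"1"`, and so on, none of which the description confirms. The function names and the in-memory sample rows are hypothetical stand-ins for the real files.

```python
import csv
import io


def read_entry_ids(handle, id_column="Entry"):
    """Collect the Entry IDs listed in the non-homologous TrEMBL table.

    The column name "Entry" is an assumption about the file layout.
    """
    return {row[id_column] for row in csv.DictReader(handle)}


def filter_features(handle, keep_ids, id_column="Entry"):
    """Keep only rows of the normalized feature table whose ID is in keep_ids."""
    return [row for row in csv.DictReader(handle) if row[id_column] in keep_ids]


def feature_vector(row, n_features=1280):
    """Arrange features in the order 0 .. n_features - 1.

    Assumes the feature columns are literally named "0", "1", ...
    """
    return [float(row[str(i)]) for i in range(n_features)]


# Hypothetical in-memory stand-ins for the two real files.
non_homologous_csv = io.StringIO("Entry\nA0A001\nA0A003\n")
normalized_csv = io.StringIO(
    "Entry,0,1,2\n"
    "A0A001,0.1,0.2,0.3\n"
    "A0A002,0.4,0.5,0.6\n"
    "A0A003,0.7,0.8,0.9\n"
)

keep = read_entry_ids(non_homologous_csv)
subset = filter_features(normalized_csv, keep)
# Three features here only to keep the sample small; the real tables have 1280.
vectors = [feature_vector(row, n_features=3) for row in subset]
```

With the real files, the in-memory samples would be replaced by opened file handles and `n_features` left at its default of 1280. The resulting vectors follow the 0 to 1279 ordering given for most of the RF models. The AA_count and feature_all models need the column arrangement of the Swiss training dataset instead.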
Provider:
figshare
Created:
2024-01-31



