
Quantifying the hardness of bioactivity prediction tasks for transfer learning

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10605092
Description:
This dataset contains the following information about the FS-Mol dataset:

- Embeddings of all the molecules in each task, computed with different featurization methods
- External chemical distance between train and test tasks, calculated with the optimal transport dataset distance (OTDD) method
- External protein distance between train and test tasks, calculated from ESM-2 representations of the proteins
- Internal chemical hardness (derived from a random forest, for all the train and test tasks)
- Prototypical network performance on the test tasks
- Random forest performance on the test tasks

Paper Abstract: Today, machine learning methods are widely employed in drug discovery. However, the chronic lack of data continues to hamper their further development, validation, and application. Several modern strategies aim to mitigate the challenges associated with data scarcity by learning from data on related tasks. These knowledge-sharing approaches encompass transfer learning, multi-task learning, and meta-learning. A key open question for these approaches is how much their performance benefits from the relatedness of the available source (training) tasks; in other words, how difficult ("hard") a test task is for a model, given the available source tasks. This study introduces a new method for quantifying and predicting the hardness of a bioactivity prediction task based on its relation to the available training tasks. The approach involves generating protein and chemical representations and calculating distances between the bioactivity prediction task and the available training tasks. Using meta-learning as an example, we demonstrate that the proposed task hardness metric is inversely correlated with performance. The metric will be useful in estimating the task-specific gain in performance that can be achieved through meta-learning.
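The idea of relating a distance-based hardness score to model performance can be sketched as follows. This is only an illustration under stated assumptions: summarizing hardness as the mean distance to the k nearest training tasks is an assumed aggregation, not necessarily the definition used in the paper; the function names (`task_hardness`, `spearman`) are hypothetical; and the numbers are made-up toy values, not values from this dataset.

```python
from statistics import mean


def task_hardness(distances_to_train, k=3):
    """Mean distance from one test task to its k nearest training tasks.

    Assumption: this aggregation is illustrative only; the paper may
    combine chemical and protein distances differently.
    """
    nearest = sorted(distances_to_train)[:k]
    return mean(nearest)


def ranks(values):
    """Rank values from 1..n, assigning the average rank to ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            result[idx] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy, made-up values for three hypothetical test tasks.
# Each list holds distances from that test task to four training tasks.
toy_distances = {
    "task_a": [0.2, 0.4, 0.9, 1.1],
    "task_b": [0.6, 0.7, 1.0, 1.3],
    "task_c": [1.2, 1.4, 1.5, 2.0],
}
# Made-up performance scores (higher is better) for the same tasks.
toy_performance = {"task_a": 0.30, "task_b": 0.22, "task_c": 0.10}

names = sorted(toy_distances)
hardness = [task_hardness(toy_distances[n], k=2) for n in names]
performance = [toy_performance[n] for n in names]
rho = spearman(hardness, performance)
print(f"Spearman correlation between hardness and performance: {rho:.2f}")
```

In this toy setup the tasks that sit farther from the training tasks also score lower, so the rank correlation is negative, which is the kind of inverse relationship the abstract describes. With the real files, the distance lists would come from the OTDD or ESM-2 based distances and the performance values from the prototypical network results.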
Created:
2024-02-01