Benchmarking Machine Learning Methods for Synthetic Lethality Prediction in Cancer

Source: NIAID Data Ecosystem (indexed 2026-05-02)

Download link:

https://zenodo.org/record/13691647

Resource description:

Benchmarking of Machine Learning Methods for Predicting Synthetic Lethality Interactions

Update Notice

In the latest version of the dataset, a new file named `predicted_by_model.csv` has been added. It lists the top-ranked gene pairs according to model predictions and may help identify novel synthetic lethality interactions.

Data Preparation and Download Instructions

After downloading data from this project, follow these steps to prepare the training data:

Step 1: Download all the data parts from this repository.

Note: The actual command will depend on how you're downloading files from Google Drive. For convenience, the files have been compressed and split. Two versions of the data are provided: the complete version (`data_large.tar.gz`) and a version without the PiLSL database (`data_small.tar.gz`). The decompressed size of the complete version is about 90GB, while the decompressed size of the version without the PiLSL database is about 22GB. (Because constructing the database is extremely time-consuming, downloading the complete version is recommended. However, the version without the PiLSL database can successfully run all 10 models except PiLSL.)

Step 2: Combine the parts into a single archive.

    cat data_large.tar.gz.part* > data_large.tar.gz  # Complete version; about 100GB after extraction.
    # cat data_small.tar.gz.part* > data_small.tar.gz  # Version without the PiLSL database; about 25GB after extraction.

Step 3: Verify the integrity of the downloaded files.

    md5sum -c data_large.tar.gz.md5
    # md5sum -c data_small.tar.gz.md5  # Version without the PiLSL database

Step 4: Extract the dataset.

    tar -xzvf data_large.tar.gz
    # tar -xzvf data_small.tar.gz  # Version without the PiLSL database
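Steps 2 to 4 can be chained so that extraction only happens when the checksum passes. The following is a minimal sketch, not part of the original instructions. The function name `prepare_dataset` and its optional directory argument are hypothetical. It assumes the parts and the `.md5` file sit in the same directory, that the `.md5` file refers to the archive by its bare file name, and that the shell's lexicographic glob order matches the order in which the parts were split.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reassemble, verify, and extract one dataset version.
# Usage: prepare_dataset NAME [DIR]
#   NAME  archive base name, e.g. data_large or data_small
#   DIR   directory holding the downloaded parts (assumed; defaults to ".")
prepare_dataset() {
    local name="$1"
    local dir="${2:-.}"
    (
        cd "$dir" || exit 1

        # Stop early if no parts were downloaded for this version.
        if ! ls "${name}.tar.gz.part"* >/dev/null 2>&1; then
            echo "no parts found for ${name} in ${dir}" >&2
            exit 1
        fi

        # Step 2: concatenate the parts in glob (lexicographic) order.
        cat "${name}.tar.gz.part"* > "${name}.tar.gz" || exit 1

        # Step 3: verify the checksum; abort before extracting on mismatch.
        md5sum -c "${name}.tar.gz.md5" || exit 1

        # Step 4: extract into the same directory.
        tar -xzf "${name}.tar.gz" || exit 1
    )
}
```

Running `prepare_dataset data_large` in the download directory would cover the complete version, and `prepare_dataset data_small` the version without the PiLSL database. The work runs in a subshell, so the caller's working directory is left unchanged.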

Created:

2024-11-02
