Benchmarking Machine Learning Methods for Synthetic Lethality Prediction in Cancer
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/13691647
Description:
Benchmarking of Machine Learning Methods for Predicting Synthetic Lethality Interactions
Update Notice
The latest version of the dataset adds a new file, `predicted_by_model.csv`, which lists the top-ranked gene pairs according to model predictions. It may help in identifying novel synthetic lethality interactions.
Data Preparation and Download Instructions
After downloading data from this project, follow these steps to prepare the training data:
Step 1: Download all the data parts from this repository.
Notes:
The exact download command depends on how you download the files from Google Drive. To make downloading easier, the files have been compressed and split into parts.
Two versions of the data are provided: the complete version (`data_large.tar.gz`) and a version without the PiLSL database (`data_small.tar.gz`).
The complete version is about 90GB after decompression; the version without the PiLSL database is about 22GB.
(Because constructing the PiLSL database is extremely time-consuming, downloading the complete version is recommended. The version without the PiLSL database can still run all 10 models except PiLSL.)
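Given the sizes noted above, it can be worth confirming that the target disk has enough room before combining and extracting. The following is a minimal sketch, not part of the project: `has_free_space` is a hypothetical helper name, and the thresholds in the usage line are taken from the approximate decompressed sizes stated above. The downloaded parts and the combined archive need additional space on top of that.

```shell
# Hypothetical helper: succeed only if DIR has at least NEED_GB gigabytes free.
has_free_space() {
    need_gb="$1"      # required free space in GB
    dir="${2:-.}"     # directory to check (defaults to the current one)
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Usage (thresholds are the approximate decompressed sizes from the notes):
#   has_free_space 90 . || echo "Not enough space for the complete version"
#   has_free_space 22 . || echo "Not enough space for the version without PiLSL"
```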
Step 2: Combine the parts into a single archive.
cat data_large.tar.gz.part* > data_large.tar.gz # Complete version; the extracted size is about 100GB.
# cat data_small.tar.gz.part* > data_small.tar.gz # Version without the PiLSL database; the extracted size is about 25GB.
Step 3: Verify the integrity of the downloaded files.
md5sum -c data_large.tar.gz.md5
# md5sum -c data_small.tar.gz.md5 # The version without PiLSL database
Step 4: Extract the dataset.
tar -xzvf data_large.tar.gz
# tar -xzvf data_small.tar.gz # The version without PiLSL database
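Steps 2 to 4 can also be chained so that extraction only runs when the checksum matches. The sketch below is one way to do that, not the project's own script: `prepare_dataset` is a hypothetical helper name, and it assumes the `.md5` file sits in the same directory as the parts and refers to the combined archive by name, as the `md5sum -c` command above implies. It uses `tar -xzf` without `-v` to keep the output short.

```shell
# Hypothetical helper: combine, verify, and extract one dataset version,
# stopping at the first step that fails.
prepare_dataset() {
    name="$1"                     # data_large or data_small
    archive="${name}.tar.gz"

    # Step 2: concatenate the split parts; the shell expands the glob in sorted order.
    cat "${archive}".part* > "${archive}" || return 1

    # Step 3: check the combined archive against the checksum file;
    # a mismatch returns early so nothing is extracted.
    md5sum -c "${archive}.md5" || return 1

    # Step 4: extract into the current directory.
    tar -xzf "${archive}"
}

# Usage:
#   prepare_dataset data_large    # complete version
#   prepare_dataset data_small    # version without the PiLSL database
```

If the checksum step fails, re-downloading the parts before retrying is usually the simplest fix, since a failed check means the combined archive differs from the one the checksum was made for.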
Created:
2024-11-02



