
Data Sets and Results for "Improved data sets and evaluation methods for the automatic prediction of DNA-binding proteins"

Indexed by NIAID Data Ecosystem on 2026-03-12
Download link:
https://zenodo.org/record/5137961
Description:
Data sets and results for "Improved data sets and evaluation methods for the automatic prediction of DNA-binding proteins".

The file "dna_binding_protein_sequences.zip" contains the training and testing sets from the paper:
  RLL: "random__full_1000.csv"
  RSL: "random__40.csv"
  RS&LL: "random__40_1000.csv"
  RLL in which the included positive examples have verified DNA-binding activity: "random__hq_1000.csv"
  The 10 RS&LL data sets: "random__40_1000.csv" plus "random__40_1000_cv_<0-8>.csv"

The results files follow the same naming scheme. See "see_results.ipynb" in the codebase that supplements these data sets.

The species data sets are derived from "uniprot_data_bac.tab" and "uniprot_data_not_bac.tab". See the code for details.

The ESM embeddings used by the XGBoost model are in "dna_binding_protein_esm.zip".
Created: 2021-08-03