(Datasets) Systematic representation and optimization enable the inverse design of cross-species regulatory sequences in bacteria
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14598566
Description:
This repository provides the core datasets used for pretraining and fine-tuning in the research paper.
The file `bacteria_all_no_unknown2kspe_180spe_norepeat_classlabel_train995.txt` corresponds to `pretrain_seqs.txt` in `aae_meta.py`, while `BS_EC_PA_JY_train_comparative_genome_paper_storzdRNAseq.txt` corresponds to the finetune_file used in `aae_genus.py` and `aae_meta.py`.
All other evaluation datasets, including the supervised training data, are uploaded directly to the GitHub repository (https://github.com/WangLabTHU/DeepCROSS/tree/main).
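The file-to-role mapping described above can be sketched in Python as follows. This is a minimal illustration, not code from the DeepCROSS repository: the names `DATASET_FILES` and `read_records` are hypothetical, and the one-record-per-line layout is an assumption, since the internal format of the files is not documented here.

```python
from pathlib import Path

# File names exactly as given in the dataset description.
# The role keys ("pretrain", "finetune") are hypothetical labels.
DATASET_FILES = {
    "pretrain": "bacteria_all_no_unknown2kspe_180spe_norepeat_classlabel_train995.txt",
    "finetune": "BS_EC_PA_JY_train_comparative_genome_paper_storzdRNAseq.txt",
}


def read_records(data_dir, role):
    """Return the non-empty lines of the dataset file for the given role.

    Assumption: one record per line. No columns are parsed because the
    actual file layout is not specified in the description.
    """
    path = Path(data_dir) / DATASET_FILES[role]
    with path.open(encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]
```

Assuming the two files were downloaded from the Zenodo record into a local directory, `read_records(local_dir, "pretrain")` would return the raw lines of the pretraining file for inspection before they are passed to the scripts.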
Created:
2025-01-05



