
Dataset for the article "A descriptor-free machine learning framework to improve antigen discovery for bacterial pathogens"

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/15065206
Description:
The data used in the study are provided as a .zip file. Once unzipped, the raw data in FASTA format are located in the following folders:

    data/fasta
    ├── benchmark
    │   ├── test
    │   │   └── test.fasta      # test sequences for benchmark
    │   └── training
    │       ├── negative.fasta  # non-PA sequences for benchmark
    │       └── positive.fasta  # PA sequences for benchmark
    └── lobo
        ├── negative.fasta      # non-PA sequences for LOBO evaluation
        └── positive.fasta      # PA sequences for LOBO evaluation

The annotated sequences used in the experiments (and the corresponding PSEs/descriptors) are stored in .parquet format. They can be loaded with Python as follows (note that loading parquet files requires the pandas and pyarrow libraries to be installed):

    import pandas as pd

    example_path = "data/descriptors/lobo/test/UP000000425.parquet"
    df = pd.read_parquet(example_path)

Here is the folder structure of the annotated sequences:

    data
    ├── descriptors
    │   └── lobo
    │       ├── test  # Descriptors for candidate antigen selection
    │       │   ├── UP000000425.parquet  # N. meningitidis
    │       │   ├── UP000000586.parquet  # S. pneumoniae
    │       │   ├── UP000000625.parquet  # E. coli
    │       │   ├── UP000000750.parquet  # S. pyogenes
    │       │   ├── UP000000799.parquet  # C. jejuni
    │       │   ├── UP000000800.parquet  # C. muridarum
    │       │   ├── UP000001432.parquet  # A. pleuropneumoniae
    │       │   ├── UP000001584.parquet  # M. tuberculosis
    │       │   ├── UP000006386.parquet  # S. aureus
    │       │   └── UP000326807.parquet  # Y. pestis
    │       └── training
    │           ├── negative.parquet     # non-PA descriptors for LOBO
    │           └── positive.parquet     # PA descriptors for LOBO
    └── pses
        ├── benchmark
        │   ├── test
        │   │   └── test.parquet         # test PSEs for benchmark
        │   └── training
        │       ├── negative.parquet     # non-PA PSEs for benchmark
        │       └── positive.parquet     # PA PSEs for benchmark
        └── lobo
            ├── test  # PSEs for candidate antigen selection
            │   ├── UP000000425.parquet  # N. meningitidis
            │   ├── UP000000586.parquet  # S. pneumoniae
            │   ├── UP000000625.parquet  # E. coli
            │   ├── UP000000750.parquet  # S. pyogenes
            │   ├── UP000000799.parquet  # C. jejuni
            │   ├── UP000000800.parquet  # C. muridarum
            │   ├── UP000001432.parquet  # A. pleuropneumoniae
            │   ├── UP000001584.parquet  # M. tuberculosis
            │   ├── UP000006386.parquet  # S. aureus
            │   └── UP000326807.parquet  # Y. pestis
            └── training
                ├── negative.parquet     # non-PA PSEs for LOBO
                └── positive.parquet     # PA PSEs for LOBO
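The description gives loading code for the parquet files but not for the raw FASTA files. A minimal sketch of reading a positive/negative FASTA pair into labelled records, using only the Python standard library, might look like the following. The function names `read_fasta` and `load_labelled`, the record keys, and the 1/0 label encoding are illustrative assumptions, not part of the dataset or the article's code.

```python
from pathlib import Path


def read_fasta(path):
    """Parse a FASTA file into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            elif header is not None:
                # Sequence lines may wrap; collect them until the next header.
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def load_labelled(folder):
    """Read positive.fasta and negative.fasta from `folder`.

    Assumed label encoding (hypothetical): 1 = PA, 0 = non-PA.
    """
    folder = Path(folder)
    rows = []
    for filename, label in (("positive.fasta", 1), ("negative.fasta", 0)):
        for header, sequence in read_fasta(folder / filename):
            rows.append({"id": header, "sequence": sequence, "label": label})
    return rows
```

Assuming the archive has been unzipped in the working directory, `load_labelled("data/fasta/lobo")` or `load_labelled("data/fasta/benchmark/training")` would return one dictionary per sequence. The benchmark test set is a single `test.fasta` file, so it would be read with `read_fasta` directly.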
Created:
2025-03-21