AlphaFind v2: Evaluation data, results and reproducibility protocol
Indexed by NIAID Data Ecosystem on 2026-05-10
Download link:
https://figshare.com/articles/dataset/AlphaFind_v2_Evaluation_data_results_and_reproducibility_protocol/31802743
Description:
This dataset contains the evaluation data, code and results reported in the publication AlphaFind v2: Similarity Search in AlphaFold DB and TED Domains across Structural Contexts (https://doi.org/10.64898/2026.03.10.710735).
The evaluation uses the multi-domain protein selection from https://doi.org/10.6084/m9.figshare.30546650 (afdb-benchmark/af-cath-multi-domain-list.tsv) and downloads the corresponding structures through the AlphaFold DB (https://alphafold.ebi.ac.uk/) and TED DB (https://ted.cathdb.info/) API services. The downloaded data are saved in afdb-structures/ (2050 multi-domain protein chains from AlphaFold DB) and afdb-structures-domains/ (4420 TED domains extracted from those 2050 chains).
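The 4420 domain files are cut out of the 2050 chain structures using TED domain boundaries. The sketch below shows one plausible way to do this on a PDB file; it is not necessarily what download_data.py does. It assumes a boundary string such as `11-120_140-200`, with underscores separating the segments of a discontinuous domain, and the helpers `parse_boundaries` and `extract_domain` are hypothetical names.

```python
def parse_boundaries(chopping):
    """Turn '11-120_140-200' into [(11, 120), (140, 200)].

    Assumed format: '-' separates start and end residue numbers and
    '_' separates the segments of a discontinuous domain.
    """
    segments = []
    for part in chopping.split("_"):
        start, end = part.split("-")
        segments.append((int(start), int(end)))
    return segments


def extract_domain(pdb_lines, chopping):
    """Return the ATOM records whose residue number lies in any segment."""
    segments = parse_boundaries(chopping)
    kept = []
    for line in pdb_lines:
        if line.startswith("ATOM"):
            # PDB fixed columns 23-26 (0-based slice 22:26) hold the residue number.
            resseq = int(line[22:26])
            if any(start <= resseq <= end for start, end in segments):
                kept.append(line)
    return kept
```

Applied to the lines of a chain file (for example, the output of `readlines()`), `extract_domain` keeps only the records of that domain, which can then be written to a separate PDB file.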
Contents

alphafind-evaluation-data.zip
├── afdb-benchmark/
├── afdb-structures/
└── afdb-structures-domains/

results.zip
├── foldseek_results/                # Raw FoldSeek API responses
│   └── afdb50_AF-{UNIPROT_ID}-F1-model_v6.json
│
├── foldseek_results_tmscores/       # FoldSeek results with TM-scores
│   └── foldseek_results_tmscores_{UNIPROT_ID}.csv
│
├── alphafindv1_results/             # AlphaFind v1 search results
│   └── {UNIPROT_ID}_chainA_limit{K}.json
│
├── alphafindv2_results/             # AlphaFind v2 chain search results
│   └── {UNIPROT_ID}_chains_k{K}.json
│
├── alphafindv2_domains_results/     # AlphaFind v2 domain search results
│   └── {UNIPROT_ID}_TED{NN}_chains_k{K}.json
│
├── merizo_results/                  # Merizo domain search results
│   ├── AF-{UNIPROT_ID}-F1-model_v4_TED{NN}_results.json
│   └── AF-{UNIPROT_ID}-F1-model_v4_TED{NN}_search.tsv
│
├── figures/                         # Comparison plots
│   ├── chains_comparison_tm.pdf     # Chains TM-score boxplot
│   ├── chains_comparison_tm.png
│   ├── domains_comparison_tm.pdf    # Domains TM-score boxplot
│   └── domains_comparison_tm.png
│
├── *_with_timing.csv                # Timing data for each method
├── foldseek-nresults.csv            # Result counts per query
└── *_downloads.csv                  # Download logs

statistical_tests.zip
├── aggregate_results_chains_statistics.csv
├── aggregate_results_chains_with_stats.py
├── aggregate_results_domains_statistics.csv
└── statistical_tests.md

alphafind-v2-evaluation-scripts.zip
├── README.md
├── aggregate_results.py
├── aggregate_results_chains_with_stats.py
├── compute-tms.py
├── count-foldseek-results.py
├── download_data.py
├── eval-alphafindv1.py
├── eval-alphafindv2.py
├── eval-alphafindv2_domains.py
├── eval-foldseek.py
├── eval-merizo.py
├── extract-foldseek.py
├── find_domain_outliers.py
├── plot_domains_comparison.py
├── plot_input_statistics.py
├── plot_results_comparison.py
├── requirements.txt
└── visualize_results.py

figures.zip
├── chains_comparison.pdf
├── chains_comparison_time.pdf
├── chains_comparison_tm.pdf
├── domains_comparison_time.pdf
└── domains_comparison_tm.pdf
├── cath_domains_per_chain.pdf
├── cath_unique_families_per_chain.pdf
├── chain_atoms_histogram.pdf
├── chain_residues_histogram.pdf
├── domain_atoms_histogram.pdf
└── domain_residues_histogram.pdf

How to reproduce
The instructions are also in the README.md of alphafind-v2-evaluation-scripts.zip.
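The steps below must run in order, because later steps read files written by earlier ones (for example, the AlphaFind queries rely on foldseek-nresults.csv). A hypothetical driver that encodes this order might look like the following. It is not part of the published scripts and assumes all scripts from alphafind-v2-evaluation-scripts.zip sit in the top-level working directory next to results/. With `dry_run=True` it only lists the commands.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical step list mirroring the README order: (working dir, script, extra args).
STEPS = [
    (".", "download_data.py", []),
    (".", "eval-foldseek.py", []),
    (".", "extract-foldseek.py", []),
    (".", "compute-tms.py", ["--input-dir", "results/foldseek_results_tmscores"]),
    (".", "count-foldseek-results.py", []),
    (".", "eval-alphafindv1.py", []),
    (".", "eval-alphafindv2.py", []),
    (".", "eval-alphafindv2_domains.py", []),
    (".", "eval-merizo.py", []),
    ("results", "../aggregate_results_chains_with_stats.py", []),
    (".", "plot_input_statistics.py", []),
    ("results", "../plot_results_comparison.py", []),
    ("results", "../plot_domains_comparison.py", []),
]


def run_pipeline(root=".", dry_run=True):
    """Run each step in order, or with dry_run=True only collect the commands."""
    commands = []
    for cwd, script, args in STEPS:
        workdir = Path(root) / cwd
        argv = [sys.executable, script, *args]
        commands.append((str(workdir), argv))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(argv, cwd=workdir, check=True)
    return commands


if __name__ == "__main__":
    for workdir, argv in run_pipeline(dry_run=True):
        print(workdir, " ".join(argv[1:]))
```

If the included alphafind-evaluation-data.zip is used instead of downloading, the first entry can simply be removed from `STEPS`.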
Prerequisites
Python (originally run on Python 3.10.16)
USalign, for TM-score computation (compiling it requires make):
git clone https://github.com/pylelab/USalign.git
cd USalign && make
Python dependencies:
pip install numpy pandas scipy requests tqdm matplotlib
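Before running the pipeline, a quick check that the dependencies are importable and that USalign can be found may save a failed run. This sketch assumes the compiled USalign binary has been placed on PATH; compute-tms.py may locate it differently, so treat the hypothetical helper `usalign_path` as a convenience check only.

```python
import importlib.util
import shutil

# Packages from the pip install line above (import names equal the pip names here).
REQUIRED_PACKAGES = ["numpy", "pandas", "scipy", "requests", "tqdm", "matplotlib"]


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the packages that cannot be imported by this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def usalign_path(name="USalign"):
    """Return the full path of the USalign binary if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    print("missing packages:", missing_packages() or "none")
    print("USalign:", usalign_path() or "not found on PATH")
```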
Download the data
python download_data.py
This downloads:
- Protein chain PDB files to afdb-structures/
- Domain PDB files to afdb-structures-domains/
Alternatively, extract the included alphafind-evaluation-data.zip and move its subdirectories into the main directory:
cd alphafind-evaluation-data/ && mv * ../
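Whichever route is used, the directory contents can be checked against the counts given in the description (2050 chains, 4420 domains). This sketch assumes the structures are stored as uncompressed `*.pdb` files directly inside each directory; adjust the glob pattern if the files use another extension.

```python
from pathlib import Path

# Expected counts taken from the dataset description.
EXPECTED_COUNTS = {"afdb-structures": 2050, "afdb-structures-domains": 4420}


def count_pdb_files(directory):
    """Count *.pdb files directly inside `directory` (0 if it does not exist)."""
    directory = Path(directory)
    if not directory.is_dir():
        return 0
    return sum(1 for path in directory.glob("*.pdb") if path.is_file())


def check_counts(root=".", expected=EXPECTED_COUNTS):
    """Return {directory: (found, expected)} for every directory that does not match."""
    mismatches = {}
    for name, want in expected.items():
        found = count_pdb_files(Path(root) / name)
        if found != want:
            mismatches[name] = (found, want)
    return mismatches


if __name__ == "__main__":
    problems = check_counts()
    print("counts OK" if not problems else problems)
```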
Run FoldSeek Search
Run the FoldSeek Server API search first; its result counts determine how many results are requested from the other methods:
python eval-foldseek.py
- Searches against the `afdb-50` database
- Results saved to `results/foldseek_results/`
- Timing saved to `results/foldseek_results_with_timing.csv`
Prepare FoldSeek results for TM-score computation
python extract-foldseek.py
Extracts the results of the FoldSeek search into individual CSV files, one per query, in results/foldseek_results_tmscores/.
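Because the per-query files follow the naming pattern foldseek_results_tmscores_{UNIPROT_ID}.csv listed under Contents, later steps can look them up by UniProt ID. A minimal sketch with the hypothetical helper `index_by_uniprot`:

```python
import re
from pathlib import Path

# Per-query naming pattern listed for results.zip.
PATTERN = re.compile(r"^foldseek_results_tmscores_(?P<uniprot>.+)\.csv$")


def index_by_uniprot(directory="results/foldseek_results_tmscores"):
    """Map each UniProt ID to the path of its per-query CSV file."""
    index = {}
    for path in sorted(Path(directory).glob("*.csv")):
        match = PATTERN.match(path.name)
        if match:  # skip CSV files that do not follow the pattern
            index[match.group("uniprot")] = path
    return index
```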
Compute TM-Scores
python compute-tms.py --input-dir results/foldseek_results_tmscores
Uses USalign to compute TM-scores for the FoldSeek results. The time spent on this step is not included in the FoldSeek search time.
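For a single pair of structures, the TM-score computation can be sketched as a subprocess call to USalign with its compact tabular output (`-outfmt 2`). In that output the first columns are the two structure names followed by TM1 and TM2, the TM-scores normalized by the length of the first and of the second structure. This is an illustration under that assumption; which normalization compute-tms.py reports is not stated here.

```python
import subprocess


def parse_outfmt2(text):
    """Parse USalign '-outfmt 2' output into (chain1, chain2, TM1, TM2) tuples.

    Assumes whitespace-separated columns starting with PDBchain1, PDBchain2,
    TM1, TM2 (paths without spaces); '#'-prefixed lines are headers.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        rows.append((fields[0], fields[1], float(fields[2]), float(fields[3])))
    return rows


def usalign_tm(query_pdb, target_pdb, binary="USalign"):
    """Align one query/target pair with USalign and return (TM1, TM2)."""
    result = subprocess.run(
        [binary, query_pdb, target_pdb, "-outfmt", "2"],
        capture_output=True, text=True, check=True,
    )
    rows = parse_outfmt2(result.stdout)
    return rows[0][2], rows[0][3]
```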
Count the FoldSeek results
python count-foldseek-results.py
Creates foldseek-nresults.csv, which is used to set the number of results requested in the AlphaFind queries so that it matches the FoldSeek result count for each query.
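A sketch of how such a count table can drive the per-query result limit for AlphaFind. The column names `query` and `n_results` are assumptions (hypothetical placeholders); check the header of foldseek-nresults.csv before use.

```python
import csv


def load_result_counts(csv_file, id_column="query", count_column="n_results"):
    """Read a per-query count table into {query_id: count}.

    The default column names are placeholders, not confirmed by the dataset.
    """
    reader = csv.DictReader(csv_file)
    return {row[id_column]: int(row[count_column]) for row in reader}
```

For example, after opening results/foldseek-nresults.csv with `newline=""`, `load_result_counts(f)[uniprot_id]` gives the limit (AlphaFind v1) or `k` (AlphaFind v2) for that query.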
Run AlphaFind v1 Search
python eval-alphafindv1.py
- Queries specify the UniProt ID, chain (A), and a limit matching the FoldSeek result count
- Results saved to `results/alphafindv1_results/`
- TM-scores returned directly by the API
Run AlphaFind v2 Search
For chains: python eval-alphafindv2.py
For domains: python eval-alphafindv2_domains.py
- Queries use the `k` parameter, set to the baseline (FoldSeek) result count
- Two timing metrics are recorded:
- Approximate time: until the initial results are collected
- TM-score time: until the exact TM-scores are computed
- Results saved to `results/alphafindv2_results/` or `results/alphafindv2_domains_results/`
Run Merizo Search (Domains)
python eval-merizo.py
- Searches against the TED database
- Results saved to `results/merizo_results/`
- TM-scores returned directly (columns: `q_tm`, `t_tm`, `max_tm`)
Aggregate Results with Statistical Testing
This step produces final summary tables and p-values.
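The tests actually applied are documented in statistical_tests.md. Purely as an illustration of a paired comparison between two methods, the sketch below implements an exact two-sided sign test on per-query scores (for example, the mean TM-score of each query's results under method A and method B, in the same query order). It is not necessarily the test used by aggregate_results_chains_with_stats.py; scipy, already a dependency, offers alternatives such as `scipy.stats.wilcoxon`.

```python
from math import comb


def sign_test(scores_a, scores_b):
    """Exact two-sided sign test for paired per-query scores; ties are dropped."""
    diffs = [a - b for a, b in zip(scores_a, scores_b) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0  # no informative pairs
    positives = sum(1 for d in diffs if d > 0)
    k = min(positives, n - positives)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for the two-sided p-value.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

If method A beats method B on all 10 of 10 queries, the p-value is 2 / 1024, about 0.002.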
cd results/ && python ../aggregate_results_chains_with_stats.py
Outputs:
- aggregate_results_chains.csv - Chain performance summary
- aggregate_results_chains_statistics.csv - Chain statistical tests
- aggregate_results_domains.csv - Domain performance summary
- aggregate_results_domains_statistics.csv - Domain statistical tests
Generate Visualization Plots
python plot_input_statistics.py
From the results/ directory:
cd results/
python ../plot_results_comparison.py # Chains boxplot
python ../plot_domains_comparison.py # Domains boxplot
Outputs:
- input_statistics/*.pdf - Input data histograms
- results/figures/chains_comparison_tm.pdf - Chains TM-score comparison
- results/figures/domains_comparison_tm.pdf - Domains TM-score comparison
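The TM-score boxplots summarize each method's score distribution. The sketch below computes the quantities a default matplotlib boxplot draws: quartiles with linear interpolation, which `statistics.quantiles(..., method="inclusive")` reproduces, and whiskers at the most extreme data points within 1.5 times the IQR of the box. It assumes the plotting scripts keep matplotlib's defaults, which is not stated here.

```python
from statistics import quantiles


def box_summary(values, whis=1.5):
    """Return (low whisker, Q1, median, Q3, high whisker) for a list of scores.

    Quartiles use linear interpolation (method="inclusive"), matching NumPy's
    default percentile; whiskers reach the most extreme data points within
    whis * IQR of the box. Requires at least two values.
    """
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low = min((v for v in values if v >= q1 - whis * iqr), default=q1)
    high = max((v for v in values if v <= q3 + whis * iqr), default=q3)
    # Whiskers never end inside the box; points beyond them are drawn as outliers.
    return min(low, q1), q1, median, q3, max(high, q3)
```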
Created:
2026-03-25



