
cspeters119/ops-eval

Source: Hugging Face · Updated 2026-05-07 · Indexed 2026-05-31
Download link:
https://hf-mirror.com/datasets/cspeters119/ops-eval
Description:
---
license: mit
task_categories:
- image-classification
- feature-extraction
tags:
- biology
- microscopy
- CRISPR
- evaluation
- benchmark
size_categories:
- 1M<n<10M
---

# OPS-Eval: Leakage-Resistant Evaluation for Optical Pooled Screens

Benchmark artifacts for evaluating representation learning on pooled CRISPR microscopy data.

This dataset accompanies a submission to the NeurIPS 2026 Evaluations and Datasets Track.

## Contents

| Directory/File | Description | Size |
|----------------|-------------|------|
| `montages/` | Per-gene montage images (4 imaging channels + 1 label mask, x 2 phases; ~10 PNGs per gene) | ~66 GB |
| `cell_embeddings/` | Pre-extracted 512-dim cell embeddings per sgRNA (.npz) | ~15 GB |
| `splits_v1.json` | Gene-disjoint train/val/test split (3,628 / 694 / 1,000 genes) | 1.4 MB |
| `splits_v1_random.json` | Random image-level split (comparison, demonstrates leakage) | 1.4 MB |
| `splits_v1_sgrna_disjoint.json` | Guide-disjoint split (comparison) | 1.3 MB |
| `gene_metadata.parquet` | Per-gene metadata: cluster labels, mitotic index, guide count, 98-dim features | 1.9 MB |
| `sgrna_metadata.parquet` | Per-sgRNA metadata | 48 KB |
| `leakage_audit_v1.json` | Formal verification of zero gene overlap across splits | 99 KB |
| `manifest_v1.json` | SHA-256 checksums for all montage images | 4 KB |
| `external_gene_pairs_v1.json` | STRING + CORUM gene relationships for co-functional retrieval | 14 MB |
| `results/` | All experiment results (baseline ladder, ablations, split sensitivity) | 2.4 MB |
| `idr0071/` | External validation on independent idr0071 dataset (A549 cells) | 7.4 MB |
| `cellpaint_posh/` | Cell Painting POSH replication: features (`.pq`) and sgRNA library (`.csv`) | 1.5 GB |

## Montage Image Structure

Each gene directory contains ~10 PNG files following the naming convention:

```
{GENE}.{phase}-montage-{channel}.png
```

- **Phases:** `interphase`, `mitotic`
- **Channels:** 0=DNA/DAPI, 1=Tubulin, 2=gH2AX, 3=Actin, 4=Label (segmentation mask)
- **Dimensions:** ~2700x2000 px per montage
- **Band structure:** 5 horizontal bands per montage (sgRNA 1-4 + non-targeting control), each containing ~100 tiled single-cell crops at ~100x100 px

Total: 5,322 genes, ~53,000 montage images, ~2.1M single cells.

## Cell Embeddings

Pre-extracted 512-dimensional cell-level embeddings (float16) from a frozen ResNet-18 encoder. One `.npz` file per gene, with keys corresponding to individual sgRNAs.

## Quick Start

```python
from huggingface_hub import hf_hub_download, snapshot_download
import json

# Download just the splits
path = hf_hub_download("cspeters119/ops-eval", "splits_v1.json", repo_type="dataset")
with open(path) as f:
    splits = json.load(f)
print(f"Loaded splits for {len(splits)} genes")

# Download a single gene's montages
snapshot_download("cspeters119/ops-eval", repo_type="dataset", allow_patterns="montages/AAAS/*")

# Download everything (warning: ~85 GB)
snapshot_download("cspeters119/ops-eval", repo_type="dataset")
```

## Code

See the supplementary material for evaluation code, baseline implementations, and a one-command reproduction harness (`bash scripts/run_all.sh --tiny`).

## License

MIT
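As an illustration of the montage naming convention described under Montage Image Structure, the following is a minimal sketch of a filename parser. The helper `parse_montage_filename`, the `MontageInfo` tuple, and the regular expression are hypothetical and not part of the dataset's own code; only the phase names and the channel index mapping are taken from the card.

```python
import os
import re
from typing import NamedTuple, Optional

# Channel index -> stain name, as listed in the dataset card.
CHANNEL_NAMES = {0: "DNA/DAPI", 1: "Tubulin", 2: "gH2AX", 3: "Actin", 4: "Label"}

# Assumed pattern for "{GENE}.{phase}-montage-{channel}.png".
# The greedy gene group tolerates symbols that contain dots or hyphens.
_MONTAGE_RE = re.compile(
    r"^(?P<gene>.+)\.(?P<phase>interphase|mitotic)-montage-(?P<channel>\d+)\.png$"
)


class MontageInfo(NamedTuple):
    gene: str
    phase: str
    channel: int
    channel_name: str


def parse_montage_filename(path: str) -> Optional[MontageInfo]:
    """Parse a montage path; return None if it does not match the convention."""
    match = _MONTAGE_RE.match(os.path.basename(path))
    if match is None:
        return None
    channel = int(match.group("channel"))
    if channel not in CHANNEL_NAMES:
        return None  # channel index outside the documented 0-4 range
    return MontageInfo(
        match.group("gene"), match.group("phase"), channel, CHANNEL_NAMES[channel]
    )


info = parse_montage_filename("montages/AAAS/AAAS.interphase-montage-0.png")
print(info)
```

Returning `None` for non-matching names, rather than raising, makes it easy to filter a directory listing that may contain other files.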
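The gene-disjoint property that `splits_v1.json` is described as having (and that `leakage_audit_v1.json` is said to verify) can be checked with a few lines of set arithmetic. The sketch below assumes the splits have already been arranged as a mapping from split name to a list of gene symbols; the actual JSON layout is not documented in this card, so that structure, the function name `find_gene_overlaps`, and the toy gene lists are all assumptions for illustration.

```python
from itertools import combinations


def find_gene_overlaps(split_to_genes):
    """Return {(split_a, split_b): shared genes} for every overlapping pair.

    `split_to_genes` is an assumed layout: split name -> iterable of gene symbols.
    An empty result means the splits are pairwise gene-disjoint.
    """
    overlaps = {}
    for (name_a, genes_a), (name_b, genes_b) in combinations(split_to_genes.items(), 2):
        shared = set(genes_a) & set(genes_b)
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


# Toy inputs (placeholder gene lists, not taken from the real split files).
disjoint = {"train": ["AAAS", "AATF"], "val": ["ABCB7"], "test": ["ABCE1"]}
leaky = {"train": ["AAAS", "AATF"], "val": ["AATF"], "test": ["ABCE1"]}

print(find_gene_overlaps(disjoint))
print(find_gene_overlaps(leaky))
```

Running the same check on a random image-level split would be expected to report shared genes, which is the leakage the comparison split is meant to demonstrate.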
Provided by:
cspeters119