
hf-carbon/lab-bench

Hugging Face, updated 2026-03-25, indexed 2026-04-05
Download link:
https://hf-mirror.com/datasets/hf-carbon/lab-bench
Resource description:
---
pretty_name: Lab-Bench (MCQ)
language:
- en
task_categories:
- question-answering
- multiple-choice
source_datasets:
- original
configs:
- config_name: CloningScenarios
  data_files:
  - split: train
    path: CloningScenarios/train-*
- config_name: CloningScenarios_cloningscenarios
  data_files:
  - split: train
    path: CloningScenarios_cloningscenarios/train-*
- config_name: SeqQA
  data_files:
  - split: train
    path: SeqQA/train-*
- config_name: SeqQA_Easy
  data_files:
  - split: train
    path: SeqQA_Easy/train-*
- config_name: SeqQA_Hard
  data_files:
  - split: train
    path: SeqQA_Hard/train-*
- config_name: SeqQA_Medium
  data_files:
  - split: train
    path: SeqQA_Medium/train-*
- config_name: SeqQA_ORF-seq-AAid
  data_files:
  - split: train
    path: SeqQA_ORF-seq-AAid/train-*
- config_name: SeqQA_ORF-seq-AAseq
  data_files:
  - split: train
    path: SeqQA_ORF-seq-AAseq/train-*
- config_name: SeqQA_ORF-seq-numlen
  data_files:
  - split: train
    path: SeqQA_ORF-seq-numlen/train-*
- config_name: SeqQA_ORF-transeff
  data_files:
  - split: train
    path: SeqQA_ORF-transeff/train-*
- config_name: SeqQA_PCR-gene-enzprimers
  data_files:
  - split: train
    path: SeqQA_PCR-gene-enzprimers/train-*
- config_name: SeqQA_PCR-gene-gibshindprimers
  data_files:
  - split: train
    path: SeqQA_PCR-gene-gibshindprimers/train-*
- config_name: SeqQA_PCR-gene-gibssmaprimers
  data_files:
  - split: train
    path: SeqQA_PCR-gene-gibssmaprimers/train-*
- config_name: SeqQA_PCR-geneprimers-enz
  data_files:
  - split: train
    path: SeqQA_PCR-geneprimers-enz/train-*
- config_name: SeqQA_PCR-len-primers
  data_files:
  - split: train
    path: SeqQA_PCR-len-primers/train-*
- config_name: SeqQA_PCR-primers-len
  data_files:
  - split: train
    path: SeqQA_PCR-primers-len/train-*
- config_name: SeqQA_PCR-seq-enzprimers
  data_files:
  - split: train
    path: SeqQA_PCR-seq-enzprimers/train-*
- config_name: SeqQA_PCR-seq-primers
  data_files:
  - split: train
    path: SeqQA_PCR-seq-primers/train-*
- config_name: SeqQA_Prop-seq-gcpercent
  data_files:
  - split: train
    path: SeqQA_Prop-seq-gcpercent/train-*
- config_name: SeqQA_RE-seq-lenfrags
  data_files:
  - split: train
    path: SeqQA_RE-seq-lenfrags/train-*
- config_name: SeqQA_RE-seq-numfrags
  data_files:
  - split: train
    path: SeqQA_RE-seq-numfrags/train-*
---

# Lab-Bench MCQ Subsets

This dataset publishes selected subsets from `futurehouse/lab-bench` in a deterministic multiple-choice format aligned with `hf-carbon/gpqa-biology-mcq`.

## Included source subsets

- `SeqQA`
- `CloningScenarios`

## Derived SeqQA configs

Per-subtask SeqQA configs:

- `SeqQA_ORF-seq-AAid`
- `SeqQA_ORF-seq-AAseq`
- `SeqQA_ORF-seq-numlen`
- `SeqQA_ORF-transeff`
- `SeqQA_PCR-gene-enzprimers`
- `SeqQA_PCR-gene-gibshindprimers`
- `SeqQA_PCR-gene-gibssmaprimers`
- `SeqQA_PCR-geneprimers-enz`
- `SeqQA_PCR-len-primers`
- `SeqQA_PCR-primers-len`
- `SeqQA_PCR-seq-enzprimers`
- `SeqQA_PCR-seq-primers`
- `SeqQA_Prop-seq-gcpercent`
- `SeqQA_RE-seq-lenfrags`
- `SeqQA_RE-seq-numfrags`

IRT percentile difficulty configs:

- `SeqQA_Easy`
- `SeqQA_Medium`
- `SeqQA_Hard`

The difficulty configs are derived from `hf-carbon/seqqa-irt-difficulty`, subset `irt_item_difficulty`, using the same percentile bucketing logic as `evaluation/scripts/plot_difficulty_irt.py`: sort SeqQA items by `difficulty_b` ascending and use `numpy.array_split(..., 3)` to assign easy, medium, and hard buckets.

## Source and transformation

- Source dataset: `futurehouse/lab-bench`
- Transformation script: `create_dataset.py`

For each original example:

- `question` is retained as-is
- `ideal` becomes `answer`
- `ideal + distractors` are converted into `options`
- `answer_index` is the index of `answer` inside `options`

Options are shuffled deterministically per example using the source `id` (MD5-seeded RNG), so conversions are reproducible. Original metadata columns are retained (for example `id`, `canary`, `source`, `subtask`).

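The conversion described above can be sketched as follows. This is a minimal illustration, not the contents of `create_dataset.py`: the function name `to_mcq`, the exact way the MD5 digest is turned into a seed, and the sample row are all assumptions made for the example.

```python
import hashlib
import random


def to_mcq(example: dict) -> dict:
    """Convert one source row (`ideal` + `distractors`) into the MCQ layout."""
    options = [example["ideal"], *example["distractors"]]

    # Assumed seeding scheme: use the MD5 digest of the source id as an
    # integer seed for a private RNG, so the same id always yields the
    # same option order.
    seed = int(hashlib.md5(example["id"].encode("utf-8")).hexdigest(), 16)
    random.Random(seed).shuffle(options)

    # Keep every original column except the two folded into `options`.
    converted = {
        key: value
        for key, value in example.items()
        if key not in ("ideal", "distractors")
    }
    converted["options"] = options
    converted["answer"] = example["ideal"]
    converted["answer_index"] = options.index(example["ideal"])
    return converted


# Hypothetical source row, for illustration only.
row = {
    "id": "example-0001",
    "question": "Which enzyme cuts this site?",
    "ideal": "EcoRI",
    "distractors": ["BamHI", "HindIII", "NotI"],
    "subtask": "demo",
}
mcq = to_mcq(row)
```

Because the RNG is seeded only from `id`, calling `to_mcq` twice on the same row returns the same `options` order and the same `answer_index`.
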
## Schema

- `question: string`
- `options: list[string]`
- `answer: string`
- `answer_index: int64`
- `id: string`
- `canary: string`
- `source: null`
- `subtask: string`

## Usage

```py
from datasets import load_dataset

seqqa = load_dataset("hf-carbon/lab-bench", "SeqQA", split="train")
seqqa_hard = load_dataset("hf-carbon/lab-bench", "SeqQA_Hard", split="train")
```
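
The easy/medium/hard bucketing mentioned under "Derived SeqQA configs" (sort by `difficulty_b` ascending, then `numpy.array_split(..., 3)`) might look roughly like the sketch below. The helper name `bucket_by_difficulty`, the item ids, and the difficulty values are hypothetical; the actual logic lives in `evaluation/scripts/plot_difficulty_irt.py`, which is not reproduced here.

```python
import numpy as np


def bucket_by_difficulty(item_ids, difficulty_b):
    """Split items into easy/medium/hard thirds by ascending difficulty_b."""
    ids = np.asarray(item_ids)
    # A stable sort keeps the input order for items with equal difficulty.
    order = np.argsort(np.asarray(difficulty_b, dtype=float), kind="stable")
    easy, medium, hard = np.array_split(ids[order], 3)
    return {
        "easy": easy.tolist(),
        "medium": medium.tolist(),
        "hard": hard.tolist(),
    }


# Hypothetical item ids and difficulty values, for illustration only.
buckets = bucket_by_difficulty(
    ["q1", "q2", "q3", "q4", "q5", "q6", "q7"],
    [0.3, -1.2, 2.1, 0.0, 1.5, -0.4, 0.9],
)
```

When the item count is not divisible by three, `numpy.array_split` gives the extra items to the earlier buckets, so with seven items the easy bucket holds three and the other two hold two each.
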
Provider:
hf-carbon