zarnite/zarn-workspace-rag-qa

Source: Hugging Face. Updated 2026-04-18. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/zarnite/zarn-workspace-rag-qa
Resource description:
---
language:
- en
license: apache-2.0
annotations_creators:
- expert-generated
- machine-generated
language_creators:
- expert-generated
- machine-generated
source_datasets:
- original
task_categories:
- question-answering
- text-generation
tags:
- zarnite
- benchmark
- rag
- grounded-qa
- retrieval
- gold-track
- benchmark-starter
pretty_name: Zarn Workspace RAG QA
size_categories:
- 10K<n<100K
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train.jsonl
  - split: validation
    path: data/validation.jsonl
  - split: test
    path: data/test.jsonl
---

# Zarn Workspace RAG QA

## Dataset Description

Small document bundles paired with grounded answers, evidence, and explicit refusals when context is missing.

## Team Attribution

This dataset was created and reviewed by the Zarnite team through internal benchmark design, generation, and quality-control workflows. It should be presented as a Zarnite-authored benchmark starter pack, not as a purely human-collected field corpus.

## Ecosystem Need Tier

High Ecosystem Need

## Why This Category Is Attractive

RAG systems fail most often on groundedness and abstention, so richer retrieval benchmarks with traps and known gaps are highly useful.

## Benchmark Goal

Evaluate grounded answering, citation precision, answerability judgment, and refusal quality under partial workspace context.

## Included In This Folder

- `data/train.jsonl`, `data/validation.jsonl`, `data/test.jsonl`: starter benchmark splits with 1200 total rows.
- `schema.json`: JSON Schema for row validation.
- `benchmark_spec.json`: metrics, quality gates, and target release scale.
- `LICENSE.md`: folder-local license notice for self-contained publishing.
- `PUBLISHING.md`: repo-specific publish instructions for Hugging Face.
- `hf_repo_template.json`: machine-readable repo template used by the uploader script.
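
As a quick sanity check on the split files listed above, the following minimal sketch reads each JSONL split with the Python standard library and counts its rows. The helper names `read_jsonl` and `count_rows` are hypothetical and not part of the dataset package; the only assumption taken from this card is the `data/<split>.jsonl` layout.

```python
import json
from pathlib import Path

# Split names as declared in the dataset card's `configs` block.
SPLITS = ("train", "validation", "test")


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between rows
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc


def count_rows(dataset_folder):
    """Return {split: row_count} for every split file found under data/."""
    data_dir = Path(dataset_folder) / "data"
    counts = {}
    for split in SPLITS:
        split_path = data_dir / f"{split}.jsonl"
        if split_path.exists():  # missing splits are skipped, not treated as errors
            counts[split] = sum(1 for _ in read_jsonl(split_path))
    return counts
```

Calling `count_rows(".")` from the dataset folder should give per-split counts that can be compared against the 1200 total starter rows stated above.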

## Target Public Scale

- Train: 24,000
- Validation: 3,000
- Test: 3,000
- Total target rows: 30,000

## Recommended Metrics

- `answer_and_citation_f1`
- `answerability_accuracy`
- `unsupported_claim_rate`
- `citation_precision`
- `groundedness`

## Gold-Track Benchmark Assets

- `ANNOTATION_GUIDELINES.md`: how to expand rows without drifting from the benchmark purpose.
- `REVIEW_PROTOCOL.md`: how to audit validation and test rows with dual review and adjudication.
- `BASELINE_EVAL_SPEC.json`: expected output contract, slice reporting, and release thresholds.
- `RELEASE_CHECKLIST.md`: final pre-publish checks for the public Hugging Face release.
- `SCORING_PROFILE.json`: prediction keys, scoring expectations, and slice reporting requirements.
- `prediction_template.jsonl`: starter template for benchmark submissions or baseline runs.

## Expanded Row Anatomy

- `knowledge_bundle`: multiple documents with sections, partial authority, and known gaps.
- `query_context`: who is asking and what type of retrieval task this is.
- `unsupported_claim_traps`: tempting details the model must not invent.
- `answerability`: whether the question should be answered or refused.
- `difficulty_rationale`: why the row belongs in its difficulty bucket instead of a weaker slice.
- `benchmark_slices`: named reporting slices such as approval friction, proof preservation, or citation traps.
- `adversarial_features`, `expected_failure_modes`, and `review_readiness`: what the row is testing and how a gold-track reviewer should treat it.
- `evidence_manifest`, `reference_variants`, and `negative_examples`: the source evidence boundary, acceptable alternate answers, and concrete failure cases.

## Hugging Face Deployment

This folder is self-contained and can be uploaded as its own Hugging Face dataset repository.

- Suggested repo id: `zarnite/zarn-workspace-rag-qa`
- Example upload command: `python upload_to_huggingface.py --dataset-folder "push/high-ecosystem-need/Zarn-Workspace-RAG-QA" --repo-id "zarnite/zarn-workspace-rag-qa"`
- You can swap the namespace by passing `--namespace YOUR_USERNAME` to the uploader.

## Local Evaluation

- Example eval command: `python run_priority_eval.py --dataset-folder "push/high-ecosystem-need/Zarn-Workspace-RAG-QA" --splits validation test`
- `prediction_template.jsonl` gives the required output shape for local or leaderboard-style submissions.

## License

This package is marked `apache-2.0`. The rows in this folder are original starter examples for benchmark packaging.
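
The card names `answerability_accuracy` and `citation_precision` among its recommended metrics but does not define them here. The sketch below shows one plausible reading of each, purely as an illustration. The prediction key `citations`, the reference key `gold_citations`, and the label values used in the usage note are assumptions made for this example; the authoritative output contract lives in `SCORING_PROFILE.json` and `prediction_template.jsonl`, which may differ.

```python
def answerability_accuracy(predictions, references):
    """Fraction of rows whose predicted answer/refuse label matches the gold label."""
    if not references:
        return 0.0
    correct = sum(
        1
        for pred, ref in zip(predictions, references)
        if pred["answerability"] == ref["answerability"]
    )
    return correct / len(references)


def citation_precision(predictions, references):
    """Micro-averaged share of predicted citations that appear in the gold set.

    `citations` and `gold_citations` are hypothetical key names holding
    lists of evidence identifiers; they are not taken from the dataset schema.
    """
    cited = 0
    supported = 0
    for pred, ref in zip(predictions, references):
        gold = set(ref["gold_citations"])
        predicted = set(pred["citations"])
        cited += len(predicted)
        supported += len(predicted & gold)
    # With no citations at all, precision is reported as 0.0 by convention here.
    return supported / cited if cited else 0.0
```

For instance, if a model answers a row that should have been refused, that row counts against `answerability_accuracy`, and any citation it attaches to the invented answer lowers `citation_precision`. Micro-averaging is a design choice in this sketch; a per-row macro average would weight short and long citation lists equally.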
Provider:
zarnite