zarnite/zarn-workflow-automation-instruct

Source: Hugging Face. Updated 2026-04-18. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/zarnite/zarn-workflow-automation-instruct
Description:
---
language:
- en
license: apache-2.0
annotations_creators:
- expert-generated
- machine-generated
language_creators:
- expert-generated
- machine-generated
source_datasets:
- original
task_categories:
- text-generation
- question-answering
tags:
- zarnite
- benchmark
- workflow-automation
- agents
- productivity
- gold-track
- benchmark-starter
pretty_name: Zarn Workflow Automation Instruct
size_categories:
- 10K<n<100K
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train.jsonl
  - split: validation
    path: data/validation.jsonl
  - split: test
    path: data/test.jsonl
---

# Zarn Workflow Automation Instruct

## Dataset Description

Natural-language workplace requests paired with plans and JSON tool actions.

## Team Attribution

This dataset was created and reviewed by the Zarnite team through internal benchmark design, generation, and quality-control workflows. It should be presented as a Zarnite-authored benchmark starter pack, not as a purely human-collected field corpus.

## Ecosystem Need Tier

High Ecosystem Need

## Why This Category Is Attractive

Office orchestration is one of the clearest enterprise AI use cases, but public benchmark data still underserves realistic tool selection, policy handling, and handoff quality.

## Benchmark Goal

Evaluate whether an assistant can coordinate a messy workflow across tools while preserving blockers, policy constraints, and handoff continuity.

## Included In This Folder

- `data/train.jsonl`, `data/validation.jsonl`, `data/test.jsonl`: starter benchmark splits with 1200 total rows.
- `schema.json`: JSON Schema for row validation.
- `benchmark_spec.json`: metrics, quality gates, and target release scale.
- `LICENSE.md`: folder-local license notice for self-contained publishing.
- `PUBLISHING.md`: repo-specific publish instructions for Hugging Face.
- `hf_repo_template.json`: machine-readable repo template used by the uploader script.

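
The split layout declared in the card's `configs` block can be read with the standard library alone. The following is a minimal sketch, assuming only that each `.jsonl` file holds one JSON object per line; the helper names (`parse_jsonl`, `load_splits`, `row_counts`) are hypothetical and not part of the dataset's tooling, and no row field names are assumed.

```python
import json
from pathlib import Path
from typing import Iterable

# Split file names as declared in the card's `configs` block.
SPLIT_FILES = {
    "train": "data/train.jsonl",
    "validation": "data/validation.jsonl",
    "test": "data/test.jsonl",
}


def parse_jsonl(lines: Iterable[str]) -> list[dict]:
    """Parse JSON Lines text into a list of row dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def load_splits(folder: str) -> dict[str, list[dict]]:
    """Read every split under `folder`; a missing file raises FileNotFoundError."""
    root = Path(folder)
    splits = {}
    for name, rel_path in SPLIT_FILES.items():
        with open(root / rel_path, encoding="utf-8") as handle:
            splits[name] = parse_jsonl(handle)
    return splits


def row_counts(splits: dict[str, list[dict]]) -> dict[str, int]:
    """Count rows per split, plus a grand total under the key 'total'."""
    counts = {name: len(rows) for name, rows in splits.items()}
    counts["total"] = sum(counts.values())
    return counts
```

If the card's stated starter size holds, `row_counts(load_splits(folder))["total"]` should come to 1200 for a local copy of the folder.
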
## Target Public Scale

- Train: 30,000
- Validation: 3,000
- Test: 3,000
- Total target rows: 36,000

## Recommended Metrics

- `structured_action_f1`
- `artifact_quality`
- `policy_compliance`
- `handoff_completeness`
- `tool_selection_accuracy`

## Gold-Track Benchmark Assets

- `ANNOTATION_GUIDELINES.md`: how to expand rows without drifting from the benchmark purpose.
- `REVIEW_PROTOCOL.md`: how to audit validation and test rows with dual review and adjudication.
- `BASELINE_EVAL_SPEC.json`: expected output contract, slice reporting, and release thresholds.
- `RELEASE_CHECKLIST.md`: final pre-publish checks for the public Hugging Face release.
- `SCORING_PROFILE.json`: prediction keys, scoring expectations, and slice reporting requirements.
- `prediction_template.jsonl`: starter template for benchmark submissions or baseline runs.

## Expanded Row Anatomy

- `objective_breakdown`: the operational goals hidden inside the request.
- `workspace_state`: artifact health, tool inventory, threads, calendar limits, and CRM state.
- `policy_bundle`: rules for approvals, communications, and handoff behavior.
- `reference_artifacts`: gold drafts, CRM notes, and handoff outputs.
- `difficulty_rationale`: why the row belongs in its difficulty bucket instead of a weaker slice.
- `benchmark_slices`: named reporting slices such as approval friction, proof preservation, or citation traps.
- `adversarial_features`, `expected_failure_modes`, and `review_readiness`: what the row is testing and how a gold-track reviewer should treat it.
- `evidence_manifest`, `reference_variants`, and `negative_examples`: the source evidence boundary, acceptable alternate answers, and concrete failure cases.

## Hugging Face Deployment

This folder is self-contained and can be uploaded as its own Hugging Face dataset repository.

- Suggested repo id: `zarnite/zarn-workflow-automation-instruct`
- Example upload command: `python upload_to_huggingface.py --dataset-folder "push/high-ecosystem-need/Zarn-Workflow-Automation-Instruct" --repo-id "zarnite/zarn-workflow-automation-instruct"`
- You can swap the namespace by passing `--namespace YOUR_USERNAME` to the uploader.

## Local Evaluation

- Example eval command: `python run_priority_eval.py --dataset-folder "push/high-ecosystem-need/Zarn-Workflow-Automation-Instruct" --splits validation test`
- `prediction_template.jsonl` gives the required output shape for local or leaderboard-style submissions.

## License

This package is marked `apache-2.0`. The rows in this folder are original starter examples for benchmark packaging.

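
The card lists `structured_action_f1` among its recommended metrics but does not define it. As a rough illustration only, one plausible reading is F1 over exact-match JSON tool actions. The sketch below assumes each action is a JSON-serializable dict; the `tool` and `args` keys in the usage note are hypothetical, and the scorer behind `run_priority_eval.py` may compute the metric differently.

```python
import json
from collections import Counter


def _canonical(action: dict) -> str:
    """Serialize an action with sorted keys so key order does not affect matching."""
    return json.dumps(action, sort_keys=True)


def action_f1(predicted: list[dict], gold: list[dict]) -> float:
    """F1 over exact-match actions, treating each list as a multiset.

    ASSUMPTION: an action counts as correct only if the whole dict matches;
    the benchmark's real scoring rules are not stated in the card.
    """
    if not predicted and not gold:
        # Nothing expected and nothing predicted: treat as a perfect match.
        return 1.0
    pred_counts = Counter(_canonical(a) for a in predicted)
    gold_counts = Counter(_canonical(a) for a in gold)
    # Multiset intersection: each gold action can be matched at most once.
    overlap = sum((pred_counts & gold_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For instance, with two gold actions and two predicted actions of which one matches exactly (say `{"tool": "calendar.create", "args": {"title": "sync"}}`), precision and recall are both 0.5, so the score is 0.5. Exact matching is deliberately strict; a partial-credit variant would need the field-level rules from `SCORING_PROFILE.json`, which are not reproduced in the card.
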
Provider:
zarnite