zarnite/zarn-workflow-automation-instruct
Source: Hugging Face · Updated 2026-04-18 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/zarnite/zarn-workflow-automation-instruct
Resource summary:
---
language:
- en
license: apache-2.0
annotations_creators:
- expert-generated
- machine-generated
language_creators:
- expert-generated
- machine-generated
source_datasets:
- original
task_categories:
- text-generation
- question-answering
tags:
- zarnite
- benchmark
- workflow-automation
- agents
- productivity
- gold-track
- benchmark-starter
pretty_name: Zarn Workflow Automation Instruct
size_categories:
- 10K<n<100K
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train.jsonl
  - split: validation
    path: data/validation.jsonl
  - split: test
    path: data/test.jsonl
---
# Zarn Workflow Automation Instruct
## Dataset Description
Natural-language workplace requests paired with plans and JSON tool actions.
## Team Attribution
This dataset was created and reviewed by the Zarnite team through internal benchmark design, generation, and quality-control workflows. It should be presented as a Zarnite-authored benchmark starter pack, not as a purely human-collected field corpus.
## Ecosystem Need Tier
High Ecosystem Need
## Why This Category Is Attractive
Office orchestration is one of the clearest enterprise AI use cases, but public benchmark data still underserves realistic tool selection, policy handling, and handoff quality.
## Benchmark Goal
Evaluate whether an assistant can coordinate a messy workflow across tools while preserving blockers, policy constraints, and handoff continuity.
## Included In This Folder
- `data/train.jsonl`, `data/validation.jsonl`, `data/test.jsonl`: starter benchmark splits with 1,200 total rows.
- `schema.json`: JSON Schema for row validation.
- `benchmark_spec.json`: metrics, quality gates, and target release scale.
- `LICENSE.md`: folder-local license notice for self-contained publishing.
- `PUBLISHING.md`: repo-specific publish instructions for Hugging Face.
- `hf_repo_template.json`: machine-readable repo template used by the uploader script.
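The sketch below loads the three starter splits using only the standard library. It assumes each split file holds one JSON object per line, as the `.jsonl` extension suggests. It also assumes the folder layout matches the `data_files` paths in the card metadata. The helper names `load_jsonl` and `load_splits` are illustrative and are not part of the package.

```python
import json
from pathlib import Path

# Split names as declared in the card's `configs` metadata.
SPLITS = ("train", "validation", "test")


def load_jsonl(path):
    """Read one JSON object per non-blank line, reporting the line of any parse error."""
    rows = []
    with open(path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                rows.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({exc.msg})") from exc
    return rows


def load_splits(dataset_folder):
    """Load data/<split>.jsonl for every split under dataset_folder."""
    root = Path(dataset_folder)
    return {split: load_jsonl(root / "data" / f"{split}.jsonl") for split in SPLITS}
```

For example, `load_splits("push/high-ecosystem-need/Zarn-Workflow-Automation-Instruct")` returns a dict keyed by split name. The list lengths should sum to the 1,200 starter rows.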
## Target Public Scale
- Train: 30,000
- Validation: 3,000
- Test: 3,000
- Total target rows: 36,000
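The target sizes imply roughly a 10:1:1 split: about 83.3% train and 8.3% each for validation and test. The 1,200-row starter pack is about 3.3% of the 36,000-row target. The arithmetic can be checked quickly:

```python
# Target sizes from the "Target Public Scale" list.
TARGET = {"train": 30_000, "validation": 3_000, "test": 3_000}
STARTER_ROWS = 1_200  # current total across the three starter splits

total = sum(TARGET.values())
shares = {split: n / total for split, n in TARGET.items()}
starter_fraction = STARTER_ROWS / total

for split, n in TARGET.items():
    print(f"{split:<10} {n:>6,} rows  {shares[split]:.1%}")
print(f"starter pack covers {starter_fraction:.1%} of the target")
```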
## Recommended Metrics
- `structured_action_f1`
- `artifact_quality`
- `policy_compliance`
- `handoff_completeness`
- `tool_selection_accuracy`
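The exact metric definitions live in `benchmark_spec.json` and `SCORING_PROFILE.json`, which are not reproduced here. The sketch below is one plausible reading of two of these metrics. It assumes each row's answer is a list of JSON action objects and that each action names its tool under a hypothetical `tool` key. Under those assumptions:

- `structured_action_f1` is micro F1 over exact-match actions, ignoring key order.
- `tool_selection_accuracy` is the share of rows whose predicted tool set equals the gold tool set.

```python
import json
from collections import Counter


def _canonical(action):
    """Serialize an action dict deterministically so key order does not affect matching."""
    return json.dumps(action, sort_keys=True, separators=(",", ":"))


def structured_action_f1(predicted, gold):
    """F1 over one row's actions, treating each canonical action as a multiset item."""
    pred = Counter(_canonical(a) for a in predicted)
    ref = Counter(_canonical(a) for a in gold)
    if not pred and not ref:
        return 1.0  # both empty: nothing required, nothing extra
    overlap = sum((pred & ref).values())  # Counter & keeps the minimum count per action
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def tool_selection_accuracy(pairs):
    """Share of (predicted_actions, gold_actions) pairs whose tool sets match exactly.

    Each action is assumed to carry a hypothetical "tool" key.
    """
    pairs = list(pairs)
    if not pairs:
        return 0.0
    hits = sum(
        {a.get("tool") for a in pred} == {a.get("tool") for a in gold}
        for pred, gold in pairs
    )
    return hits / len(pairs)
```

Exact matching is deliberately strict. Giving partial credit for near-miss arguments would require a field-level matcher instead of whole-action comparison.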
## Gold-Track Benchmark Assets
- `ANNOTATION_GUIDELINES.md`: how to expand rows without drifting from the benchmark purpose.
- `REVIEW_PROTOCOL.md`: how to audit validation and test rows with dual review and adjudication.
- `BASELINE_EVAL_SPEC.json`: expected output contract, slice reporting, and release thresholds.
- `RELEASE_CHECKLIST.md`: final pre-publish checks for the public Hugging Face release.
- `SCORING_PROFILE.json`: prediction keys, scoring expectations, and slice reporting requirements.
- `prediction_template.jsonl`: starter template for benchmark submissions or baseline runs.
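Both `BASELINE_EVAL_SPEC.json` and `SCORING_PROFILE.json` call for slice reporting. The following is a hedged sketch of per-slice averaging. It assumes each scored row carries its `benchmark_slices` list alongside a per-row metric value. The `overall` bucket name is an illustrative choice, not a documented key.

```python
from collections import defaultdict


def slice_report(rows, metric, overall="overall"):
    """Mean of `metric` per named slice, plus an all-rows bucket.

    Each row is a dict with a numeric score under `metric` and an optional
    "benchmark_slices" list; a row contributes to every slice it names.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        score = float(row[metric])
        # dict.fromkeys de-duplicates slice names while keeping their order.
        for name in dict.fromkeys([overall, *row.get("benchmark_slices", [])]):
            totals[name] += score
            counts[name] += 1
    return {name: totals[name] / counts[name] for name in counts}
```

Because one row can belong to several slices, the slice means do not need to average back to the overall score.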
## Expanded Row Anatomy
- `objective_breakdown`: the operational goals hidden inside the request.
- `workspace_state`: artifact health, tool inventory, threads, calendar limits, and CRM state.
- `policy_bundle`: rules for approvals, communications, and handoff behavior.
- `reference_artifacts`: gold drafts, CRM notes, and handoff outputs.
- `difficulty_rationale`: why the row belongs in its difficulty bucket instead of a weaker slice.
- `benchmark_slices`: named reporting slices such as approval friction, proof preservation, or citation traps.
- `adversarial_features`, `expected_failure_modes`, and `review_readiness`: what the row is testing and how a gold-track reviewer should treat it.
- `evidence_manifest`, `reference_variants`, and `negative_examples`: the source evidence boundary, acceptable alternate answers, and concrete failure cases.
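The check below tests whether expanded rows are complete. It assumes the field names listed above are top-level keys in each JSON row. It only tests for the presence of each key. `schema.json` remains the authoritative source for types and for any additional required fields. The `id` key used to label offending rows is hypothetical.

```python
# Field names taken from the "Expanded Row Anatomy" list; schema.json is authoritative.
ANATOMY_FIELDS = (
    "objective_breakdown",
    "workspace_state",
    "policy_bundle",
    "reference_artifacts",
    "difficulty_rationale",
    "benchmark_slices",
    "adversarial_features",
    "expected_failure_modes",
    "review_readiness",
    "evidence_manifest",
    "reference_variants",
    "negative_examples",
)


def missing_anatomy_fields(row):
    """Return the anatomy fields that are absent from a row, in list order."""
    return [field for field in ANATOMY_FIELDS if field not in row]


def audit_rows(rows, id_key="id"):
    """Map each incomplete row (by hypothetical id key, else its index) to its missing fields."""
    report = {}
    for index, row in enumerate(rows):
        missing = missing_anatomy_fields(row)
        if missing:
            report[row.get(id_key, index)] = missing
    return report
```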
## Hugging Face Deployment
This folder is self-contained and can be uploaded as its own Hugging Face dataset repository.
- Suggested repo id: `zarnite/zarn-workflow-automation-instruct`
- Example upload command: `python upload_to_huggingface.py --dataset-folder "push/high-ecosystem-need/Zarn-Workflow-Automation-Instruct" --repo-id "zarnite/zarn-workflow-automation-instruct"`
- You can swap the namespace by passing `--namespace YOUR_USERNAME` to the uploader.
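If the bundled `upload_to_huggingface.py` is unavailable, a comparable upload can be sketched with the `huggingface_hub` library's `HfApi.create_repo` and `HfApi.upload_folder`. This is not the uploader script's code, and `resolve_repo_id` and `upload_dataset_folder` are illustrative names. The sketch also assumes that swapping the namespace keeps the dataset name. That mirrors what `--namespace` appears to do, not documented behavior.

```python
def resolve_repo_id(repo_id, namespace=None):
    """Keep the dataset name from 'namespace/name' and optionally replace the namespace."""
    owner, sep, name = repo_id.partition("/")
    if not sep or not owner or not name or "/" in name:
        raise ValueError(f"expected 'namespace/name', got {repo_id!r}")
    return f"{namespace}/{name}" if namespace else repo_id


def upload_dataset_folder(folder, repo_id, namespace=None):
    """Create the dataset repo if needed, then upload the whole folder to it."""
    # Imported lazily so resolve_repo_id works without huggingface_hub installed.
    from huggingface_hub import HfApi

    target = resolve_repo_id(repo_id, namespace)
    api = HfApi()  # picks up a token saved by `huggingface-cli login` or the HF_TOKEN env var
    api.create_repo(repo_id=target, repo_type="dataset", exist_ok=True)
    api.upload_folder(folder_path=folder, repo_id=target, repo_type="dataset")
    return target
```

For example, `upload_dataset_folder("push/high-ecosystem-need/Zarn-Workflow-Automation-Instruct", "zarnite/zarn-workflow-automation-instruct", namespace="YOUR_USERNAME")` would target `YOUR_USERNAME/zarn-workflow-automation-instruct`.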
## Local Evaluation
- Example eval command: `python run_priority_eval.py --dataset-folder "push/high-ecosystem-need/Zarn-Workflow-Automation-Instruct" --splits validation test`
- `prediction_template.jsonl` gives the required output shape for local or leaderboard-style submissions.
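A submission can be checked against `prediction_template.jsonl` before running the eval. This minimal sketch assumes the template is JSONL with one object per line and treats every key that appears in it as required. The real prediction keys are defined in `SCORING_PROFILE.json`, which may impose stricter rules, such as types or ids that must match the split.

```python
import json


def template_keys(template_path):
    """Collect every key used in the template (union over its non-blank lines)."""
    keys = set()
    with open(template_path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                keys.update(json.loads(line))  # iterating a dict yields its keys
    return keys


def check_predictions(pred_path, required_keys):
    """Return (line_number, sorted_missing_keys) for each prediction line lacking keys."""
    problems = []
    with open(pred_path, encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            if not line.strip():
                continue
            missing = sorted(set(required_keys) - set(json.loads(line)))
            if missing:
                problems.append((line_no, missing))
    return problems
```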
## License
This package is marked `apache-2.0`. The rows in this folder are original starter examples for benchmark packaging.
Provided by:
zarnite



