TRACE: Synthetic Eval and Prod Dataset
Source: DataCite Commons. Updated 2026-05-01; indexed 2026-05-03.
Download link:
https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/T3JNUP
Description:
This dataset contains 80 synthetic queries used to validate the TRACE framework's closed-loop evaluation pipeline. Each query is labeled with its source dataset (Evaluation or Production), intent category, and role in the experiment. The Evaluation set comprises 36 items across five intents: three covered intents (product_specs, order_status, account_management; 10 queries each) that overlap with production traffic, and two stale intents (legacy_hardware, warranty_claims; 3 queries each) designed to test deprecation and reactivation. The Production set comprises 44 items: 40 queries across four gap intents (billing_returns, subscription_management, technical_troubleshooting, competitor_comparison; 10 queries each) that have no eval coverage at baseline, plus 4 warranty_claims queries introduced in a second run to test archive reactivation. Failure rates range from 0.1 (covered intents) to 0.8 (technical_troubleshooting), simulating varying severity of production-eval misalignment. The No-Intent Drift variant uses identical queries with intent labels removed.
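The composition described above can be sanity-checked with a short script. The following is a minimal sketch that rebuilds the per-intent counts from the description alone; the field names `source` and `intent` are assumptions for illustration and may not match the column names in the published files.

```python
from collections import Counter

# Per-intent query counts as stated in the description.
# The dictionary layout and the "source"/"intent" field names are hypothetical.
COMPOSITION = {
    "Evaluation": {
        "product_specs": 10,       # covered
        "order_status": 10,        # covered
        "account_management": 10,  # covered
        "legacy_hardware": 3,      # stale
        "warranty_claims": 3,      # stale
    },
    "Production": {
        "billing_returns": 10,            # gap
        "subscription_management": 10,    # gap
        "technical_troubleshooting": 10,  # gap
        "competitor_comparison": 10,      # gap
        "warranty_claims": 4,             # second run, archive reactivation
    },
}


def build_rows(composition):
    """Expand the count table into one placeholder row per query."""
    rows = []
    for source, intents in composition.items():
        for intent, n in intents.items():
            rows.extend({"source": source, "intent": intent} for _ in range(n))
    return rows


rows = build_rows(COMPOSITION)
per_source = Counter(row["source"] for row in rows)

# Intents seen in production with no counterpart in the evaluation set.
eval_intents = set(COMPOSITION["Evaluation"])
prod_intents = set(COMPOSITION["Production"])
gap_intents = sorted(prod_intents - eval_intents)

print(len(rows), dict(per_source), gap_intents)
```

Note that `warranty_claims` appears in both sets, so a plain set difference recovers exactly the four gap intents named in the description.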
Provider:
Harvard Dataverse
Created:
2026-04-30



