Sellopale/Finchio
Source: Hugging Face; updated 2025-12-15; indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/Sellopale/Finchio
Description:
Finch is an enterprise-grade benchmark for evaluating an agent's ability to work like a skilled finance and accounting expert on real-world professional workflows. The dataset focuses on messy, long-horizon finance and accounting workflows spanning data entry/import, structuring/formatting, web search, cross-sheet/file retrieval, calculation, financial modeling, validation, translation, visualization, and reporting. The workflows are derived from real-world enterprise workspaces, including enterprise email threads, large and messy spreadsheets with multimodal artifacts (text, tables, formulas, charts, pivot tables, images, etc.), and interlinked PDFs and documents. They were built through a three-step labeling process: inducing workflow types and instances from enterprise emails, deriving concrete workflow instances by analyzing spreadsheet version changes, and meticulous expert annotation. The result is 172 enterprise-grade workflows involving 1,710 spreadsheets and 27 million cells, capturing the compositional, messy, multimodal, and collaborative nature of real-world finance and accounting work. This release provides full annotations for the first 72 workflows; the remaining 100 will be released in later updates. Experimental results show that even frontier agents (GPT 5.1 Pro and Claude Sonnet 4.5 Pro) solve fewer than 40% of the workflows, revealing a substantial performance gap in real enterprise scenarios.
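To put the headline figures in the description in proportion, the short sketch below derives a few ratios from them. It uses only the counts quoted above (172 workflows, 72 annotated in this release, 1,710 spreadsheets, 27 million cells); the function name `coverage_stats` and the dictionary keys are hypothetical and chosen for illustration, and the averages are rough because 27 million is itself a rounded figure.

```python
# Counts quoted in the dataset description; nothing here is read from the dataset itself.
TOTAL_WORKFLOWS = 172
ANNOTATED_WORKFLOWS = 72
TOTAL_SPREADSHEETS = 1_710
TOTAL_CELLS = 27_000_000  # "27 million" in the description, so only approximate


def coverage_stats() -> dict:
    """Derive simple ratios from the published counts (illustrative helper)."""
    return {
        # Share of workflows with full annotations in the current release.
        "released_share": ANNOTATED_WORKFLOWS / TOTAL_WORKFLOWS,
        # Average number of spreadsheets involved per workflow.
        "spreadsheets_per_workflow": TOTAL_SPREADSHEETS / TOTAL_WORKFLOWS,
        # Average number of cells per spreadsheet.
        "cells_per_spreadsheet": TOTAL_CELLS / TOTAL_SPREADSHEETS,
    }


if __name__ == "__main__":
    stats = coverage_stats()
    print(f"Annotated in this release: {stats['released_share']:.1%}")
    print(f"Spreadsheets per workflow: {stats['spreadsheets_per_workflow']:.1f}")
    print(f"Cells per spreadsheet: {stats['cells_per_spreadsheet']:,.0f}")
```

Under these counts, roughly 42% of the workflows are annotated in the current release, each workflow touches about ten spreadsheets on average, and an average spreadsheet holds on the order of 16,000 cells.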
Provider:
Sellopale



