
ewdfd/SMART

Source: Hugging Face. Updated 2026-04-09; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/ewdfd/SMART
Resource description:
---
pretty_name: SMART
language:
- en
license: mit
task_categories:
- question-answering
- text-generation
- text-classification
tags:
- mathematics
- reasoning
- llm-evaluation
- benchmark
- education
- chain-of-thought
size_categories:
- 10K<n<100K
configs:
- config_name: default
  data_files:
  - split: test
    path: SMART.jsonl
---

# SMART: Evaluating LLMs’ Mathematical Reasoning via a Human Cognitive Process-Inspired Benchmark

SMART is a fine-grained benchmark for evaluating large language models (LLMs) on mathematical reasoning from a human cognitive process perspective. Instead of evaluating only the final answer, SMART decomposes mathematical problem solving into four cognitive dimensions inspired by Pólya’s problem-solving theory:

1. **Semantic Understanding**
2. **Mathematical Reasoning**
3. **Arithmetic Computation**
4. **Reflection & Refinement**

SMART is designed to diagnose where a model succeeds or fails during the problem-solving process, rather than reducing reasoning to a shallow input-output mapping.

The benchmark contains **10,000 test instances**, including **2,000 seed questions** and **8,000 dimension-specific task variants**.

## Data Fields

Each SMART instance contains the following fields:

- `question`: the original seed math word problem.
- `notation`: the notation-based arithmetic form derived from the original problem, used to isolate arithmetic computation.
- `background`: the structured background information extracted from the question, including the problem scenario, goal, known and unknown quantities, relationships and constraints, and potentially irrelevant information.
- `smt-lib`: the SMT-LIB symbolic formalization of the problem, used to evaluate mathematical reasoning through executable logical structure.
- `answer`: the answer associated with the instance.
- `gt_answer`: the ground-truth final answer.
- `gt_cot`: the ground-truth chain-of-thought solution.
- `w_cot`: a corrupted or incorrect chain-of-thought solution with injected errors, used for Reflection & Refinement evaluation.
- `wrong`: an indicator of whether the `w_cot` contains an injected error pattern.

Notes:

- `question` corresponds to the original mathematical problem.
- `background` is mainly used for the Understanding dimension.
- `smt-lib` is mainly used for the Reasoning dimension.
- `notation` is mainly used for the Arithmetic dimension.
- `gt_cot`, `w_cot`, and `wrong` are mainly used for the Reflection & Refinement dimension.

An example structure is shown below:

```json
{
  "question": "Josh decides to try flipping a house. He buys a house for $80,000 and then puts in $50,000 in repairs. This increased the value of the house by 150%. How much profit did he make?",
  "notation": "a=80000, b=50000, c=a+b, d=1.5a, e=d+a, f=e-c, f?",
  "background": "```json\n{\n \"problem_description\": {\n \"problem_scenario\": \"Josh buys a house, invests in repairs, and sells it for profit.\",\n \"goal\": \"Calculate the profit Josh made from flipping the house.\"\n },\n \"quantities\": {\n \"known\": [\n \"Initial house purchase cost: $80,000\",\n \"Cost of repairs: $50,000\",\n \"Increase in house value: 150%\"\n ],\n \"unknown\": [\n \"Profit made from flipping the house\"\n ]\n },\n \"relationships_and_constraints\": [\n \"Total cost = purchase cost + repair cost\",\n \"House value increases by a specified percentage\",\n \"Profit = selling price - total cost\"\n ],\n \"potentially_irrelevant_info\": []\n}\n```",
  "smt-lib": " (set-logic QF_NRA)(declare-fun a () Real) (declare-fun b () Real) (declare-fun c () Real) (declare-fun d () Real) (declare-fun e () Real) (declare-fun f () Real) (assert (= a 80000))(assert (= b 50000))(assert (= c (+ a b)))(assert (= d (* a 1.5)))(assert (= e (+ d a)))(assert (= f (- e c)))(check-sat)(get-value (f))",
  "answer": 70000.0,
  "gt_answer": 70000.0,
  "gt_cot": "The cost of the house and repairs came out to 80,000+50,000=$<<80000+50000=130000>>130,000\nHe increased the value of the house by 80,000*1.5=<<80000*1.5=120000>>120,000\nSo the new value of the house is 120,000+80,000=$<<120000+80000=200000>>200,000\nSo he made a profit of 200,000-130,000=$<<200000-130000=70000>>70,000\n#### 70000",
  "w_cot": "The cost of the house and repairs came out to 80,000+50,000=$<<80000+50000=130000>>130,000.He increased the value of the house by 80,000*1.5=<<80000*1.5=144561>>120,000.So the new value of the house is 120,000+80,000=$<<120000+80000=200000>>200,000.So he made a profit of 200,000-130,000=$<<200000-130000=70000>>70,000.#### 70000.",
  "wrong": 1
}
```
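
To make the relationship between the fields more concrete, here is a minimal Python sketch that re-evaluates the `notation` string and re-checks the `<<expression=result>>` annotations inside a chain-of-thought. It is not part of SMART's own tooling. The helper names (`eval_arith`, `eval_notation`, `inconsistent_steps`) are hypothetical, and the grammar is inferred only from the single example above: comma-separated assignments, a trailing `name?` query, implicit multiplication such as `1.5a`, and `<<...>>` annotations holding plain arithmetic. Other instances may not follow these assumptions.

```python
import ast
import operator
import re

# Binary operators permitted in arithmetic expressions.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def eval_arith(expr, env=None):
    """Evaluate a +, -, *, / expression over numbers and names found in env."""
    env = env or {}

    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id in env:
            return env[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expr!r}")

    return walk(ast.parse(expr.strip(), mode="eval").body)


def eval_notation(notation):
    """Evaluate 'a=1, b=2a, b?' style strings and return the queried value."""
    env, query = {}, None
    for part in (p.strip() for p in notation.split(",")):
        if part.endswith("?"):
            query = part[:-1].strip()
            continue
        name, expr = part.split("=", 1)
        # Assumption: '1.5a' denotes implicit multiplication, i.e. 1.5*a.
        expr = re.sub(r"(\d)\s*([A-Za-z_])", r"\1*\2", expr)
        env[name.strip()] = eval_arith(expr, env)
    return env[query]


# Assumption: annotations have the form <<expression=result>>.
_ANNOTATION = re.compile(r"<<([^<>=]+)=([^<>=]+)>>")


def inconsistent_steps(cot, tol=1e-6):
    """List (expression, stated, recomputed) for annotations that do not hold."""
    bad = []
    for expr, stated in _ANNOTATION.findall(cot):
        recomputed = eval_arith(expr.replace(",", ""))
        if abs(recomputed - float(stated.replace(",", ""))) > tol:
            bad.append((expr, stated, recomputed))
    return bad


# Values copied from the example instance (one chain-of-thought step each).
notation = "a=80000, b=50000, c=a+b, d=1.5a, e=d+a, f=e-c, f?"
gt_step = "80,000*1.5=<<80000*1.5=120000>>120,000"
w_step = "80,000*1.5=<<80000*1.5=144561>>120,000"

print(eval_notation(notation))      # value bound to f
print(inconsistent_steps(gt_step))  # mismatches in the ground-truth step
print(inconsistent_steps(w_step))   # mismatches in the corrupted step
```

On the example instance, the evaluated `notation` agrees with `gt_answer`, and only the corrupted step from `w_cot` is flagged. A check like this catches arithmetic slips inside annotations only; injected errors of other kinds (for instance, a wrong relationship between quantities) would need a different test.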
Provider:
ewdfd