MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions

Source: NIAID Data Ecosystem. Indexed: 2026-05-02

Download link:

https://zenodo.org/record/13294002

Resource description:

1. follow_up.jsonl
   This file contains entries that facilitate follow-up questioning. Each line consists of three keys:
     question: Sourced from the GSM8k testing set.
     answer: Corresponding answer from the GSM8k testing set.
     followup: Includes two rounds of follow-up questions and reference answers, formatted as a conversation between a user (A:) and an assistant (B:).

2. error_correction.jsonl
   This file is designed for error correction tasks. Each line consists of three keys:
     question: Sourced from the GSM8k testing set.
     answer: Corresponding answer from the GSM8k testing set.
     error_correction: Contains a conversation between a user (A:) and an assistant (B:), which includes the original question, an incorrect answer, and the process of correcting the error.

3. error_analysis.jsonl
   This file also focuses on error correction but employs a different prompt strategy. Each line consists of three keys:
     question: Sourced from the GSM8k testing set.
     answer: Corresponding answer from the GSM8k testing set.
     error_analysis: Includes a conversation between a user (A:) and an assistant (B:), where the model is prompted to determine on its own whether the answer is correct, without being told explicitly.

4. p2p_generation.jsonl
   This file contains entries for problem generation tasks. Each line consists of three keys:
     question: Sourced from the GSM8k testing set.
     answer: Corresponding answer from the GSM8k testing set.
     new_problem: A new problem generated by GPT-4 to serve as a reference answer.
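The file layout described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader. It assumes that each conversation value is a single string and that every turn begins at the start of a line with "A:" or "B:"; the description does not specify the exact formatting, so the pattern may need adjusting against the real files. The names TASK_KEYS, read_jsonl and split_turns are hypothetical helpers introduced only for illustration.

```python
import json
import re
from typing import Iterable, Iterator

# Third key expected in each file, taken from the description above.
TASK_KEYS = {
    "follow_up.jsonl": "followup",
    "error_correction.jsonl": "error_correction",
    "error_analysis.jsonl": "error_analysis",
    "p2p_generation.jsonl": "new_problem",
}

# Assumption: each turn starts at the beginning of a line with "A:" or "B:".
_TURN_RE = re.compile(r"^(A|B):\s*", flags=re.MULTILINE)


def read_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON Lines source."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def split_turns(conversation: str) -> list[tuple[str, str]]:
    """Split an 'A:' / 'B:' conversation string into (role, text) pairs."""
    # With one capture group, re.split returns
    # [prefix, speaker, text, speaker, text, ...].
    parts = _TURN_RE.split(conversation)
    turns = []
    for speaker, text in zip(parts[1::2], parts[2::2]):
        role = "user" if speaker == "A" else "assistant"
        turns.append((role, text.strip()))
    return turns


# Made-up record for illustration only; it is not taken from the dataset.
sample_line = json.dumps({
    "question": "q",
    "answer": "a",
    "followup": "A: What is 2+2?\nB: 4\nA: And doubled?\nB: 8",
})
sample_records = list(read_jsonl([sample_line, ""]))
sample_turns = split_turns(sample_records[0][TASK_KEYS["follow_up.jsonl"]])
```

To read a real file, pass an open file handle to read_jsonl, for example `with open("follow_up.jsonl", encoding="utf-8") as f: records = list(read_jsonl(f))`. Note that new_problem in p2p_generation.jsonl is described as a generated problem rather than a conversation, so split_turns is probably not applicable to it.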

Created:

2024-08-11
