MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions
Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/13294002
Description:
1. follow_up.jsonl
This file contains entries that facilitate follow-up questioning. Each line consists of three keys:
question: A problem taken from the GSM8K test set.
answer: The corresponding reference answer from the GSM8K test set.
followup: Includes two rounds of follow-up questions and reference answers, formatted as a conversation between a user (A:) and an assistant (B:).
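As a minimal sketch of how such a file might be read, the snippet below parses one JSON object per line and checks that the three keys described above are present. The function name `load_jsonl` and the constant `EXPECTED_KEYS` are illustrative choices, not part of the dataset, and the local file path is an assumption.

```python
import json
from pathlib import Path

# Keys described for follow_up.jsonl; swap the third key for the other files.
EXPECTED_KEYS = {"question", "answer", "followup"}


def load_jsonl(path, expected_keys=EXPECTED_KEYS):
    """Read a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            record = json.loads(line)
            missing = set(expected_keys) - record.keys()
            if missing:
                raise ValueError(
                    f"line {line_number}: missing keys {sorted(missing)}"
                )
            records.append(record)
    return records


# Example (assumes the file has been downloaded to the working directory):
# records = load_jsonl("follow_up.jsonl")
# print(records[0]["question"])
```

Passing a different `expected_keys` set (for example with `error_correction` in place of `followup`) should let the same helper cover the other three files.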
2. error_correction.jsonl
This file is designed for error correction tasks. Each line consists of three keys:
question: A problem taken from the GSM8K test set.
answer: The corresponding reference answer from the GSM8K test set.
error_correction: Contains a conversation between a user (A:) and an assistant (B:), which includes the original question, an incorrect answer, and the process of correcting the error.
3. error_analysis.jsonl
This file also focuses on error correction but employs a different prompt strategy. Each line consists of three keys:
question: A problem taken from the GSM8K test set.
answer: The corresponding reference answer from the GSM8K test set.
error_analysis: A conversation between a user (A:) and an assistant (B:) in which the model is prompted to judge for itself whether the given answer is correct, without being told whether it is.
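The conversation fields in the first three files use A: for the user and B: for the assistant. The sketch below shows one way to split such a conversation into role-tagged turns. It assumes the field is a single string in which every turn starts on its own line with `A:` or `B:`. The description does not confirm that layout, so inspect a few records before relying on it. The name `split_turns` is hypothetical.

```python
import re

# Assumption: each turn begins at the start of a line with "A:" or "B:".
TURN_MARKER = re.compile(r"^(A|B):\s*", flags=re.MULTILINE)


def split_turns(conversation):
    """Split an 'A: ... B: ...' string into a list of role/content dicts."""
    # With one capturing group, re.split returns
    # [preamble, speaker, text, speaker, text, ...].
    parts = TURN_MARKER.split(conversation)
    turns = []
    for speaker, text in zip(parts[1::2], parts[2::2]):
        role = "user" if speaker == "A" else "assistant"
        turns.append({"role": role, "content": text.strip()})
    return turns


example = "A: What is 2 + 2?\nB: 4\nA: And doubled?\nB: 8"
for turn in split_turns(example):
    print(turn["role"], "->", turn["content"])
```

A line inside an answer that happens to begin with `A:` or `B:` would be treated as a new turn, so this split is only a starting point.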
4. p2p_generation.jsonl
This file contains entries for problem generation tasks. Each line consists of three keys:
question: A problem taken from the GSM8K test set.
answer: The corresponding reference answer from the GSM8K test set.
new_problem: A new problem generated by GPT-4 to serve as a reference answer.
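Because the answer field in all four files comes from GSM8K, it may keep the original GSM8K convention of ending with a `#### <number>` line. Under that assumption, which the description above does not state, the final numeric answer could be pulled out as follows. The helper name `final_answer` is illustrative.

```python
import re

# Assumption: the answer text ends with a GSM8K-style "#### <number>" line.
FINAL_ANSWER = re.compile(r"####\s*(-?[\d,]+(?:\.\d+)?)")


def final_answer(answer_text):
    """Return the number after '####' as a string, or None if absent."""
    match = FINAL_ANSWER.search(answer_text)
    if match is None:
        return None
    # Drop thousands separators such as "1,200".
    return match.group(1).replace(",", "")


sample = "She sells 16 - 3 - 4 = 9 eggs a day.\nShe makes 9 * 2 = 18 dollars.\n#### 18"
print(final_answer(sample))
```

Returning `None` when no marker is found keeps the helper safe to run over records whose answers are stored in a different form.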
Created:
2024-08-11



