tw-math-reasoning-2k
收藏魔搭社区2025-12-05 更新2025-05-24 收录
下载链接:
https://modelscope.cn/datasets/twinkle-ai/tw-math-reasoning-2k
下载链接
链接失效反馈官方服务:
资源简介:
# Dataset Card for tw-math-reasoning-2k

**tw-math-reasoning-2k** 是一個繁體中文數學語言資料集,從 [HuggingFaceH4/MATH](https://huggingface.co/datasets/HuggingFaceH4/MATH) 英文數學題庫中精選 2,000 題,並透過 [perplexity-ai/r1-1776](https://huggingface.co/perplexity-ai/r1-1776) 模型以繁體中文重新生成具邏輯性且詳盡的解題過程與最終答案。此資料集可作為訓練或評估繁體中文數學推理模型的高品質參考語料。
## Dataset Details
### Dataset Description
**tw-math-reasoning-2k** 是一個繁體中文數學語言資料集,旨在提供高品質的解題語料以支援中文數學推理模型的訓練與評估。此資料集從 [HuggingFaceH4/MATH](https://huggingface.co/datasets/HuggingFaceH4/MATH) 英文數學題庫中精選 2,000 題,涵蓋代數、幾何、機率統計等各類題型,並確保題目類型分佈均衡。
所有題目皆經由 [perplexity-ai/r1-1776](https://huggingface.co/perplexity-ai/r1-1776) 模型重新生成,透過多輪提示設計誘導模型產出繁體中文的詳細解題步驟與邏輯推演,最終形成完整的繁中答案。為確保語料品質,資料在生成後進行清洗與過濾,剔除明顯錯誤或缺乏邏輯的回答,同時統一最終答案的格式為 `\boxed{}` ,方便後續應用於標準答案比對與模型訓練。
本資料集適用於繁體中文大型語言模型的微調訓練與推理能力測試,亦可作為 Chain-of-Thought 推理訓練的基礎語料之一。
- **Curated by:** [Minyi Chen](https://huggingface.co/minyichen)
- **Funded by:** [APMIC](https://www.apmic.ai/)
- **Shared by:** [Minyi Chen](https://huggingface.co/minyichen)
- **Language(s) (NLP):** Traditional Chinese & English
- **License:** MIT
### Dataset Sources
<!-- Provide the basic links for the dataset. -->
- **Repository:** [twinkle-ai/tw-math-reasoning-2k](https://huggingface.co/datasets/twinkle-ai/tw-math-reasoning-2k)
## Uses
<!-- Address questions around how the dataset is intended to be used. -->
### Direct Use
- 微調繁體中文大型語言模型(LLMs)以提升其數學推理與解題能力。
- 評估繁體中文語言模型在「多步驟推理(chain-of-thought)」任務上的表現。
- 作為學術研究中的標註語料,用於探討數學語言理解與自然語言生成的交集。
- 作為教學語料,用於展示數學題目的語言化解題範例,支援數學與語言教育應用。
### Out-of-Scope Use
> [!warning]
> 本資料集重視學術研究與模型開發用途,不鼓勵任何非研究的使用。
<!-- This section addresses misuse, malicious use, and uses that the dataset will not work well for. -->
- **應用高風險決策:** 例如金融建議、工程設計、醫療診斷等,因為生成的解題過程雖具邏輯性但未經數學專家逐一審核,不保證所有答案絕對正確。
- **重建原始資料題庫** 本資料集僅為繁體中文生成版,並不包含原始英文解答,無法作為 [HuggingFaceH4/MATH](https://huggingface.co/datasets/HuggingFaceH4/MATH) 的完整替代。
## Dataset Structure
每筆資料為一組包含英文數學題目與繁體中文回答的對應資料,格式如下:
```json
{
'problem', # 原始英文數學題目
'level', # 題目難度等級(如 high school, olympiad 等)
'type', # 題型分類(如 algebra、geometry、number theory 等)
'solution', # HuggingFaceH4/MATH 原始英文解答(僅參考)
'subset', # 來源子集名稱(如 'train')
'split', # 資料分割(目前皆為 'train')
'model', # 生成模型
'problem_zhtw', # 題目繁體中文翻譯(可用於多語訓練)
'think', # 模型生成的繁體中文解題過程(邏輯推理)
'answer', # 模型生成的最終答案(通常以 `\boxed{}` 呈現)
'messages' # 完整對話訊息結構(包含提示詞、回應、角色等)
}
```
> 💡 模型回答以繁體中文輸出,並保留題目原文以利跨語言對應與訓練。
## Dataset Creation
### Curation Rationale
**tw-math-reasoning-2k** 的設計初衷在於彌補繁體中文數學推理語料的稀缺現況。雖然 Hugging Face 上已有如 [HuggingFaceH4/MATH](https://huggingface.co/datasets/HuggingFaceH4/MATH) 等高品質英文數學題庫,但繁體中文語境下的解題語料仍極為稀少,限制了中文大型語言模型在數學推理領域的發展與評估能力。
因此,本資料集透過精選自原始 MATH 題庫的 2,000 題題目,結合具多輪推理能力的 [perplexity-ai/r1-1776](https://huggingface.co/perplexity-ai/r1-1776) 模型生成繁體中文解題過程,旨在提供具邏輯性、語意自然且格式一致的訓練與評估樣本。我們特別注重答案的可驗證性與過程的教學價值,確保資料能支援如 Chain-of-Thought 推理訓練、答案解析生成等多樣應用場景。
此資料集亦可作為日後擴充更大規模繁中數學語料的基礎樣本庫。透過小規模高品質起步,逐步建立起繁體中文數學推理研究的語料基石。
### Source Data
<!-- This section describes the source data (e.g. news text and headlines, social media posts, translated sentences, ...). -->
#### Data Collection and Processing
- **來源資料集:** [HuggingFaceH4/MATH](https://huggingface.co/datasets/HuggingFaceH4/MATH)
- **取樣數量:** 2,000 題(各類型題目均衡取樣)
- **回答生成:** 使用 [perplexity-ai/r1-1776](https://huggingface.co/perplexity-ai/r1-1776) 模型以多輪提示精調,生成具備邏輯推理的完整中文解題過程
- **資料清洗:**
- 過濾模型明顯錯誤或不合邏輯的回應
- 擷取最終答案並統一為 LaTeX `\boxed{}` 格式
## Bias, Risks, and Limitations
- 並非所有生成回應皆經人工審查,可能存在邏輯錯誤或非標準解法。
- 資料集強調**解題過程表達能力**,非單純數值答對即可。
### Recommendations
使用 **tw-math-reasoning-2k** 時,建議注意以下幾點,以充分理解其適用範圍與潛在限制:
- **模型生成偏誤**:資料集中之解題過程由 [perplexity-ai/r1-1776](https://huggingface.co/perplexity-ai/r1-1776) 模型生成,可能會受到原始模型訓練語料與提示設計的影響,造成某些解法過於冗長、不夠直觀,或在特定題型上採取非標準解法。
- **數學正確性風險**:雖經過基本清洗與錯誤過濾,部分解題邏輯仍可能存在細節錯誤、計算誤差或不嚴謹推理,建議在高精度應用場景中搭配額外驗證機制使用。
- **語言與格式一致性**:資料以繁體中文呈現,但個別題目可能仍包含 LaTeX 符號、數學術語或模型特有用語風格,使用於教學或教材時建議進行語言風格統整。
- **有限樣本規模**:本資料集僅包含 2,000 題,屬於小型精選集,適合作為研究、微調或標準推理風格的參考;不建議直接用於大規模模型 pretraining。
## Citation
如果您使用本資料集,請引用:
```yaml
@misc{twmath2k2025,
title = {tw-math-reasoning-2k: Traditional Chinese Mathematical Reasoning Dataset},
author = {Twinkle AI},
year = {2025},
note = {Available at: \url{https://huggingface.co/datasets/twinkle-ai/tw-math-reasoning-2k}; Generated using \url{https://huggingface.co/perplexity-ai/r1-1776} from the HuggingFaceH4/MATH dataset}
}
```
## Dataset Card Authors
[Twinkle AI](https://huggingface.co/twinkle-ai)
## Dataset Card Contact
[Twinkle AI](https://huggingface.co/twinkle-ai)
# Dataset Card for tw-math-reasoning-2k

**tw-math-reasoning-2k** is a Traditional Chinese mathematical language dataset curated from 2,000 selected problems from the English mathematics question bank [HuggingFaceH4/MATH](https://huggingface.co/datasets/HuggingFaceH4/MATH), regenerated into logical and detailed problem-solving steps and final answers in Traditional Chinese using the [perplexity-ai/r1-1776](https://huggingface.co/perplexity-ai/r1-1776) model. This dataset can serve as high-quality reference corpus for training or evaluating Traditional Chinese mathematical reasoning models.
## Dataset Details
### Dataset Description
**tw-math-reasoning-2k** is a Traditional Chinese mathematical language dataset designed to provide high-quality problem-solving corpus to support the training and evaluation of Chinese mathematical reasoning models. This dataset curates 2,000 problems from [HuggingFaceH4/MATH](https://huggingface.co/datasets/HuggingFaceH4/MATH), covering various question types such as algebra, geometry, probability and statistics, with a balanced distribution of question types.
All problems are regenerated using the [perplexity-ai/r1-1776](https://huggingface.co/perplexity-ai/r1-1776) model. Multi-turn prompt engineering is employed to induce the model to produce detailed problem-solving steps and logical deductions in Traditional Chinese, forming a complete Traditional Chinese answer. To ensure corpus quality, the data is cleaned and filtered after generation, removing obviously incorrect or illogical responses, while unifying the format of the final answer to `oxed{}` to facilitate subsequent applications such as standard answer matching and model training.
This dataset is suitable for fine-tuning Traditional Chinese Large Language Models (LLMs) and testing their reasoning capabilities, and can also serve as one of the basic corpora for Chain-of-Thought reasoning training.
- **Curated by:** [Minyi Chen](https://huggingface.co/minyichen)
- **Funded by:** [APMIC](https://www.apmic.ai/)
- **Shared by:** [Minyi Chen](https://huggingface.co/minyichen)
- **Language(s) (NLP):** Traditional Chinese & English
- **License:** MIT
### Dataset Sources
<!-- Provide the basic links for the dataset. -->
- **Repository:** [twinkle-ai/tw-math-reasoning-2k](https://huggingface.co/datasets/twinkle-ai/tw-math-reasoning-2k)
## Uses
<!-- Address questions around how the dataset is intended to be used. -->
### Direct Use
- Fine-tune Traditional Chinese Large Language Models (LLMs) to enhance their mathematical reasoning and problem-solving capabilities.
- Evaluate the performance of Traditional Chinese language models on multi-step reasoning (Chain-of-Thought) tasks.
- Serve as annotated corpus in academic research to explore the intersection of mathematical language understanding and natural language generation.
- Serve as teaching corpus to demonstrate linguistically framed problem-solving examples for mathematics and language education applications.
### Out-of-Scope Use
> [!warning]
> This dataset prioritizes academic research and model development purposes, and any non-research use is discouraged.
<!-- This section addresses misuse, malicious use, and uses that the dataset will not work well for. -->
- **High-risk decision-making applications:** Such as financial advice, engineering design, medical diagnosis, etc., because although the generated problem-solving processes are logical, they have not been reviewed one by one by mathematical experts, and cannot guarantee that all answers are absolutely correct.
- **Reconstructing the original question bank:** This dataset is only a Traditional Chinese generated version and does not include the original English solutions, so it cannot serve as a complete replacement for [HuggingFaceH4/MATH](https://huggingface.co/datasets/HuggingFaceH4/MATH).
## Dataset Structure
Each entry is a corresponding dataset containing an English mathematical problem and a Traditional Chinese answer, in the following format:
json
{
"problem": "", // Original English mathematical problem
"level": "", // Question difficulty level (e.g., high school, olympiad, etc.)
"type": "", // Question type classification (e.g., algebra, geometry, number theory, etc.)
"solution": "", // Original English solution from HuggingFaceH4/MATH (for reference only)
"subset": "", // Source subset name (e.g., 'train')
"split": "", // Data split (currently all 'train')
"model": "", // Generative model used
"problem_zhtw": "", // Traditional Chinese translation of the problem (usable for multilingual training)
"think": "", // Traditional Chinese problem-solving process generated by the model (logical reasoning)
"answer": "", // Final answer generated by the model, usually presented in `oxed{}` format
"messages": "" // Complete conversation message structure including prompts, responses, roles, etc.
}
> 💡 Model responses are output in Traditional Chinese, and the original problem text is retained to facilitate cross-language alignment and training.
## Dataset Creation
### Curation Rationale
The original design intention of **tw-math-reasoning-2k** is to address the scarcity of Traditional Chinese mathematical reasoning corpora. Although there are high-quality English mathematical question banks such as [HuggingFaceH4/MATH](https://huggingface.co/datasets/HuggingFaceH4/MATH) on Hugging Face, problem-solving corpora in the Traditional Chinese context are still extremely rare, which limits the development and evaluation capabilities of Chinese Large Language Models in the field of mathematical reasoning.
Therefore, this dataset selects 2,000 problems from the original MATH question bank, combined with the [perplexity-ai/r1-1776](https://huggingface.co/perplexity-ai/r1-1776) model with multi-round reasoning capabilities to generate Traditional Chinese problem-solving processes, aiming to provide logically consistent, naturally semantic and uniformly formatted training and evaluation samples. We specifically focus on the verifiability of answers and the teaching value of the process, ensuring that the data can support various application scenarios such as Chain-of-Thought reasoning training and answer explanation generation.
This dataset can also serve as a basic corpus for expanding larger-scale Traditional Chinese mathematical corpora in the future. Starting with a small-scale high-quality dataset, we aim to gradually establish a corpus foundation for Traditional Chinese mathematical reasoning research.
### Source Data
<!-- This section describes the source data (e.g. news text and headlines, social media posts, translated sentences, ...). -->
#### Data Collection and Processing
- **Source dataset:** [HuggingFaceH4/MATH](https://huggingface.co/datasets/HuggingFaceH4/MATH)
- **Sampled quantity:** 2,000 problems (balanced sampling across all question types)
- **Answer generation:** Use the [perplexity-ai/r1-1776](https://huggingface.co/perplexity-ai/r1-1776) model with multi-turn prompt tuning to generate complete Chinese problem-solving processes with logical reasoning
- **Data cleaning:**
- Filter out obviously incorrect or illogical model responses
- Extract the final answer and unify it to the LaTeX `oxed{}` format
## Bias, Risks, and Limitations
- Not all generated responses have been manually reviewed, and logical errors or non-standard solution methods may exist.
- The dataset emphasizes **problem-solving process expression capabilities**, not just correct numerical answers.
### Recommendations
When using **tw-math-reasoning-2k**, the following points are recommended to fully understand its scope of application and potential limitations:
- **Model generation biases:** The problem-solving processes in the dataset are generated by the [perplexity-ai/r1-1776](https://huggingface.co/perplexity-ai/r1-1776) model, which may be affected by the original model's training corpus and prompt design, resulting in some solutions being too verbose, not intuitive enough, or adopting non-standard solutions for certain question types.
- **Mathematical correctness risks:** Although basic cleaning and error filtering have been performed, some problem-solving logic may still contain detailed errors, calculation errors or imprecise reasoning. It is recommended to use additional verification mechanisms in high-precision application scenarios.
- **Language and format consistency:** The data is presented in Traditional Chinese, but individual problems may still contain LaTeX symbols, mathematical terms or model-specific terminology styles. It is recommended to unify the language style when using it for teaching or teaching materials.
- **Limited sample size:** This dataset only contains 2,000 problems, which is a small curated set, suitable for research, fine-tuning or reference for standard reasoning styles; it is not recommended to directly use it for large-scale model pretraining.
## Citation
If you use this dataset, please cite:
yaml
@misc{twmath2k2025,
title = {tw-math-reasoning-2k: Traditional Chinese Mathematical Reasoning Dataset},
author = {Twinkle AI},
year = {2025},
note = {Available at: url{https://huggingface.co/datasets/twinkle-ai/tw-math-reasoning-2k}; Generated using url{https://huggingface.co/perplexity-ai/r1-1776} from the HuggingFaceH4/MATH dataset}
}
## Dataset Card Authors
[Twinkle AI](https://huggingface.co/twinkle-ai)
## Dataset Card Contact
[Twinkle AI](https://huggingface.co/twinkle-ai)
提供机构:
maas
创建时间:
2025-05-20



