PyThagoreans-Merged
Source: ModelScope community. Updated 2025-12-05. Indexed 2025-01-25.
Download link:
https://modelscope.cn/datasets/prithivMLmods/PyThagoreans-Merged
Description:
# PyThagoreans Dataset
## Overview
The PyThagoreans dataset is a collection of math problems paired with expected answers, predicted answers, and generated solutions. It is designed to support learning and practicing mathematical problem-solving, and is intended as a resource for students, educators, and researchers.
## Dataset Details
### Modalities
- **Text**: The dataset primarily contains text data, including math problems and solutions.
### Formats
- **CSV**: The dataset is available in CSV format.
### Size
- The dataset contains 1,822,935 entries.
### Libraries
- **Datasets**: The dataset is compatible with the Hugging Face Datasets library.
- **Pandas**: The dataset can be loaded and manipulated with Pandas.
- **Croissant**: The dataset is also available in Croissant format.
## Dataset Structure
### Columns
- **question**: The text of the math problem.
- **expected_answer**: The correct answer to the problem.
- **predicted_answer**: The predicted answer to the problem.
- **is_correct**: A boolean indicating whether the predicted answer matches the expected answer.
- **generation_type**: The type of solution generation (e.g., masked_reference_solution, without_reference_solution).
- **dataset**: The source dataset of the problem.
- **generated_solution**: The generated solution to the problem.
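To make the column layout concrete, the following minimal sketch parses a few CSV rows that use the columns listed above and tallies accuracy per `generation_type`. The sample rows and the `example_source` value are invented for illustration and are not taken from the dataset. The sketch also assumes `is_correct` is serialized as the strings `True`/`False`, which the dataset card does not confirm.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows using the documented column names.
# The values are invented for illustration, not taken from the dataset.
SAMPLE_CSV = """question,expected_answer,predicted_answer,is_correct,generation_type,dataset,generated_solution
What is 2 + 3?,5,5,True,without_reference_solution,example_source,2 + 3 = 5
What is 7 * 6?,42,48,False,without_reference_solution,example_source,7 * 6 = 48
What is 10 / 4?,2.5,2.5,True,masked_reference_solution,example_source,10 / 4 = 2.5
"""


def accuracy_by_generation_type(csv_text):
    """Return {generation_type: fraction of rows marked correct}."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row["generation_type"]
        totals[key] += 1
        # Assumption: booleans are stored as the strings "True"/"False".
        if row["is_correct"].strip().lower() == "true":
            correct[key] += 1
    return {key: correct[key] / totals[key] for key in totals}


accuracy = accuracy_by_generation_type(SAMPLE_CSV)
print(accuracy)
```

The same function could be pointed at a real CSV export by reading the file contents into `csv_text`, provided the boolean encoding matches the assumption above.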
### Splits
- **train**: Contains 1.58 million rows of training data.
## Usage
### Loading the Dataset
You can load the dataset using the Hugging Face Datasets library:
```python
from datasets import load_dataset
dataset = load_dataset("prithivMLmods/PyThagoreans")
```
### Example
Here’s an example of how to access the data:
```python
import pandas as pd
from datasets import load_dataset

# Load the dataset
dataset = load_dataset("prithivMLmods/PyThagoreans")
# Convert the train split to a Pandas DataFrame
df = dataset["train"].to_pandas()
# Display the first few rows
print(df.head())
```
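Once the data is in a DataFrame, a common next step is to keep only the rows whose predicted answer was marked correct. The sketch below illustrates this on a small hand-made DataFrame that stands in for the real download; the rows and the `example_source` value are invented, and only the column names come from the dataset card.

```python
import pandas as pd

# Stand-in for the DataFrame built above; rows are invented for illustration.
df = pd.DataFrame(
    {
        "question": ["What is 2 + 3?", "What is 7 * 6?", "What is 9 - 4?"],
        "expected_answer": ["5", "42", "5"],
        "predicted_answer": ["5", "48", "5"],
        "is_correct": [True, False, True],
        "generation_type": [
            "without_reference_solution",
            "without_reference_solution",
            "masked_reference_solution",
        ],
        "dataset": ["example_source"] * 3,
        "generated_solution": ["2 + 3 = 5", "7 * 6 = 48", "9 - 4 = 5"],
    }
)

# Keep question/solution pairs whose predicted answer matched the expected one.
correct_df = df.loc[df["is_correct"], ["question", "generated_solution"]]

# Count how many rows each generation type contributes.
counts = df["generation_type"].value_counts()

print(correct_df)
print(counts)
```

Filtering on `is_correct` this way assumes the column is loaded as a boolean dtype; if it arrives as strings, it would need to be converted first.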
Provider:
maas
Created:
2025-01-20



