Real-Complex-Analysis-Math
Hosted on the ModelScope community · Updated 2025-11-27 · Indexed 2025-05-10
Download link:
https://modelscope.cn/datasets/prithivMLmods/Real-Complex-Analysis-Math
Resource description:
# Real-Complex-Analysis-Math
This dataset contains high-quality scanned pages from two classic mathematics textbooks by **Walter Rudin**, both widely used in advanced undergraduate and graduate courses on real and complex analysis. It is suited to building OCR systems, digitizing textbooks, and developing educational AI tools for higher mathematics.
## Dataset Details
* **Source**: *Principles of Mathematical Analysis* and *Real and Complex Analysis* by Walter Rudin
* **Task**: Image-to-Text (OCR, textbook parsing)
* **Modality**: Image
* **Split**: `train`
* **Number of Samples**: 433 images
* **Size**:
  * Dataset files: 353 MB
  * Auto-converted Parquet: 382 MB
* **Language**: English
* **License**: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
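Because the dataset ships only a `train` split, anyone evaluating an OCR model on it has to carve out their own held-out pages. The following is a minimal sketch of one way to do that with a seeded shuffle over row indices; the function name `split_indices`, the 10% holdout fraction, and the seed are illustrative assumptions, not part of the dataset.

```python
import random

NUM_PAGES = 433  # sample count stated in the dataset details above


def split_indices(n, holdout_fraction=0.1, seed=0):
    """Shuffle range(n) deterministically and cut off a holdout slice.

    Returns (train_indices, holdout_indices), each sorted ascending.
    The fraction and seed are arbitrary choices for illustration.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_holdout = int(n * holdout_fraction)
    return sorted(indices[n_holdout:]), sorted(indices[:n_holdout])


train_idx, holdout_idx = split_indices(NUM_PAGES)
print(len(train_idx), len(holdout_idx))
```

The resulting index lists can be passed to `Dataset.select` in the `datasets` library to materialize each subset; the library's built-in `Dataset.train_test_split` is an alternative. Note that a random page-level split may place neighbouring pages of the same book on both sides.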
## Features
| Feature | Type |
| ------- | ----- |
| image | Image |
Each row holds a single scanned page from one of Rudin's textbooks, suitable for OCR and mathematical content recognition tasks.
## Use Cases
* OCR training for mathematical text and symbols
* Digitization of classic mathematical literature
* Training data for document layout analysis
* Semantic understanding of advanced math textbooks
* Visual Q&A over textbook pages
## How to Use
You can use the dataset via Hugging Face’s `datasets` library:
```python
from datasets import load_dataset
dataset = load_dataset("prithivMLmods/Real-Complex-Analysis-Math")
```
Each sample looks like:
```python
{
"image": <PIL.Image>
}
```
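As a rough sketch of working with one row, the helper below reads the page dimensions and colour mode from the `image` field. It assumes the field decodes to a PIL-style object exposing `.size` and `.mode`, as the sample format above suggests; `describe_sample` and `load_first_page` are hypothetical names introduced here for illustration.

```python
def describe_sample(sample):
    """Return basic facts about one row.

    Assumes sample["image"] is a PIL-style image with `.size` (width, height)
    and `.mode` attributes.
    """
    image = sample["image"]
    width, height = image.size
    return {"width": width, "height": height, "mode": image.mode}


def load_first_page():
    """Fetch the first row of the train split.

    Needs network access and the `datasets` package, so the import is kept
    local and the function is not called at import time.
    """
    from datasets import load_dataset

    dataset = load_dataset("prithivMLmods/Real-Complex-Analysis-Math", split="train")
    return dataset[0]
```

Calling `describe_sample(load_first_page())` would report the size of the first scanned page, which is a quick check before deciding on any resizing or binarization step for OCR.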
## Tags
`math`, `Real-Complex-Analysis`, `textbooks`, `image-to-text`, `OCR`, `higher-education`, `Rudin`
## Citation
If you use this dataset in your research or development, please credit the original textbook author:
> Walter Rudin, *Real and Complex Analysis* and *Principles of Mathematical Analysis*
Provider: maas
Created: 2025-05-06



