school_notebooks_EN
ModelScope community, updated 2025-11-27, indexed 2025-05-31
Download link:
https://modelscope.cn/datasets/ai-forever/school_notebooks_EN
Description:
# School Notebooks Dataset
Images of school notebooks with handwritten notes in English.
The dataset annotations contain end-to-end markup for training detection and OCR models, as well as end-to-end models that read text from whole pages.
## Annotation format
The annotation is in COCO format. The `annotation.json` file should contain the following top-level keys:
- `annotation["categories"]` - a list of dicts with category info (category names and indexes).
- `annotation["images"]` - a list of dicts describing the images. Each dict must contain the fields:
  - `file_name` - the name of the image file.
  - `id` - the image id.
- `annotation["annotations"]` - a list of dicts with markup information. Each dict describes one polygon from the dataset and must contain the following fields:
  - `image_id` - the index of the image on which the polygon is located.
  - `category_id` - the polygon's category index.
  - `attributes` - a dict with additional annotation information. The `translation` subdict holds the text translation for the line.
  - `segmentation` - the coordinates of the polygon: a list of numbers forming x, y coordinate pairs.
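
The fields above can be read with the Python standard library alone. The following is a minimal sketch, not the dataset authors' loader. It assumes that category dicts use the usual COCO keys `id` and `name`, and that `segmentation` is either a flat list of numbers or the nested COCO form `[[x1, y1, ...]]`. The helper names (`load_annotation`, `to_points`, `polygons_by_file`) and all sample values are hypothetical.

```python
import json
from collections import defaultdict


def load_annotation(path):
    """Read annotation.json from disk (the path is supplied by the caller)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def to_points(flat):
    """Turn a flat [x1, y1, x2, y2, ...] list into [(x1, y1), (x2, y2), ...]."""
    if len(flat) % 2 != 0:
        raise ValueError("expected an even number of coordinates")
    return list(zip(flat[0::2], flat[1::2]))


def polygons_by_file(annotation):
    """Group polygon records by the file name of the image they belong to."""
    id_to_file = {img["id"]: img["file_name"] for img in annotation["images"]}
    # Assumption: categories follow the usual COCO keys "id" and "name".
    id_to_category = {c.get("id"): c.get("name") for c in annotation["categories"]}
    grouped = defaultdict(list)
    for ann in annotation["annotations"]:
        seg = ann["segmentation"]
        # Assumption: accept both a flat list and the nested COCO form [[...]].
        flat = seg[0] if seg and isinstance(seg[0], (list, tuple)) else seg
        grouped[id_to_file[ann["image_id"]]].append({
            "category": id_to_category.get(ann["category_id"]),
            # The value is passed through unchanged, whatever its type.
            "translation": ann.get("attributes", {}).get("translation"),
            "points": to_points(flat),
        })
    return dict(grouped)


# Made-up record that follows the field layout described above.
sample = {
    "categories": [{"id": 0, "name": "text_line"}],
    "images": [{"id": 7, "file_name": "page_7.jpg"}],
    "annotations": [{
        "image_id": 7,
        "category_id": 0,
        "attributes": {"translation": "hello"},
        "segmentation": [[10, 20, 110, 20, 110, 60, 10, 60]],
    }],
}
result = polygons_by_file(sample)
```

With the real file, `polygons_by_file(load_annotation("annotation.json"))` would return one entry per image, each holding the polygons, category names, and `translation` values for that page.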
Provider:
maas
Created:
2025-05-26



