ia_ocr
ModelScope community, updated 2025-11-27, indexed 2025-05-31
Download link:
https://modelscope.cn/datasets/moondream/ia_ocr
Description:
Contains pages from documents sourced from the Internet Archive, transcribed by Pixtral. Not super accurate, but useful during pretraining.
```
@misc{moondream_ia_ocr,
author = {Vikhyat Korrapati},
title = {IA OCR Dataset},
year = {2025},
url = {https://huggingface.co/datasets/moondream/ia_ocr},
note = {Accessed: 2025-03-07}
}
```
Provider:
maas
Created:
2025-05-26



