
DoclingMatix

ModelScope community. Updated 2025-12-05; indexed 2025-08-02.
Download link:
https://modelscope.cn/datasets/HuggingFaceM4/DoclingMatix
Resource description:
# DoclingMatix

DoclingMatix is a large-scale, multimodal dataset designed for training vision-language models in the domain of document intelligence. It was created specifically for training the SmolDocling model, an ultra-compact model for end-to-end document conversion.

The dataset is constructed by augmenting Hugging Face's [Docmatix](https://huggingface.co/datasets/HuggingFaceM4/Docmatix). Each sample in Docmatix, which consists of a document image and a few questions and answers about it, has been transformed. The text field is now prepended with an instructional prompt, guiding a model to convert the document image into our structured DocTag format. This "prompt-tuning" format makes DoclingMatix ideal for training instruction-following models on document-related tasks.

* **Document Conversion**: The primary intended use is to train models that can take a document image as input and generate a structured text representation as output.
* **Document Visual Question Answering (VQA)**: The dataset can be adapted for VQA tasks by creating question-answer pairs based on the document's content and structure.

---

## Dataset Statistics

* **Total samples**: 1,270,911
* **Training set**: 1,270,911
* **Modalities**: Images, Text

---

## Intended Use

* Training multimodal models for **document conversion** and **document visual question answering**.

---

## Citation

If you use DoclingMatix, please cite:

```bibtex
@article{nassar2025smoldocling,
  title={SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion},
  author={Nassar, Ahmed and Marafioti, Andres and Omenetti, Matteo and Lysak, Maksym and Livathinos, Nikolaos and Auer, Christoph and Morin, Lucas and de Lima, Rafael Teixeira and Kim, Yusik and Gurbuz, A Said and others},
  journal={arXiv preprint arXiv:2503.11576},
  year={2025}
}
```
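
The description says each Docmatix sample's text field is prepended with an instructional prompt. The sketch below illustrates that transformation in plain Python. It is a minimal sketch, not the dataset authors' code: the prompt wording (`INSTRUCTION`), the field names `image` and `text`, and the helper `prepend_instruction` are all hypothetical, since the card does not state the actual prompt or schema.

```python
from typing import Any

# Hypothetical prompt; the card does not state the exact wording used.
INSTRUCTION = "Convert this page to DocTags."


def prepend_instruction(
    sample: dict[str, Any], instruction: str = INSTRUCTION
) -> dict[str, Any]:
    """Return a copy of `sample` with `instruction` prepended to its text field.

    The field name "text" is an assumption made for illustration.
    """
    updated = dict(sample)  # shallow copy so the input sample is left untouched
    updated["text"] = f"{instruction}\n{sample['text']}"
    return updated


# Toy sample standing in for a Docmatix record (document image plus Q&A text).
sample = {
    "image": "page_0001.png",
    "text": "Q: What is the invoice total?\nA: 42.00",
}
converted = prepend_instruction(sample)
print(converted["text"])
```

Returning a copy rather than mutating in place keeps the original record available, which can be convenient if the same source sample is also reused for the VQA adaptation mentioned above.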

Provider: maas
Created: 2025-08-01