Geraldine/mlintern-gemma4-unimarc-training

Hugging Face | Updated 2026-04-23 | Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/Geraldine/mlintern-gemma4-unimarc-training
Description:

The Gemma-4 UNIMARC Training Dataset is designed for fine-tuning the google/gemma-4-E2B-it model to extract bibliographic metadata from images of book and thesis covers and to generate valid UNIMARC records. It combines real bibliographic data from two sources with synthetic augmentation. It includes detailed information about its data sources, format, the UNIMARC fields covered, the training script, dependencies, licensing, and references. The dataset is intended specifically for training vision-language models (VLMs) for automated UNIMARC cataloguing.
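
To make the target output more concrete, the sketch below renders a handful of UNIMARC fields as a line-based text record. It is only an illustration: the helper name `format_unimarc`, the one-line-per-field layout, and the sample values are all assumptions, and the serialization actually used by this dataset is not described in this listing. The tags follow standard UNIMARC usage (101 for language, 200 for title and statement of responsibility, 210 for publication).

```python
# Hypothetical helper; the dataset's real record format may differ.
def format_unimarc(fields):
    """Render (tag, indicators, subfields) tuples as one text line per field."""
    lines = []
    for tag, indicators, subfields in fields:
        # Each subfield is written as "$" + code + value, e.g. "$aParis".
        body = "".join(f"${code}{value}" for code, value in subfields)
        lines.append(f"{tag} {indicators} {body}")
    return "\n".join(lines)


# Made-up example values, not taken from the dataset.
# "#" stands for a blank indicator, a common display convention.
record = [
    ("101", "0#", [("a", "eng")]),
    ("200", "1#", [("a", "An Example Title"), ("f", "Jane Doe")]),
    ("210", "##", [("a", "Paris"), ("c", "Example Press"), ("d", "2020")]),
]

print(format_unimarc(record))
```

A model fine-tuned for this task would be expected to produce fields of this kind from a cover image; how closely its output matches this layout depends on the dataset's own conventions.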
Provider:
Geraldine