racineai/OGC_Hydrogen
Source: Hugging Face | Updated: 2025-08-27 | Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/racineai/OGC_Hydrogen
Description:
OGC (Organized, Grouped, Cleaned) Hydrogen Vision DSE dataset, intended for image/text-to-vector (DSE) tasks. The dataset was built by scraping online PDF documents and using Google's Gemini 2.0 Flash Lite model to generate synthetic queries, producing a diverse set of questions based on document content. It contains 38,748 entries, of which approximately 69% are in English and 31% in French.
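For a rough sense of scale, the stated language shares can be turned into approximate entry counts. The sketch below is illustrative only: the helper name `approx_counts` is hypothetical, and because the percentages in the description are rounded, the resulting counts are estimates rather than figures reported by the dataset authors.

```python
def approx_counts(total: int, shares_pct: dict[str, int]) -> dict[str, int]:
    """Estimate per-language entry counts from rounded percentage shares.

    `shares_pct` maps a language name to its approximate share in percent.
    The results are estimates, since the published shares are rounded.
    """
    return {lang: round(total * pct / 100) for lang, pct in shares_pct.items()}


# Figures taken from the description: 38,748 entries, ~69% English, ~31% French.
estimates = approx_counts(38_748, {"English": 69, "French": 31})
print(estimates)
```

Under these assumptions, roughly 26,700 entries would be in English and roughly 12,000 in French.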
Provider:
racineai



