
OGC_Quantum_Circuit_Synthetic

ModelScope community. Updated 2025-10-03; indexed 2025-08-09.
Download link:
https://modelscope.cn/datasets/racineai/OGC_Quantum_Circuit_Synthetic
Description:
# OGC_Quantum_Circuit_Synthetic – Overview

**OGC_Quantum_Circuit_Synthetic** is a curated multimodal dataset focused on synthetic quantum circuits. It combines generated circuit images with expert-level technical queries to support tasks such as RAG DSE, question answering, document search, and vision-language model training.

---

## Dataset Composition

This dataset was created using our open-source tool **[OGC_pdf-to-parquet](https://github.com/RacineAIOS/OGC_pdf-to-parquet/tree/main)**, adapted to handle synthetic data.

Quantum circuit images and descriptions were generated programmatically using **[Qiskit’s standard gate library](https://quantum.cloud.ibm.com/docs/en/api/qiskit/circuit_library#standard-gates)**. These were then used in a custom pipeline to produce technical queries aligned with each circuit using **Google’s Gemini 2.5 Pro model**.

---

## Dataset Structure

Each entry in the dataset contains:

- `id`: A unique identifier for the sample
- `query`: A synthetic technical question generated from a quantum circuit
- `image`: A visual rendering of the circuit diagram
- `language`: The detected language of the query

Each synthetic circuit produces 4 unique entries: a main technical query, a secondary one, a visual-based question, and a multimodal semantic query.

---

## Purpose

This dataset is designed to support:

- Training and evaluating vision-language models
- Developing multimodal search or retrieval systems
- Research in automated question generation for quantum circuits
- Exploring synthetic quantum circuits as a use case in AI workflows

---

## Authors

- **Yumeng Ye**
- **Léo Appourchaux**

---
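
The entry layout described under Dataset Structure (four fields per entry, four entries per circuit) can be sketched in plain Python. This is only an illustration under stated assumptions: the `Entry` class, the `expand_circuit` helper, the query-kind labels, the `"<circuit_id>-<kind>"` id scheme, and the use of raw bytes for `image` are all hypothetical and are not taken from the dataset or from the OGC_pdf-to-parquet tool.

```python
from __future__ import annotations

from dataclasses import dataclass

# The four query kinds named in the dataset card. These label strings are
# illustrative only; they are not column values from the dataset.
QUERY_KINDS = ("main", "secondary", "visual", "multimodal_semantic")


@dataclass(frozen=True)
class Entry:
    """One dataset row, mirroring the four documented fields."""
    id: str        # unique identifier for the sample
    query: str     # synthetic technical question about the circuit
    image: bytes   # rendered circuit diagram (assumed here: raw image bytes)
    language: str  # detected language of the query


def expand_circuit(
    circuit_id: str,
    image: bytes,
    queries: dict[str, str],
    language: str,
) -> list[Entry]:
    """Build the four entries for one circuit, one per query kind."""
    missing = [kind for kind in QUERY_KINDS if kind not in queries]
    if missing:
        raise ValueError(f"missing query kinds: {missing}")
    # Hypothetical id scheme "<circuit_id>-<kind>"; the real ids may differ.
    return [
        Entry(
            id=f"{circuit_id}-{kind}",
            query=queries[kind],
            image=image,  # all four entries share the same circuit image
            language=language,
        )
        for kind in QUERY_KINDS
    ]
```

Calling `expand_circuit` with one image and a dictionary holding a question for each of the four kinds returns four `Entry` objects that share the image and differ only in `id` and `query`, which matches the "4 unique entries" per circuit stated above.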

Provider:
maas
Created:
2025-08-08