OGC_Quantum_Circuit_Synthetic
Source: ModelScope community. Updated 2025-10-03; indexed 2025-08-09.
Download link:
https://modelscope.cn/datasets/racineai/OGC_Quantum_Circuit_Synthetic
Description:
# OGC_Quantum_Circuit_Synthetic – Overview
**OGC_Quantum_Circuit_Synthetic** is a curated multimodal dataset focused on synthetic quantum circuits. It combines generated circuit images with expert-level technical queries to support tasks such as RAG DSE, question answering, document search, and vision-language model training.
---
## Dataset Composition
This dataset was created using our open-source tool **[OGC_pdf-to-parquet](https://github.com/RacineAIOS/OGC_pdf-to-parquet/tree/main)**, adapted to handle synthetic data.
Quantum circuit images and descriptions were generated programmatically using **[Qiskit’s standard gate library](https://quantum.cloud.ibm.com/docs/en/api/qiskit/circuit_library#standard-gates)**. A custom pipeline then passed each circuit's image and description to **Google’s Gemini 2.5 Pro model** to generate technical queries aligned with that circuit.
---
## Dataset Structure
Each entry in the dataset contains:
- `id`: A unique identifier for the sample
- `query`: A synthetic technical question generated from a quantum circuit
- `image`: A visual rendering of the circuit diagram
- `language`: The detected language of the query
Each synthetic circuit produces four unique entries: a main technical query, a secondary technical query, a visual-based question, and a multimodal semantic query.
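The record layout and the four-entries-per-circuit rule can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the `Entry` class, the `QUERY_KINDS` labels, the `entries_for_circuit` helper, and the `<circuit_id>-<n>` identifier scheme are hypothetical and only mirror the four documented fields; the real dataset may store images and identifiers differently.

```python
from dataclasses import dataclass

# Hypothetical labels for the four query kinds listed above.
# The dataset itself does not expose a field with these names.
QUERY_KINDS = (
    "main_technical",
    "secondary_technical",
    "visual",
    "multimodal_semantic",
)


@dataclass(frozen=True)
class Entry:
    id: str        # unique identifier for the sample
    query: str     # synthetic technical question
    image: bytes   # rendered circuit diagram (encoded image bytes assumed)
    language: str  # detected language of the query


def entries_for_circuit(circuit_id, image, queries, language="en"):
    """Expand one circuit into four entries that share the same image."""
    missing = [kind for kind in QUERY_KINDS if kind not in queries]
    if missing:
        raise ValueError(f"missing query kinds: {missing}")
    return [
        Entry(
            id=f"{circuit_id}-{index}",  # assumed id scheme
            query=queries[kind],
            image=image,
            language=language,
        )
        for index, kind in enumerate(QUERY_KINDS)
    ]
```

For example, calling `entries_for_circuit("c0001", png_bytes, queries)` with a dictionary holding one query per kind yields four `Entry` objects that differ only in `id` and `query`.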
---
## Purpose
This dataset is designed to support:
- Training and evaluating vision-language models
- Developing multimodal search or retrieval systems
- Research in automated question generation for quantum circuits
- Exploring synthetic quantum circuits as a use case in AI workflows
---
## Authors
- **Yumeng Ye**
- **Léo Appourchaux**
---
Provider: maas
Created: 2025-08-08



