racineai/VDR_Quantum_Circuit_Synthetic
Source: Hugging Face. Updated 2025-11-20. Indexed 2026-01-03.
Download link:
https://hf-mirror.com/datasets/racineai/VDR_Quantum_Circuit_Synthetic
Resource description:
---
license: apache-2.0
language:
- en
tags:
- RAG
- quantum
- quantum circuit
- synthetic
- physics
- DSE
task_categories:
- visual-document-retrieval
- text-retrieval
---
# VDR_Quantum_Circuit_Synthetic – Overview
**VDR_Quantum_Circuit_Synthetic** is a curated multimodal dataset of synthetic quantum circuits. It pairs generated circuit images with expert-level technical queries to support tasks such as RAG, DSE, question answering, document search, and vision-language model training.
---
## Dataset Composition
This dataset was created using our open-source tool **[VDR_pdf-to-parquet](https://github.com/RacineAIOS/VDR_pdf-to-parquet/tree/main)**, adapted to handle synthetic data.
Quantum circuit images and descriptions were generated programmatically using **[Qiskit’s standard gate library](https://quantum.cloud.ibm.com/docs/en/api/qiskit/circuit_library#standard-gates)**. A custom pipeline then used them with **Google’s Gemini 2.5 Pro model** to produce technical queries aligned with each circuit.
---
## Dataset Structure
Each entry in the dataset contains:
- `id`: A unique identifier for the sample
- `query`: A synthetic technical question generated from a quantum circuit
- `image`: A visual rendering of the circuit diagram
- `language`: The detected language of the query
Each synthetic circuit produces four unique entries, one per query type:
- a main technical query
- a secondary technical query
- a visual-based question
- a multimodal semantic query
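The entry layout and the four-entries-per-circuit rule can be sketched in plain Python. This is only an illustration: the field names come from the list above, while the `QUERY_TYPES` labels, the `Entry` class, the `entries_for_circuit` helper, and the `<circuit_id>-<n>` id format are hypothetical and not taken from the dataset or its tooling.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical labels for the four query types described in the card.
QUERY_TYPES = ("main_technical", "secondary_technical", "visual", "multimodal_semantic")


@dataclass(frozen=True)
class Entry:
    """One dataset row; field names follow the dataset card."""
    id: str
    query: str
    image: bytes      # raw bytes stand in for the rendered circuit diagram
    language: str


def entries_for_circuit(circuit_id: str, image: bytes,
                        queries: Dict[str, str], language: str = "en") -> List[Entry]:
    """Expand one circuit image into one entry per query type.

    The id format used here is an assumption made for this sketch.
    """
    missing = [t for t in QUERY_TYPES if t not in queries]
    if missing:
        raise ValueError(f"missing query types: {missing}")
    return [
        Entry(id=f"{circuit_id}-{i}", query=queries[t], image=image, language=language)
        for i, t in enumerate(QUERY_TYPES)
    ]


# Example: one placeholder circuit yields four entries sharing the same image.
example_queries = {t: f"placeholder {t} query" for t in QUERY_TYPES}
example_entries = entries_for_circuit("circuit0001", b"\x89PNG-placeholder", example_queries)
```

Under these assumptions, the entries derived from one circuit differ only in `id` and `query`, so a retrieval benchmark can treat the shared image as the relevant document for each of the four queries.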
---
## Purpose
This dataset is designed to support:
- Training and evaluating vision-language models
- Developing multimodal search or retrieval systems
- Research in automated question generation for quantum circuits
- Exploring synthetic quantum circuits as a use case in AI workflows
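For the retrieval use case, one common way to score a system is recall@k: the fraction of queries whose matching circuit image appears among the top k retrieved results. The sketch below is a minimal, library-free version; the function name, the dictionary layout, and the identifiers in the example are illustrative assumptions, not part of the dataset or its tooling.

```python
from typing import Dict, Sequence


def recall_at_k(ranked: Dict[str, Sequence[str]], relevant: Dict[str, str], k: int) -> float:
    """Fraction of queries whose relevant image id is in the top-k results.

    ranked:   query id -> image ids ordered from best to worst match
    relevant: query id -> the single image id the query was generated from
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not relevant:
        return 0.0
    hits = 0
    for query_id, image_id in relevant.items():
        top_k = list(ranked.get(query_id, ()))[:k]
        if image_id in top_k:
            hits += 1
    return hits / len(relevant)


# Hypothetical retrieval output for two queries about the same circuit image.
ranked_results = {"q1": ["img_a", "img_b"], "q2": ["img_c", "img_a"]}
ground_truth = {"q1": "img_a", "q2": "img_a"}
score_at_1 = recall_at_k(ranked_results, ground_truth, k=1)
score_at_2 = recall_at_k(ranked_results, ground_truth, k=2)
```

Because each query in this dataset is generated from exactly one circuit image, a single relevant id per query is a natural fit; other metrics such as NDCG would work with the same inputs.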
---
## Authors
- **Yumeng Ye**
- **Léo Appourchaux**
---
Provider:
racineai



