OGC_Quantum_Circuit_Papers
Source: ModelScope community (魔搭社区). Updated 2025-12-05; indexed 2025-08-09.
Download link:
https://modelscope.cn/datasets/racineai/OGC_Quantum_Circuit_Papers
Resource description:
# **VDR_Quantum_Circuit_Papers – Overview**
**VDR_Quantum_Circuit_Papers** is a curated dataset focused on **quantum circuits** and **quantum gates**, extracted exclusively from scientific research papers. This dataset emphasizes documents that contain **circuit diagrams**, **matrix-based explanations**, and detailed discussions of quantum operations.
---
## **Dataset Composition**
This dataset was created using our open-source tool **[VDR_pdf-to-parquet](https://github.com/RacineAIOS/VDR_pdf-to-parquet)**.
Scientific PDFs were collected from public online sources. Each document was selected for its focus on **quantum circuits** and for containing both visual and mathematical representations. The processing pipeline extracted:
* High-resolution **images** of quantum circuit diagrams
* Accompanying **textual content** such as explanations, equations, and operations
* Structured data for multimodal analysis and downstream tasks
We used **Google’s Gemini 2.5 Pro** model in a custom pipeline to generate diverse, expert-level questions that align with the content of each page.
---
## **Dataset Structure**
Each sample in the dataset includes:
* **`id`**: A unique identifier for each entry
* **`query`**: A synthetic technical question generated from that page
* **`image`**: A rendered image of the PDF page
* **`language`**: Detected language of the extracted text
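
The four fields above can be mirrored in a small typed record for downstream code. The following is a minimal sketch, not the authors' loader: the field names come from the list above, while the Python types, the `Sample` class, the helper functions, and the example rows are all hypothetical (in particular, how `image` is actually stored is an assumption).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Iterable, Mapping

# Field names are taken from the dataset card; their Python types are assumed.
REQUIRED_FIELDS = ("id", "query", "image", "language")


@dataclass
class Sample:
    id: str        # unique identifier for the entry
    query: str     # synthetic technical question generated from the page
    image: Any     # rendered PDF page (storage type assumed: bytes, PIL image, ...)
    language: str  # detected language of the extracted text


def to_sample(row: Mapping[str, Any]) -> Sample:
    """Convert one raw row into a Sample, failing loudly on missing fields."""
    missing = [name for name in REQUIRED_FIELDS if name not in row]
    if missing:
        raise KeyError(f"row is missing fields: {missing}")
    return Sample(
        id=str(row["id"]),
        query=str(row["query"]),
        image=row["image"],
        language=str(row["language"]),
    )


def count_languages(rows: Iterable[Mapping[str, Any]]) -> Counter:
    """Count how many samples were detected in each language."""
    return Counter(to_sample(row).language for row in rows)


# Hypothetical rows standing in for real dataset entries.
rows = [
    {"id": "a1", "query": "Which gate follows the Hadamard?", "image": b"", "language": "en"},
    {"id": "a2", "query": "What unitary does this circuit implement?", "image": b"", "language": "en"},
    {"id": "a3", "query": "Quelle porte agit sur le second qubit ?", "image": b"", "language": "fr"},
]
language_counts = count_languages(rows)
```

Validating rows up front like this keeps schema drift (a renamed or missing column) from surfacing later as an obscure error inside a training loop.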
---
## **Purpose**
This dataset is designed to support:
* **Training and evaluating vision-language models** on technical quantum content (especially quantum circuits)
* **Multimodal document understanding and retrieval** for quantum computing
* **Recognition and analysis of quantum circuits** in scientific literature
* **Research in automated extraction and interpretation of circuit diagrams** and related explanations
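
For the retrieval use case, each `query` is paired with the page it was generated from, so one plausible way to score a retriever is recall@k: the fraction of queries whose source page appears among the top k results. The sketch below assumes this evaluation setup; it is not a protocol described by the dataset authors, and the function name and example identifiers are hypothetical.

```python
from typing import Mapping, Sequence


def recall_at_k(
    ranked: Mapping[str, Sequence[str]],
    relevant: Mapping[str, str],
    k: int,
) -> float:
    """Fraction of queries whose source page id is within the top-k ranked ids.

    ranked:   query id -> page ids ordered from most to least similar
    relevant: query id -> id of the page the query was generated from
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    if not relevant:
        return 0.0
    hits = sum(
        1
        for query_id, page_id in relevant.items()
        if page_id in list(ranked.get(query_id, []))[:k]
    )
    return hits / len(relevant)


# Hypothetical retriever output for three queries.
relevant = {"q1": "p1", "q2": "p2", "q3": "p3"}
ranked = {
    "q1": ["p1", "p9"],  # correct page ranked first
    "q2": ["p7", "p2"],  # correct page ranked second
    "q3": ["p4", "p5"],  # correct page not retrieved
}
recall_1 = recall_at_k(ranked, relevant, k=1)
recall_2 = recall_at_k(ranked, relevant, k=2)
```

With these example rankings, one of three queries is answered at k=1 and two of three at k=2.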
---
## **Creators**
* **Yumeng YE**
* **Léo APPOURCHAUX**
Provider:
maas
Created:
2025-08-08



