VQA_Append_V2

Name: VQA_Append_V2
Creator: maas
Published: 2025-10-29 20:15:09
License: No description available

Source: ModelScope community. Updated: 2025-10-29. Indexed: 2025-11-08.

Download link:

https://modelscope.cn/datasets/Stageholic/VQA_Append_V2

Resource description:

# VQA Question Generator

An automatic VQA (Visual Question Answering) question generation tool based on Alibaba Cloud's Tongyi Qianwen VL-Plus (Qwen-VL-Plus) model.

## Features

- 📁 **Batch Processing**: Automatically traverses all image files in a folder
- 🔍 **Intelligent Screening**: Uses the AI model to decide whether an image is suitable for VQA evaluation
- 💾 **Caching Mechanism**: Automatically caches image screening results to avoid repeated API calls
- 🎯 **Question Generation**: Generates 10 evaluation questions for each suitable image
- 📊 **Multi-dimensional Evaluation**: The generated questions cover object recognition, scene understanding, spatial relations, reasoning, and other dimensions
- ⚡ **Multi-threaded Concurrency**: Uses two thread pools (8 threads for judgment + 8 threads for generation) to speed up processing
- 🎨 **Visual Interface**: Automatically generates an interactive HTML interface that works offline and supports search, filtering, and full-size image viewing

## Project Structure

```
VQA_Append_V2/
├── main.py                       # Main program entry
├── config.py                     # Configuration management
├── cache_manager.py              # Cache management module
├── qwen_api.py                   # Qwen-VL-Plus API call wrapper
├── utils.py                      # Utility functions
├── generate_visualization.py     # Visualization file generator
├── visualization.html            # Visualization HTML template
├── requirements.txt              # Python dependencies
├── env_template.txt              # Environment variable template
├── .gitignore                    # Git ignore rules
├── README.md                     # Project documentation
├── 使用指南.md                    # Chinese user guide
├── 项目总览.md                    # Project architecture description
├── CHANGELOG.md                  # Version changelog
├── 项目概述                       # Project requirements description
├── output/                       # Output directory (auto-generated)
│   ├── *.json                    # VQA data files
│   └── vqa_viewer.html           # Visualization HTML file (auto-generated)
└── prompts/                      # Prompt templates
    ├── judge_suitability_prompt.txt
    ├── generate_questions_prompt.txt
    └── README.md
```

## Installation Steps

### 1. Clone or Download the Project

```bash
cd VQA_Append_V2
```

### 2. Install Python Dependencies

```bash
pip install -r requirements.txt
```

### 3. Test the API Connection (Recommended)

Before regular use, check that the API can be reached:

```bash
python test_api.py
```

For detailed test instructions, see `测试说明.md`.

## Usage

> 💡 **API configuration is built in**: The API configuration is hard-coded in this project, so no additional configuration is required.

### Basic Usage

```bash
python main.py --image_dir <image folder path>
```

### Complete Parameter Description

```bash
python main.py --image_dir <image folder path> \
    --output_dir <output directory> \
    --cache_file <cache file path> \
    --max_image_count <maximum number of images to process> \
    --random_seed <random seed>
```

**Parameter Explanation**:

- `--image_dir`: (Required) Path to the folder containing image files
- `--output_dir`: (Optional) Output directory for generated questions, default `output`
- `--cache_file`: (Optional) Cache file path, default `image_cache.json`
- `--max_image_count`: (Optional) Maximum number of images to process, intended for testing, default `None` (process all images)
- `--random_seed`: (Optional) Random seed for reproducible random selection, default `None` (use system time)

### Usage Examples

```bash
# Example 1: Process the images in the images folder
python main.py --image_dir ./images

# Example 2: Specify the output directory
python main.py --image_dir ./images --output_dir ./results

# Example 3: Specify the cache file
python main.py --image_dir ./images --cache_file my_cache.json

# Example 4: Test mode - process at most 100 randomly selected images (saves API resources)
python main.py --image_dir ./images --max_image_count 100

# Example 5: Use a random seed for reproducible random selection
python main.py --image_dir ./images --max_image_count 20 --random_seed 42

# Example 6: Combine several parameters
python main.py --image_dir ./images --output_dir ./test_results --max_image_count 50 --random_seed 123
```

## Workflow

### Multi-threaded Concurrent Processing

**Phase 1 - Image Judgment (8 concurrent threads)**:

1. Scan all image files in the specified folder (supports .jpg, .png, .bmp, .gif, .webp, and similar formats)
2. If `max_image_count` is specified, randomly select that many images from all images found
3. Use 8 threads to judge image suitability concurrently
   - Query the cache first; on a hit, skip the API call
   - On a miss, call the Qwen-VL-Plus model for a judgment
   - Save the judgment result to the cache file
4. Collect all suitable images

**Phase 2 - Question Generation (8 concurrent threads)**:

1. For the images judged suitable, use 8 threads to generate VQA questions concurrently
2. Generate 10 multi-dimensional questions for each image
3. Save the generated questions in JSON format to the output directory

**Phase 3 - Visualization Generation (automatic)**:

1. Read all generated JSON files
2. Generate a standalone HTML visualization file (`vqa_viewer.html`)
3. All data is embedded in the HTML, so it works fully offline

**Performance Improvement**: Compared with serial processing, the multi-threaded version can be more than 10 times faster.

## Output Content

After the program finishes, two types of files are generated.

### 1. JSON Data Files (output directory)

The questions generated for each image are saved as a separate JSON file in the following format:

```json
{
  "image_path": "/path/to/image.jpg",
  "image_name": "image.jpg",
  "generated_time": "2025-10-27T12:00:00",
  "total_questions": 10,
  "questions": [
    {
      "question": "How many people are in the picture?",
      "answer": "3 people",
      "category": "Object Recognition and Counting"
    },
    {
      "question": "Is the scene in the picture indoor or outdoor?",
      "answer": "Outdoor",
      "category": "Scene Understanding"
    }
    // ... More questions
  ]
}
```

### 2. Visualization File (output directory)

**output/vqa_viewer_[directory_name].html** - Standalone visualization HTML file

- ✅ Contains all VQA data (embedded automatically)
- ✅ Works fully offline, no additional files required
- ✅ Can be shared directly with others
- ✅ Supports search, filtering, sorting, and other functions
- ✅ File name includes the output directory name, which makes it easy to tell batches of results apart

**Naming Rules**:

- Default output directory `output` → `vqa_viewer_output.html`
- Custom output directory `output_3` → `vqa_viewer_output_3.html`
- Custom output directory `results` → `vqa_viewer_results.html`

#### Usage

**Method 1: Double-click to open**: double-click the `output/vqa_viewer_[directory_name].html` file.

**Method 2: Open in a browser**: open `output/vqa_viewer_[directory_name].html` from within a browser.

#### Features

- 🔍 **Search** - Search image names, questions, or answers
- 🏷️ **Category Filtering** - Filter by question category
- 📊 **Multiple Sort Orders** - Sort by number of questions, time, and so on
- 🖼️ **Image Viewing** - Click a card to view details, click the image to zoom in
- 📱 **Responsive Design** - Supports desktop and mobile devices
- 📑 **Pagination** - Results are displayed in pages

#### Manually Generate the Visualization

To generate the visualization file separately:

```bash
# Basic usage (generate into the output directory)
python generate_visualization.py

# Custom output directory and file name
python generate_visualization.py --output_dir results --output_filename my_vqa.html

# Generate an HTML file with the directory name
python generate_visualization.py --output_dir output_3 --output_filename vqa_viewer_output_3.html
```

**Parameter Explanation**:

- `--output_dir`: Directory containing the JSON files (default: `output`)
- `--html_template`: HTML template file (default: `visualization.html`)
- `--output_filename`: Output file name (default: `vqa_viewer.html`, saved in `output_dir`)
- `--quiet`: Quiet mode

## Cache Management

The cache file (default `image_cache.json`) stores the suitability judgment for each image in the following format:

```json
{
  "/absolute/path/to/image1.jpg": true,
  "/absolute/path/to/image2.jpg": false
}
```

**Notes**:

- The cache uses the absolute path of the image as the key to ensure uniqueness
- To re-judge a single image, delete its entry from the cache file
- To clear the whole cache, delete the `image_cache.json` file

## Question Capability Dimensions

The generated VQA questions cover the following capability dimensions:

- 🔍 **Object Recognition and Counting**: Recognize objects in the image and their quantities
- 🌄 **Scene Understanding**: Understand the overall scene and environment of the image
- 📐 **Spatial Relations**: Judge the positional relationships between objects
- 🎨 **Attribute Recognition**: Recognize attributes such as color, shape, and size
- 📝 **Text Recognition**: Recognize text in the image (OCR)
- 💡 **Reasoning and Common Sense**: Questions that require reasoning and common-sense knowledge
- 🔎 **Detail Observation**: Focus on fine details in the image

## Supported Image Formats

- JPEG (.jpg, .jpeg)
- PNG (.png)
- BMP (.bmp)
- GIF (.gif)
- WebP (.webp)

## Random Selection

### Random Selection Mechanism

When `--max_image_count` is used, the program **randomly selects** that many images from all images found, instead of taking the first N in scan order.

**Advantages**:

- 🎲 **Randomness**: Avoids always processing the same first N images
- 🔄 **Reproducibility**: The `--random_seed` parameter makes the selection reproducible
- 📊 **Representativeness**: A random selection represents the whole image collection better
- 🧪 **Test-friendly**: Makes repeated tests and comparisons easier

**Usage**:

```bash
# Randomly select 20 images (different result on each run)
python main.py --image_dir ./images --max_image_count 20

# Use a fixed seed so the selection is reproducible
python main.py --image_dir ./images --max_image_count 20 --random_seed 42

# Process all images (no random selection)
python main.py --image_dir ./images
```

**Random Seed Explanation**:

- `--random_seed` omitted (default `None`): the system time is used as the seed, so each run gives a different result
- `--random_seed 42`: a fixed seed is used, so the result is reproducible
- Same seed + same image collection = same selection

## Performance Optimization

### Multi-thread Configuration

By default the program uses a **two-phase thread pool architecture**:

- **Phase 1 (Image Judgment)**: 8 concurrent threads
- **Phase 2 (Question Generation)**: 8 concurrent threads

**Performance Comparison** (100 images):

- Serial processing: about 22 minutes
- Parallel processing: about 2 minutes
- **About 11 times faster**

### Customize Thread Count

Adjust in `config.py`:

```python
# Multi-thread configuration
judgment_workers: int = 8     # Number of threads for image judgment
generation_workers: int = 8   # Number of threads for question generation
```

**Recommended Configuration**:

- Small batch (<100 images): 4 threads + 4 threads
- Medium batch (100-500 images): 8 threads + 8 threads
- Large batch (>500 images): 16 threads + 8 threads

### Prompt Customization

All prompts are stored in the `prompts/` folder:

- `judge_suitability_prompt.txt` - Image judgment prompt
- `generate_questions_prompt.txt` - Question generation prompt

**How to modify**: Edit the file directly and rerun the program; no code changes are needed.

## Notes

1. **API Configuration**: The API configuration is hard-coded in `config.py`; edit that file to change it
2. **Network Connection**: A stable network connection to the Alibaba Cloud DashScope API is required (turn off any proxy)
3. **API Rate Limiting**: Mind the API call frequency limit; if you hit rate limiting, reduce the number of threads
4. **Image Quality**: Use clear, content-rich images to get better questions
5. **Caching Mechanism**: Use the cache to avoid processing the same image twice
6. **Thread Safety**: The program implements thread-safety mechanisms, so multi-threading can be used safely
7. **Test Mode**: For the first run, limit the number of images with `--max_image_count` and verify the results before processing the full set

## Troubleshooting

### Problem 1: API Call Failure

**Possible Causes**:

- Network connection issues
- Incorrect API configuration
- API call frequency limit exceeded

**Solutions**:

- Check the network connection
- Verify that the API configuration in `config.py` is correct
- Wait a while and retry

### Problem 2: No Image Files Found

**Solutions**:

- Check that the path given to `--image_dir` is correct
- Confirm that the folder contains images in a supported format

## Technical Features

### Implemented Features

- ✅ Multi-threaded concurrent processing (8 threads for judgment + 8 threads for generation)
- ✅ Thread-safe resource access (protected by locks)
- ✅ File-based prompt management (hot-reloadable)
- ✅ Intelligent caching (avoids repeated API calls)
- ✅ Real-time progress tracking (two-phase progress display)
- ✅ Automatic retry (up to 3 times)
- ✅ Interactive visualization (standalone HTML file, fully offline)

### Future Plans

- [ ] Dynamic thread count adjustment (automatic tuning based on API responses)
- [ ] Resumable runs (continue after an interruption)
- [ ] Visual progress bar (using tqdm)
- [ ] Support for more multimodal models
- [ ] Export PDF/Excel reports

## License

This project is for learning and research purposes only.

## Contact

If you have any questions or suggestions, please submit an Issue or Pull Request.
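
The two-phase flow described under Workflow (judge every image, then generate questions only for the suitable ones) can be sketched with the standard library's `ThreadPoolExecutor`. This is only an illustration under assumptions, not the project's actual `main.py`: the function name `run_pipeline` and the `judge` and `generate` callables are hypothetical stand-ins for the real Qwen-VL-Plus calls, and only the worker-count names mirror `config.py`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_pipeline(
    images: List[str],
    judge: Callable[[str], bool],            # hypothetical: True if the image is suitable
    generate: Callable[[str], List[str]],    # hypothetical: returns the questions for one image
    judgment_workers: int = 8,
    generation_workers: int = 8,
) -> Dict[str, List[str]]:
    """Run judgment and generation as two separate thread-pool phases."""
    # Phase 1: judge all images concurrently; map() keeps the input order.
    with ThreadPoolExecutor(max_workers=judgment_workers) as pool:
        verdicts = list(pool.map(judge, images))
    suitable = [img for img, ok in zip(images, verdicts) if ok]

    # Phase 2: generate questions only for the images that passed phase 1.
    with ThreadPoolExecutor(max_workers=generation_workers) as pool:
        questions = list(pool.map(generate, suitable))
    return dict(zip(suitable, questions))
```

Keeping the two pools separate means the worker counts can be tuned independently, which matches the "16 threads + 8 threads" style of recommendation given for large batches.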
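
The cache behaviour described under Cache Management (absolute path as key, boolean verdict as value, lock-protected access from several threads) might look roughly like the sketch below. The class name `SuitabilityCache` and its methods are assumptions made for illustration; the real `cache_manager.py` may be organised differently.

```python
import json
import os
import threading
from typing import Dict, Optional


class SuitabilityCache:
    """Hypothetical thread-safe cache mapping absolute image paths to verdicts."""

    def __init__(self, cache_file: str = "image_cache.json") -> None:
        self._path = cache_file
        self._lock = threading.Lock()
        self._data: Dict[str, bool] = {}
        if os.path.exists(cache_file):
            with open(cache_file, "r", encoding="utf-8") as f:
                self._data = json.load(f)

    def get(self, image_path: str) -> Optional[bool]:
        """Return the cached verdict, or None on a cache miss."""
        key = os.path.abspath(image_path)
        with self._lock:
            return self._data.get(key)

    def set(self, image_path: str, suitable: bool) -> None:
        """Record a verdict and write the whole cache back to disk."""
        key = os.path.abspath(image_path)
        with self._lock:
            self._data[key] = suitable
            with open(self._path, "w", encoding="utf-8") as f:
                json.dump(self._data, f, ensure_ascii=False, indent=2)
```

A caller would check `get()` first and only call the model when it returns `None`, then store the result with `set()`. Rewriting the file on every update is simple but not the fastest option for very large image sets.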
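
The seeded selection described under Random Selection can be illustrated with Python's `random` module. This is a sketch under assumptions rather than the project's code: the function name `select_images` is hypothetical, and sorting the paths before sampling is an added design choice so that the result depends only on the set of images and the seed, not on directory scan order.

```python
import random
from typing import List, Optional


def select_images(
    paths: List[str],
    max_image_count: Optional[int] = None,
    random_seed: Optional[int] = None,
) -> List[str]:
    """Return all paths, or a random sample of max_image_count of them."""
    if max_image_count is None or max_image_count >= len(paths):
        return list(paths)
    # A private Random instance avoids touching the global random state.
    # With random_seed=None, Python seeds it from the OS or the current time.
    rng = random.Random(random_seed)
    # Sorting makes "same seed + same image collection" give the same sample.
    return rng.sample(sorted(paths), max_image_count)
```

With `random_seed=42`, two runs over the same folder return the same 20 images; with the seed left as `None`, each run will usually pick a different subset.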
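
The viewer naming rule listed under Output Content (`output` → `vqa_viewer_output.html`, `results` → `vqa_viewer_results.html`) amounts to appending the last component of the output directory to a fixed prefix. A minimal sketch, where the helper name `viewer_filename` is hypothetical and the handling of `./` prefixes and trailing slashes is an assumption:

```python
import os


def viewer_filename(output_dir: str) -> str:
    """Build the viewer file name from the last component of output_dir."""
    # normpath drops a leading "./" and any trailing slash before basename runs.
    dir_name = os.path.basename(os.path.normpath(output_dir))
    return f"vqa_viewer_{dir_name}.html"
```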

Provider: maas

Created: 2025-10-29
