VQA_Append_V2

ModelScope Community, updated 2025-10-29, indexed 2025-11-08
Download link:
https://modelscope.cn/datasets/Stageholic/VQA_Append_V2
Description:
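# VQA Question Generator

An automatic VQA (Visual Question Answering) question generator built on Alibaba Cloud's Tongyi Qianwen (Qwen) VL-Plus model.

## Features

- 📁 **Batch processing**: automatically traverses every image file in a folder
- 🔍 **Intelligent screening**: uses the model to decide whether each image is suitable for VQA evaluation
- 💾 **Caching**: caches screening results automatically to avoid repeated API calls
- 🎯 **Question generation**: generates 10 evaluation questions for each suitable image
- 📊 **Multi-dimensional evaluation**: the questions cover object recognition, scene understanding, spatial relations, reasoning, and more
- ⚡ **Multi-threaded concurrency**: two thread pools (8 threads for judgment + 8 threads for generation) greatly speed up processing
- 🎨 **Visual interface**: automatically builds an offline, interactive HTML viewer with search, filtering, and full-size image viewing

## Project Structure

```
VQA_Append_V2/
├── main.py                      # Main entry point
├── config.py                    # Configuration management
├── cache_manager.py             # Cache management module
├── qwen_api.py                  # Qwen-VL-Plus API wrapper
├── utils.py                     # Utility functions
├── generate_visualization.py    # Visualization file generator
├── visualization.html           # Visualization HTML template
├── requirements.txt             # Python dependencies
├── env_template.txt             # Environment variable template
├── .gitignore                   # Git ignore rules
├── README.md                    # Project documentation
├── 使用指南.md                   # User guide (Chinese)
├── 项目总览.md                   # Project architecture overview (Chinese)
├── CHANGELOG.md                 # Changelog
├── 项目概述                      # Project requirements (Chinese)
├── output/                      # Output directory (auto-generated)
│   ├── *.json                   # VQA data files
│   └── vqa_viewer.html          # Visualization HTML file (auto-generated)
└── prompts/                     # Prompt templates
    ├── judge_suitability_prompt.txt
    ├── generate_questions_prompt.txt
    └── README.md
```

## Installation

### 1. Clone or download the project

```bash
cd VQA_Append_V2
```

### 2. Install Python dependencies

```bash
pip install -r requirements.txt
```

### 3. Test the API connection (recommended)

Before using the tool for real, check that the API can be reached:

```bash
python test_api.py
```

See `测试说明.md` (test instructions, in Chinese) for details.

## Usage

> 💡 **API configuration is built in**: the API settings are hard-coded, so no extra configuration is needed and the tool works out of the box.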
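The features above combine into one pipeline per run:

- Cache-first screening.
- Two thread pools of 8 workers each.
- The automatic retry mechanism listed under Technical Features.

The following is a minimal sketch of that flow under stated assumptions. It is not the code in `main.py`, `cache_manager.py`, or `qwen_api.py`. `JudgmentCache`, `with_retries`, `run_pipeline`, and the `judge`/`generate` callables are hypothetical stand-ins, and the real Qwen-VL-Plus calls are replaced by plain functions. "Up to 3 retries" is read here as at most three attempts in total.

```python
import json
import os
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Optional


class JudgmentCache:
    """Hypothetical thread-safe cache: absolute image path -> suitable (bool)."""

    def __init__(self, cache_file: str) -> None:
        self.cache_file = cache_file
        self._lock = threading.Lock()
        self._data: Dict[str, bool] = {}
        if os.path.exists(cache_file):
            with open(cache_file, encoding="utf-8") as f:
                self._data = json.load(f)

    def get(self, path: str) -> Optional[bool]:
        with self._lock:
            return self._data.get(path)

    def set(self, path: str, suitable: bool) -> None:
        with self._lock:
            self._data[path] = suitable
            # Rewrite the file on every update so completed judgments survive a crash.
            with open(self.cache_file, "w", encoding="utf-8") as f:
                json.dump(self._data, f, ensure_ascii=False, indent=2)


def with_retries(fn: Callable, *args, attempts: int = 3):
    """Call fn(*args); on an exception try again, up to `attempts` calls in total."""
    for attempt in range(1, attempts + 1):
        try:
            return fn(*args)
        except Exception:
            if attempt == attempts:
                raise  # give up and propagate the last error


def run_pipeline(
    image_paths: Iterable[str],
    cache: JudgmentCache,
    judge: Callable[[str], bool],
    generate: Callable[[str], List[dict]],
    judgment_workers: int = 8,
    generation_workers: int = 8,
) -> Dict[str, List[dict]]:
    def judge_one(path: str):
        key = os.path.abspath(path)  # the cache is keyed by absolute path
        cached = cache.get(key)
        if cached is not None:
            return key, cached  # cache hit: skip the model call
        suitable = with_retries(judge, key)
        cache.set(key, suitable)
        return key, suitable

    # Phase 1: judge suitability in its own thread pool.
    with ThreadPoolExecutor(max_workers=judgment_workers) as pool:
        judged = list(pool.map(judge_one, image_paths))
    suitable = [path for path, ok in judged if ok]

    # Phase 2: generate questions for the suitable images in a second pool.
    with ThreadPoolExecutor(max_workers=generation_workers) as pool:
        questions = list(pool.map(lambda p: with_retries(generate, p), suitable))
    return dict(zip(suitable, questions))


if __name__ == "__main__":
    import tempfile

    def fake_judge(path: str) -> bool:
        return not path.endswith("blank.jpg")  # stand-in for the Qwen-VL-Plus call

    def fake_generate(path: str) -> List[dict]:
        return [{"question": "...", "answer": "...", "category": "..."}] * 10

    with tempfile.TemporaryDirectory() as tmp:
        cache = JudgmentCache(os.path.join(tmp, "image_cache.json"))
        result = run_pipeline(["a.jpg", "blank.jpg"], cache, fake_judge, fake_generate)
        print({os.path.basename(k): len(v) for k, v in result.items()})  # {'a.jpg': 10}
```

Using two separate pools keeps the phases independent, so each can be sized on its own, much as the `judgment_workers` and `generation_workers` settings suggest. Holding one lock around both lookups and file writes is one simple way to get the thread safety the notes describe. In this sketch, a call that still fails after all attempts aborts the whole run.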
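### Basic Usage

```bash
python main.py --image_dir <image_folder_path>
```

### Full Parameter Reference

```bash
python main.py --image_dir <image_folder_path> \
               --output_dir <output_directory> \
               --cache_file <cache_file_path> \
               --max_image_count <max_images_to_process> \
               --random_seed <random_seed>
```

**Parameters**:

- `--image_dir`: (required) path to the folder containing the images
- `--output_dir`: (optional) output directory for the generated questions; default `output`
- `--cache_file`: (optional) cache file path; default `image_cache.json`
- `--max_image_count`: (optional) maximum number of images to process, useful for testing; default `None` (process all images)
- `--random_seed`: (optional) random seed for reproducible random selection; default `None` (seeded from the system time)

### Examples

```bash
# Example 1: process the images in ./images
python main.py --image_dir ./images

# Example 2: specify the output directory
python main.py --image_dir ./images --output_dir ./results

# Example 3: specify the cache file
python main.py --image_dir ./images --cache_file my_cache.json

# Example 4: test mode - process only 100 randomly selected images (saves API quota)
python main.py --image_dir ./images --max_image_count 100

# Example 5: reproducible random selection with a fixed seed
python main.py --image_dir ./images --max_image_count 20 --random_seed 42

# Example 6: combine several parameters
python main.py --image_dir ./images --output_dir ./test_results --max_image_count 50 --random_seed 123
```

## Workflow

### Multi-threaded Concurrent Processing

**Phase 1 - Image judgment (8 concurrent threads)**:

1. Scan the specified folder for image files (.jpg, .png, .bmp, .gif, .webp, and other supported formats)
2. If `max_image_count` is set, randomly select that many images from all the images found
3. Judge image suitability with 8 concurrent threads:
   - Look up the cache first; on a hit, skip the API call
   - On a cache miss, ask the Qwen-VL-Plus model
   - Save the judgment to the cache file
4. Collect all suitable images

**Phase 2 - Question generation (8 concurrent threads)**:

1. Generate VQA questions for the suitable images with 8 concurrent threads
2. Generate 10 multi-dimensional questions per image
3. Save the questions as JSON files in the output directory

**Phase 3 - Visualization (automatic)**:

1. Read all generated JSON files
2. Build a standalone HTML visualization file (`vqa_viewer_[directory_name].html`)
3. Embed all data in the HTML so it works fully offline

**Performance**: compared with serial processing, the multi-threaded version is **more than 10×** faster.

## Output

When the run completes, two kinds of files are produced.

### 1. JSON data files (output directory)

The questions for each image are saved in a separate JSON file with the following format:

```json
{
  "image_path": "/path/to/image.jpg",
  "image_name": "image.jpg",
  "generated_time": "2025-10-27T12:00:00",
  "total_questions": 10,
  "questions": [
    {
      "question": "How many people are in the picture?",
      "answer": "3 people",
      "category": "Object Recognition and Counting"
    },
    {
      "question": "Is the scene in the picture indoors or outdoors?",
      "answer": "Outdoors",
      "category": "Scene Understanding"
    }
  ]
}
```

The `questions` array is truncated here; a complete file contains all 10 questions.

### 2. Visualization file (output directory)

**output/vqa_viewer_[directory_name].html** is a standalone HTML viewer:

- ✅ All VQA data is embedded automatically
- ✅ Works fully offline; no other files are needed
- ✅ Can be shared directly with others
- ✅ Supports search, filtering, sorting, and more
- ✅ The file name includes the output directory name, so results from different batches are easy to tell apart

**Naming rules**:

- Default output directory `output` → `vqa_viewer_output.html`
- Custom output directory `output_3` → `vqa_viewer_output_3.html`
- Custom output directory `results` → `vqa_viewer_results.html`

#### Opening the viewer

- **Option 1**: double-click `output/vqa_viewer_[directory_name].html`
- **Option 2**: open `output/vqa_viewer_[directory_name].html` in a browser

#### Viewer features

- 🔍 **Search**: search image names, questions, or answers
- 🏷️ **Category filter**: filter by question category
- 📊 **Sorting**: sort by question count, time, and more
- 🖼️ **Image viewing**: click a card for details; click the image to enlarge it
- 📱 **Responsive design**: works on desktop and mobile
- 📑 **Pagination**: results are paginated automatically

#### Generating the visualization manually

To generate the visualization file on its own:

```bash
# Basic usage (writes into the output directory)
python generate_visualization.py

# Custom output directory and file name
python generate_visualization.py --output_dir results --output_filename my_vqa.html

# Produce a file name that includes the directory name
python generate_visualization.py --output_dir output_3 --output_filename vqa_viewer_output_3.html
```

**Parameters**:

- `--output_dir`: directory containing the JSON files (default: `output`)
- `--html_template`: HTML template file (default: `visualization.html`)
- `--output_filename`: output file name (default: `vqa_viewer.html`; saved inside `output_dir`)
- `--quiet`: quiet mode

## Cache Management

The cache file (default `image_cache.json`) stores the suitability judgment for each image:

```json
{
  "/absolute/path/to/image1.jpg": true,
  "/absolute/path/to/image2.jpg": false
}
```

**Notes**:

- The cache is keyed by each image's absolute path, which keeps keys unique
- To re-judge an image, delete its entry from the cache file
- To clear the whole cache, delete `image_cache.json`

## Question Capability Dimensions

The generated VQA questions cover the following dimensions:

- 🔍 **Object recognition and counting**: identify objects in the image and how many there are
- 🌄 **Scene understanding**: understand the overall scene and setting
- 📐 **Spatial relations**: judge where objects are relative to each other
- 🎨 **Attribute recognition**: recognize color, shape, size, and other attributes
- 📝 **Text recognition**: read text in the image (OCR)
- 💡 **Reasoning and common sense**: questions that require reasoning or world knowledge
- 🔎 **Detail observation**: focus on fine details in the image

## Supported Image Formats

- JPEG (.jpg, .jpeg)
- PNG (.png)
- BMP (.bmp)
- GIF (.gif)
- WebP (.webp)

## Random Selection

### How it works

With `--max_image_count`, the program **randomly selects** that many images from all images found, rather than taking the first N in scan order.

**Advantages**:

- 🎲 **Randomness**: avoids always processing the same first N images
- 🔄 **Reproducibility**: `--random_seed` makes the selection reproducible
- 📊 **Representativeness**: a random sample better represents the whole image set
- 🧪 **Test-friendly**: makes it easy to run and compare multiple tests

**Usage**:

```bash
# Randomly select 20 images (different each run)
python main.py --image_dir ./images --max_image_count 20

# Fixed seed for a reproducible selection
python main.py --image_dir ./images --max_image_count 20 --random_seed 42

# Process all images (no random selection)
python main.py --image_dir ./images
```

**About the random seed**:

- No `--random_seed` (default `None`): the seed comes from the system time, so each run selects differently
- `--random_seed 42`: a fixed seed makes the selection reproducible
- Same seed + same image set = same selection

## Performance Optimization

### Thread configuration

By default the program uses a **two-phase thread-pool architecture** for much faster processing:

- **Phase 1 (image judgment)**: 8 concurrent threads
- **Phase 2 (question generation)**: 8 concurrent threads

**Performance comparison** (100 images):

- Serial: about 22 minutes
- Parallel: about 2 minutes
- **Roughly 11× faster**

### Customizing the thread count

Adjust the values in `config.py`:

```python
# Multi-threading configuration
judgment_workers: int = 8     # Threads for image judgment
generation_workers: int = 8   # Threads for question generation
```

**Recommended settings**:

- Small batches (<100 images): 4 + 4 threads
- Medium batches (100-500 images): 8 + 8 threads
- Large batches (>500 images): 16 + 8 threads

### Prompt customization

All prompts live in the `prompts/` folder:

- `judge_suitability_prompt.txt`: image judgment prompt
- `generate_questions_prompt.txt`: question generation prompt

**To modify**: edit the file directly and rerun the program. No code changes are needed.

## Notes

1. **API configuration**: the API settings are hard-coded in `config.py`; edit that file to change them
2. **Network**: a stable connection to the Alibaba Cloud DashScope API is required (disable any proxy)
3. **Rate limits**: watch the API rate limits; if you are throttled, reduce the number of threads
4. **Image quality**: clear, content-rich images produce better questions
5. **Caching**: rely on the cache to avoid reprocessing the same image
6. **Thread safety**: the program implements thread-safety mechanisms, so multi-threading is safe to use
7. **Test mode**: on the first run, limit the number of images with `--max_image_count` and check the results before processing the full set

## Troubleshooting

### Problem 1: API calls fail

**Possible causes**:

- Network problems
- Incorrect API configuration
- API rate limit exceeded

**Solutions**:

- Check the network connection
- Verify the API configuration in `config.py`
- Wait a while and retry

### Problem 2: No image files found

**Solutions**:

- Check that the path passed to `--image_dir` is correct
- Confirm that the folder contains images in a supported format

## Technical Features

### Implemented

- ✅ Multi-threaded concurrent processing (8 judgment threads + 8 generation threads)
- ✅ Thread-safe resource access (protected by locks)
- ✅ File-based prompt management (hot-reloadable)
- ✅ Smart caching (avoids repeated API calls)
- ✅ Real-time progress tracking (per-phase progress display)
- ✅ Automatic retries (up to 3 times)
- ✅ Interactive visualization (standalone, fully offline HTML file)

### Roadmap

- [ ] Dynamic thread-count adjustment (tuned automatically from API responses)
- [ ] Resume support (continue after an interruption)
- [ ] Visual progress bar (using tqdm)
- [ ] Support for more multimodal models
- [ ] PDF/Excel report export

## License

This project is for learning and research purposes only.

## Contact

If you have questions or suggestions, please open an Issue or Pull Request.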
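The random-selection rule described under Random Selection says that the same seed plus the same image set gives the same selection. A short sketch can illustrate it, but note what is assumed:

- `find_images` and `select_images` are hypothetical names, not functions from `main.py` or `utils.py`.
- The recursive scan, the sorting step, and the use of Python's `random.Random(...).sample` are assumptions about how such a selection could work.
- The extension list comes from Supported Image Formats.

```python
import random
from pathlib import Path
from typing import List, Optional

# Extensions from the Supported Image Formats list (matched case-insensitively here).
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".webp"}


def find_images(image_dir: str) -> List[str]:
    """Hypothetical helper: absolute paths of supported images under image_dir (recursive), sorted."""
    return sorted(
        str(p.resolve())
        for p in Path(image_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )


def select_images(
    paths: List[str], max_image_count: Optional[int], random_seed: Optional[int]
) -> List[str]:
    """Hypothetical helper: return all paths, or a random sample of max_image_count of them."""
    if max_image_count is None or max_image_count >= len(paths):
        return list(paths)  # nothing to sample: process everything
    # seed=None lets Python seed from OS entropy (or the system time), so runs differ;
    # a fixed int seed yields the same sample for the same input list.
    rng = random.Random(random_seed)
    return rng.sample(paths, max_image_count)


if __name__ == "__main__":
    images = [f"/data/images/{i:03d}.jpg" for i in range(100)]
    first = select_images(images, 20, random_seed=42)
    second = select_images(images, 20, random_seed=42)
    print(first == second)  # True: same seed + same image set
```

Sorting the paths first matters for the reproducibility guarantee. Directory listings are not guaranteed to come back in the same order, and `random.sample` picks by position, so an unsorted list could map the same seed to different files.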

Provider:
maas
Created:
2025-10-29