Kain1999/stable-diffusion-webui

Name: Kain1999/stable-diffusion-webui
Creator: Kain1999
Published: 2023-07-16 06:01:16
License: No description provided

Source: Hugging Face. Updated 2023-07-16; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/Kain1999/stable-diffusion-webui

Resource description:

# Stable Diffusion web UI
A browser interface based on Gradio library for Stable Diffusion.

![](txt2img_Screenshot.png)

Check the [custom scripts](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Custom-Scripts) wiki page for extra scripts developed by users.

## Features
[Detailed feature showcase with images](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features):
- Original txt2img and img2img modes
- One click install and run script (but you still must install Python and Git)
- Outpainting
- Inpainting
- Color Sketch
- Prompt Matrix
- Stable Diffusion Upscale
- Attention, specify parts of text that the model should pay more attention to
    - a man in a ((tuxedo)) - will pay more attention to tuxedo
    - a man in a (tuxedo:1.21) - alternative syntax
    - select text and press ctrl+up or ctrl+down to automatically adjust attention to selected text (code contributed by anonymous user)
- Loopback, run img2img processing multiple times
- X/Y plot, a way to draw a 2 dimensional plot of images with different parameters
- Textual Inversion
    - have as many embeddings as you want and use any names you like for them
    - use multiple embeddings with different numbers of vectors per token
    - works with half precision floating point numbers
    - train embeddings on 8GB (also reports of 6GB working)
- Extras tab with:
    - GFPGAN, neural network that fixes faces
    - CodeFormer, face restoration tool as an alternative to GFPGAN
    - RealESRGAN, neural network upscaler
    - ESRGAN, neural network upscaler with a lot of third party models
    - SwinIR and Swin2SR ([see here](https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/2092)), neural network upscalers
    - LDSR, Latent diffusion super resolution upscaling
- Resizing aspect ratio options
- Sampling method selection
    - Adjust sampler eta values (noise multiplier)
    - More advanced noise setting options
- Interrupt processing at any time
- 4GB video card support (also reports of 2GB working)
- Correct seeds for batches
- Live prompt token length validation
- Generation parameters
    - parameters you used to generate images are saved with that image
    - in PNG chunks for PNG, in EXIF for JPEG
    - can drag the image to PNG info tab to restore generation parameters and automatically copy them into UI
    - can be disabled in settings
    - drag and drop an image/text-parameters to promptbox
- Read Generation Parameters Button, loads parameters in promptbox to UI
- Settings page
- Running arbitrary python code from UI (must run with --allow-code to enable)
- Mouseover hints for most UI elements
- Possible to change defaults/min/max/step values for UI elements via text config
- Random artist button
- Tiling support, a checkbox to create images that can be tiled like textures
- Progress bar and live image generation preview
- Negative prompt, an extra text field that allows you to list what you don't want to see in generated image
- Styles, a way to save part of prompt and easily apply them via dropdown later
- Variations, a way to generate same image but with tiny differences
- Seed resizing, a way to generate same image but at slightly different resolution
- CLIP interrogator, a button that tries to guess prompt from an image
- Prompt Editing, a way to change prompt mid-generation, say to start making a watermelon and switch to anime girl midway
- Batch Processing, process a group of files using img2img
- Img2img Alternative, reverse Euler method of cross attention control
- Highres Fix, a convenience option to produce high resolution pictures in one click without usual distortions
- Reloading checkpoints on the fly
- Checkpoint Merger, a tab that allows you to merge up to 3 checkpoints into one
- [Custom scripts](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Custom-Scripts) with many extensions from community
- [Composable-Diffusion](https://energy-based-model.github.io/Compositional-Visual-Generation-with-Composable-Diffusion-Models/), a way to use multiple prompts at once
    - separate prompts using uppercase `AND`
    - also supports weights for prompts: `a cat :1.2 AND a dog AND a penguin :2.2`
- No token limit for prompts (original stable diffusion lets you use up to 75 tokens)
- DeepDanbooru integration, creates danbooru style tags for anime prompts
- [xformers](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Xformers), major speed increase for select cards: (add --xformers to commandline args)
- via extension: [History tab](https://github.com/yfszzx/stable-diffusion-webui-images-browser): view, direct and delete images conveniently within the UI
- Generate forever option
- Training tab
    - hypernetworks and embeddings options
    - Preprocessing images: cropping, mirroring, autotagging using BLIP or deepdanbooru (for anime)
- Clip skip
- Use Hypernetworks
- Use VAEs
- Estimated completion time in progress bar
- API
- Support for dedicated [inpainting model](https://github.com/runwayml/stable-diffusion#inpainting-with-stable-diffusion) by RunwayML.
- via extension: [Aesthetic Gradients](https://github.com/AUTOMATIC1111/stable-diffusion-webui-aesthetic-gradients), a way to generate images with a specific aesthetic by using CLIP image embeddings (implementation of [https://github.com/vicgalle/stable-diffusion-aesthetic-gradients](https://github.com/vicgalle/stable-diffusion-aesthetic-gradients))
- [Stable Diffusion 2.0](https://github.com/Stability-AI/stablediffusion) support - see [wiki](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#stable-diffusion-20) for instructions

## Installation and Running
Make sure the required [dependencies](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Dependencies) are met and follow the instructions available for both [NVidia](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-NVidia-GPUs) (recommended) and [AMD](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs) GPUs.

Alternatively, use online services (like Google Colab):

- [List of Online Services](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Online-Services)

### Automatic Installation on Windows
1. Install [Python 3.10.6](https://www.python.org/downloads/windows/), checking "Add Python to PATH"
2. Install [git](https://git-scm.com/download/win).
3. Download the stable-diffusion-webui repository, for example by running `git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git`.
4. Place `model.ckpt` in the `models` directory (see [dependencies](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Dependencies) for where to get it).
5. _*(Optional)*_ Place `GFPGANv1.4.pth` in the base directory, alongside `webui.py` (see [dependencies](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Dependencies) for where to get it).
6. Run `webui-user.bat` from Windows Explorer as normal, non-administrator, user.

### Automatic Installation on Linux
1. Install the dependencies:
```bash
# Debian-based:
sudo apt install wget git python3 python3-venv
# Red Hat-based:
sudo dnf install wget git python3
# Arch-based:
sudo pacman -S wget git python3
```
2. To install in `/home/$(whoami)/stable-diffusion-webui/`, run:
```bash
bash <(wget -qO- https://raw.githubusercontent.com/AUTOMATIC1111/stable-diffusion-webui/master/webui.sh)
```

### Installation on Apple Silicon

Find the instructions [here](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Installation-on-Apple-Silicon).

## Contributing
Here's how to add code to this repo: [Contributing](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Contributing)

## Documentation
The documentation was moved from this README over to the project's [wiki](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki).

## Credits
- Stable Diffusion - https://github.com/CompVis/stable-diffusion, https://github.com/CompVis/taming-transformers
- k-diffusion - https://github.com/crowsonkb/k-diffusion.git
- GFPGAN - https://github.com/TencentARC/GFPGAN.git
- CodeFormer - https://github.com/sczhou/CodeFormer
- ESRGAN - https://github.com/xinntao/ESRGAN
- SwinIR - https://github.com/JingyunLiang/SwinIR
- Swin2SR - https://github.com/mv-lab/swin2sr
- LDSR - https://github.com/Hafiidz/latent-diffusion
- MiDaS - https://github.com/isl-org/MiDaS
- Ideas for optimizations - https://github.com/basujindal/stable-diffusion
- Cross Attention layer optimization - Doggettx - https://github.com/Doggettx/stable-diffusion, original idea for prompt editing.
- Cross Attention layer optimization - InvokeAI, lstein - https://github.com/invoke-ai/InvokeAI (originally http://github.com/lstein/stable-diffusion)
- Textual Inversion - Rinon Gal - https://github.com/rinongal/textual_inversion (we're not using his code, but we are using his ideas).
- Idea for SD upscale - https://github.com/jquesnelle/txt2imghd
- Noise generation for outpainting mk2 - https://github.com/parlance-zz/g-diffuser-bot
- CLIP interrogator idea and borrowing some code - https://github.com/pharmapsychotic/clip-interrogator
- Idea for Composable Diffusion - https://github.com/energy-based-model/Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch
- xformers - https://github.com/facebookresearch/xformers
- DeepDanbooru - interrogator for anime diffusers https://github.com/KichangKim/DeepDanbooru
- Security advice - RyotaK
- Initial Gradio script - posted on 4chan by an Anonymous user. Thank you Anonymous user.
- (You)

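Provided by:

Kain1999

Summary of original information

Dataset feature overview

Core features

txt2img and img2img modes: the original text-to-image and image-to-image modes.
One-click install and run: a script simplifies installation and startup, but Python and Git must be installed first.
Image processing techniques: Outpainting, Inpainting, Color Sketch, Prompt Matrix, and Stable Diffusion Upscale.
Text attention: lets users mark the parts of the prompt that the model should pay more attention to.
Loopback: runs img2img processing multiple times.
X/Y plot: draws a two-dimensional grid of images generated with different parameters.
Textual Inversion: supports multiple embeddings, works with half-precision floating point numbers, and can train embeddings on 8GB (with reports of 6GB working).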
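The attention syntax in the README (`((tuxedo))` and its alternative `(tuxedo:1.21)`) can be illustrated with a minimal sketch. It assumes that each pair of parentheses multiplies the weight by 1.1, which is consistent with the two forms being described as equivalent but is not stated explicitly in the source. The function name `attention_weight` is hypothetical and is not part of the web UI.

```python
import re
from typing import Tuple

# Assumption: one pair of parentheses multiplies attention by 1.1,
# so two pairs give 1.1 * 1.1 = 1.21, matching (tuxedo:1.21).
PAREN_FACTOR = 1.1


def attention_weight(fragment: str) -> Tuple[str, float]:
    """Return (text, weight) for a single emphasized prompt fragment."""
    # Explicit form: (text:number)
    explicit = re.fullmatch(r"\((.+):([0-9]*\.?[0-9]+)\)", fragment)
    if explicit:
        return explicit.group(1), float(explicit.group(2))

    # Nested form: count and strip matching outer parentheses.
    depth = 0
    while fragment.startswith("(") and fragment.endswith(")"):
        fragment = fragment[1:-1]
        depth += 1
    return fragment, round(PAREN_FACTOR ** depth, 4)


print(attention_weight("((tuxedo))"))     # nested form
print(attention_weight("(tuxedo:1.21)"))  # explicit form
```

The sketch handles one fragment at a time; a full prompt parser would also need to deal with escaping and with emphasis nested inside longer text.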

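Extended features

Extras tab: GFPGAN, CodeFormer, RealESRGAN, and other neural network tools.
Adjustment options: sampling method selection, interrupting processing at any time, and support for video cards with as little as 4GB.
Generation parameters: the parameters used to generate an image are saved with it and can be restored by drag and drop.
Live preview and progress bar: shows a live preview and the progress of image generation.
Negative prompt: lets users list what they do not want to see in the generated image.
Styles and Variations: save part of a prompt and apply it later; generate the same image with tiny differences.
CLIP interrogator: tries to guess the prompt from an image.
Batch Processing and Img2img Alternative: process a group of files with img2img; reverse Euler method of cross attention control.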
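The README states that generation parameters are stored "in PNG chunks for PNG". The sketch below shows one way such text could be read back using only the standard library. It is an illustration under assumptions: the keyword `parameters`, the demo text, and both helper names are hypothetical, and only uncompressed `tEXt` chunks are handled.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data: bytes) -> dict:
    """Collect keyword -> text pairs from uncompressed tEXt chunks."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    texts = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            keyword, _, text = body.partition(b"\x00")
            texts[keyword.decode("latin-1")] = text.decode("latin-1")
        if chunk_type == b"IEND":
            break
        pos += 12 + length
    return texts


def make_chunk(chunk_type: bytes, body: bytes) -> bytes:
    """Build one PNG chunk; used here only to create demo data."""
    crc = zlib.crc32(chunk_type + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + chunk_type + body + struct.pack(">I", crc)


# Demo chunk stream without pixel data; the keyword and text are made up.
demo = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0))
    + make_chunk(b"tEXt", b"parameters\x00a cat\nSteps: 20, Seed: 42")
    + make_chunk(b"IEND", b"")
)
print(read_png_text_chunks(demo)["parameters"])
```

For a real file, `read_png_text_chunks(open(path, "rb").read())` would return whatever text chunks the image contains; the PNG info tab mentioned in the README does the equivalent inside the UI.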

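Advanced features

Composable Diffusion: use multiple prompts at once, separated by uppercase `AND`, with optional per-prompt weights.
No prompt token limit: the original Stable Diffusion allows up to 75 tokens; this interface removes that limit.
DeepDanbooru integration: creates Danbooru-style tags for anime prompts.
xformers support: a major speed increase on select cards (add --xformers to the command-line arguments).
Training options: hypernetworks and embeddings options, plus image preprocessing.
API support: provides an API.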
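The Composable Diffusion syntax from the README, `a cat :1.2 AND a dog AND a penguin :2.2`, can be broken into weighted sub-prompts as in the sketch below. The function name `split_composable_prompt` is hypothetical, and the default weight of 1.0 for a sub-prompt without an explicit weight is an assumption rather than something the source states.

```python
import re
from typing import List, Tuple

DEFAULT_WEIGHT = 1.0  # assumption: unweighted sub-prompts count as 1.0


def split_composable_prompt(prompt: str) -> List[Tuple[str, float]]:
    """Split a prompt on uppercase AND into (text, weight) pairs."""
    parts = []
    # The split is case-sensitive, so a lowercase "and" stays in the text.
    for piece in re.split(r"\bAND\b", prompt):
        piece = piece.strip()
        if not piece:
            continue
        # A trailing ":number" is read as the weight of this sub-prompt.
        match = re.fullmatch(r"(.*?)\s*:\s*([0-9]*\.?[0-9]+)", piece, flags=re.S)
        if match:
            parts.append((match.group(1).strip(), float(match.group(2))))
        else:
            parts.append((piece, DEFAULT_WEIGHT))
    return parts


for text, weight in split_composable_prompt("a cat :1.2 AND a dog AND a penguin :2.2"):
    print(f"{weight:.1f}  {text}")
```

This covers only the parsing step; how the weighted sub-prompts are combined during sampling is outside the scope of the sketch.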

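Models and integrations

GFPGAN: a neural network that fixes faces.
CodeFormer: a face restoration tool, offered as an alternative to GFPGAN.
RealESRGAN and ESRGAN: neural network upscalers.
SwinIR and Swin2SR: neural network upscalers.
LDSR: latent diffusion super resolution upscaling.
Stable Diffusion 2.0 support: see the project wiki for instructions.

Installation and running

Dependency check: make sure the required dependencies are met.
Installation guides: detailed install-and-run instructions for NVidia (recommended) and AMD GPUs.
Automatic installation: scripted installation for Windows and Linux users.
Online services: alternatively, use online services such as Google Colab.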
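Before running the installer, the prerequisites named in the README (Python 3.10.6 and git, plus wget on Linux) can be checked with a short script. This is a sketch only: the function name `preflight` is hypothetical, and matching on the Python minor version 3.10 rather than the exact patch release is an assumption.

```python
import shutil
import sys
from typing import Dict, Sequence, Tuple


def preflight(required_tools: Sequence[str] = ("git",),
              python_minor: Tuple[int, int] = (3, 10)) -> Dict[str, bool]:
    """Report whether the interpreter version and required tools are present."""
    label = f"python {python_minor[0]}.{python_minor[1]}"
    report = {label: tuple(sys.version_info[:2]) == tuple(python_minor)}
    for tool in required_tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


# On Linux the README also lists wget as a dependency.
for name, ok in preflight(required_tools=("git", "wget")).items():
    print(f"{name}: {'found' if ok else 'missing'}")
```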

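Collected summary

Dataset introduction

Construction

Within generative AI, the Stable Diffusion web UI dataset is built through open-source community collaboration. It uses the Gradio library as its base framework and integrates the core functions of the Stable Diffusion model. Its construction follows a modular design: it integrates several pretrained models, such as GFPGAN, CodeFormer, and RealESRGAN, and incorporates user-contributed custom scripts to form an extensible interactive system. Installation is documented for Windows, Linux, and Apple Silicon, with automatic installation scripts for Windows and Linux, which supports broad compatibility and repeatable setup.

Characteristics

The dataset covers many aspects of text-to-image generation. Beyond the basic txt2img and img2img modes, it includes advanced features such as inpainting, outpainting, and the prompt matrix. It accepts prompts with no token limit and combines attention control with Composable Diffusion to give fine-grained control over generated content. It also integrates post-processing tools such as face restoration and super-resolution upscaling, and it is compatible with multiple sampling methods and model components, such as hypernetworks and VAEs, providing a flexible environment for artistic work and academic research.

Usage

Users can access the dataset through a local deployment or an online service. A local deployment first requires a Python environment and the pretrained model files. Once the web UI is running, its parameter panels allow the output size, sampler, and seed to be adjusted interactively. Users can refine the output with prompt editing, negative prompts, and saved styles, and can add community-developed tools through extension scripts. For advanced use, the API and the Training tab support embedding and hypernetwork training, covering the path from basic generation to more specialized workflows.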
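The source lists an API among the features but does not document it. The sketch below builds, without sending, a text-to-image request using only the standard library. Everything specific in it is an assumption to verify against the project's wiki: the endpoint path `/sdapi/v1/txt2img`, the JSON field names, the local address and port, and the helper name `build_txt2img_request`.

```python
import json
import urllib.request


def build_txt2img_request(base_url: str, prompt: str, negative_prompt: str = "",
                          steps: int = 20, width: int = 512, height: int = 512,
                          seed: int = -1) -> urllib.request.Request:
    """Build (but do not send) a POST request for text-to-image generation."""
    # Assumed field names; confirm them against the running server's API docs.
    payload = {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": steps,
        "width": width,
        "height": height,
        "seed": seed,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/sdapi/v1/txt2img",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local address; adjust to wherever the web UI is actually served.
request = build_txt2img_request(
    "http://127.0.0.1:7860", "a man in a ((tuxedo))", negative_prompt="blurry"
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then parse the JSON response.
```

Keeping request construction separate from sending makes the payload easy to inspect and test without a running server.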

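Background and challenges

Background overview

Stable Diffusion web UI is an open-source project developed by AUTOMATIC1111 that provides a Gradio-based browser interface for the Stable Diffusion model. Launched in 2022, it quickly became an important tool in generative AI and greatly lowered the barrier to generating images with diffusion models. Its central problem is how to turn a complex text-to-image model into an intuitive, interactive graphical interface that supports creative expression and artistic work. The tool integrates features such as Textual Inversion, hypernetwork training, inpainting, and upscaling. It has helped popularize Stable Diffusion, built a rich ecosystem of community extensions, and had a lasting influence on the intersection of digital art and artificial intelligence.

Current challenges

The domain problem this dataset addresses is the accessibility and controllability of text-to-image generation. Diffusion models traditionally require programming knowledge to operate. Stable Diffusion web UI simplifies this with a graphical interface, but that introduces the problem of balancing interface complexity against the user's learning curve. Challenges in building it include integrating diverse community extension scripts while keeping modules compatible and stable, reducing resource consumption to suit different hardware (down to video cards with 2GB of memory), and keeping an open-source project secure against malicious code execution. In addition, as model versions change (for example, Stable Diffusion 2.0), keeping the interface in step with the underlying algorithms is an ongoing technical challenge.

Common scenarios

Typical use cases

Within generative AI, Stable Diffusion web UI is a Gradio-based browser interface whose typical use cases center on text-to-image generation and image-to-image conversion. By bringing together features such as outpainting, inpainting, Color Sketch, and the prompt matrix, the tool lets researchers and creators explore the generative capabilities of latent diffusion models directly and turn abstract concepts into visual artwork.

Derived work

Notable work derived from this dataset includes the integration of Textual Inversion, the implementation of Composable Diffusion, and xformers-based acceleration. These extensions add capabilities such as multi-prompt composition and attention control, and they have fostered a community-driven ecosystem of custom scripts that provides a foundation for further improvement and application of Stable Diffusion models.

Recent research on the dataset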