After months of using Midjourney through Discord, I decided to run Stable Diffusion on my own machine. The appeal is obvious: no subscription fees, no content restrictions, unlimited generations, and full control over models and workflows. Getting it working was more involved than I expected, but the result was worth it.

My Hardware

I'm running an NVIDIA RTX 3070 with 8GB VRAM, 32GB system RAM, and an AMD Ryzen 7 5800X. This is a mid-range setup by today's standards. Not the best, not the worst. I wanted to see what a "normal" developer's machine could handle.

The key number is VRAM. Stable Diffusion lives and dies by your GPU memory. 8GB is the practical minimum for comfortable use. With 6GB you can make it work but you'll be limited to smaller resolutions and fewer options. 12GB or more gives you breathing room for larger images and more complex workflows.
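To see why weights alone eat most of an 8GB card, here's a rough back-of-envelope sketch. The parameter counts are approximate public figures, not exact numbers from my setup:

```python
# Back-of-envelope VRAM estimate for fp16 model weights.
# Parameter counts are approximate public figures, not exact.

def weights_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Size of model weights in GB at the given precision (fp16 = 2 bytes)."""
    return params_billion * 1e9 * bytes_per_param / 1024**3

sd15_unet = weights_gb(0.86)   # SD 1.5 UNet, roughly 860M params
sdxl_unet = weights_gb(2.6)    # SDXL UNet, roughly 2.6B params

print(f"SD 1.5 UNet fp16: ~{sd15_unet:.1f} GB")
print(f"SDXL UNet fp16:   ~{sdxl_unet:.1f} GB")
```

On top of the weights you still need room for the text encoders, the VAE, activations, and the latents themselves, which is why SDXL pushes an 8GB card close to its limit.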

Why ComfyUI

There are two main interfaces for local Stable Diffusion: Automatic1111's Web UI and ComfyUI. I tried both and settled on ComfyUI. Here's why.

Automatic1111 is the easier starting point. It's a traditional web form: pick a model, type a prompt, adjust some sliders, click generate. Simple and effective. But it's a black box. You don't really understand what's happening under the hood.

ComfyUI is a node-based workflow editor. You connect nodes together to build your generation pipeline visually. It looks intimidating at first, like staring at a circuit diagram. But once you understand the basic flow (load model, encode prompt, sample, decode, save), it clicks. And then you can build workflows that Automatic1111 simply can't do without custom scripts.

Climbing the learning curve took me about 3-4 hours. Now I can build custom workflows for things like img2img with ControlNet, inpainting with masks, batch generation with prompt variations, and multi-model pipelines.
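That basic flow (load model, encode prompt, sample, decode, save) can be sketched as a ComfyUI API-format workflow graph. The node class names below are ComfyUI built-ins, but treat the exact input names, the checkpoint filename, and the server details as assumptions to verify against your own install:

```python
# Sketch of the basic ComfyUI pipeline as an API-format workflow graph.
# Each node is {"class_type": ..., "inputs": {...}}; an input given as
# ["node_id", index] wires in another node's output.
import json

workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",          # positive prompt
          "inputs": {"clip": ["1", 1], "text": "a lighthouse at dusk"}},
    "3": {"class_type": "CLIPTextEncode",          # negative prompt
          "inputs": {"clip": ["1", 1], "text": "blurry, low quality"}},
    "4": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "5": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["2", 0],
                     "negative": ["3", 0], "latent_image": ["4", 0],
                     "seed": 42, "steps": 20, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 1.0}},
    "6": {"class_type": "VAEDecode",
          "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage",
          "inputs": {"images": ["6", 0], "filename_prefix": "test"}},
}

payload = json.dumps({"prompt": workflow})
# To queue it against a running local server (default port 8188):
#   import urllib.request
#   req = urllib.request.Request("http://127.0.0.1:8188/prompt",
#                                data=payload.encode(), method="POST")
#   urllib.request.urlopen(req)
print(f"{len(workflow)} nodes, {len(payload)} bytes")
```

Once the graph is just data like this, "batch generation with prompt variations" is a loop that edits node 2's text and re-queues the payload.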

Model Choices

The base Stable Diffusion models are fine but the community fine-tunes are where the real quality lives. I'm currently using:

  • SDXL 1.0 for general purpose high-quality generation. It's the newest base model and the quality jump over SD 1.5 is significant.
  • Realistic Vision for photorealistic images. Portraits, landscapes, product shots. Impressively close to real photos.
  • DreamShaper for illustration and concept art. Great stylized output.

Models are big. SDXL is about 6.5GB. The SD 1.5 fine-tunes are around 2-4GB each. Plan for at least 30-50GB of disk space if you want to keep a few models around.

The Setup Process

Install Python 3.10 (not 3.11, I hit compatibility issues). Clone the ComfyUI repository. Install the requirements. Download a model from CivitAI or Hugging Face. Drop it into the models/checkpoints folder. Run the server. Open the browser.

It sounds simple. It mostly is. The gotchas: make sure you have the CUDA toolkit installed for NVIDIA GPUs, keep your Python environment isolated from other projects (use a venv), and download the right model format. Safetensors is the modern standard; avoid pickle-based checkpoint files, which can execute arbitrary code when loaded.
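A nice property of safetensors is that you can sanity-check a download without loading it: the format is just an 8-byte little-endian header length followed by a JSON header describing the tensors. A minimal checker (the file path in the usage comment is hypothetical):

```python
# Read only the safetensors JSON header: 8 bytes of little-endian
# unsigned length, then that many bytes of JSON. Parsing the header
# never executes code, unlike unpickling a .ckpt file.
import json
import struct

def read_safetensors_header(path: str) -> dict:
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))

# Hypothetical usage:
#   header = read_safetensors_header(
#       "models/checkpoints/sd_xl_base_1.0.safetensors")
#   print(len(header), "entries in header")
```

If the first 8 bytes don't decode to a sane length or the JSON parse fails, the file is either corrupt or not a safetensors file at all.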

From zero to first generated image took me about 45 minutes, including model download time.

Performance on My Setup

With SDXL on my RTX 3070 (8GB VRAM):

  • 1024x1024 image: about 25-30 seconds at 20 sampling steps
  • 512x512 image: about 8-10 seconds
  • Batch of 4 at 512x512: about 35-40 seconds

With SD 1.5 models, everything is roughly twice as fast: a 512x512 image takes about 4-5 seconds.
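Those timings make batch planning simple arithmetic. A quick sketch using the midpoints of my measured ranges:

```python
# Rough batch planning from the timings above (midpoints of my
# measured ranges; your numbers will differ with hardware and steps).
def batch_minutes(n_images: int, seconds_per_image: float) -> float:
    return n_images * seconds_per_image / 60

# SDXL 1024x1024 at ~27.5 s/image vs SD 1.5 512x512 at ~4.5 s/image
print(f"100 SDXL images:  ~{batch_minutes(100, 27.5):.0f} min")
print(f"100 SD1.5 images: ~{batch_minutes(100, 4.5):.1f} min")
```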

VRAM usage sits at about 6-7GB during SDXL generation. I can't run SDXL at resolutions much above 1024x1024 without running out of memory. With SD 1.5 models, I have more headroom.

Midjourney vs Local: The Trade-off

Midjourney still produces better "out of the box" results with simple prompts. Its aesthetic defaults are gorgeous and it requires less prompt engineering. If you just want beautiful images with minimal effort, Midjourney is still the better choice.

Local Stable Diffusion wins on control, customization, and cost. Once the hardware investment is made, every generation is free. You can use ControlNet for precise composition, train LoRA models on your own data, run inpainting for targeted edits, and build complex automated workflows.
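"Free" really means "marginal electricity cost," which is worth quantifying. A rough sketch with assumed numbers; the ~220W draw and $0.15/kWh are hypotheticals for illustration, not measurements from my setup:

```python
# Marginal electricity cost per generated image.
# Assumed numbers (hypothetical): ~220 W GPU draw, $0.15 per kWh.
def cost_per_image(seconds: float, watts: float = 220,
                   usd_per_kwh: float = 0.15) -> float:
    kwh = watts * seconds / 3600 / 1000
    return kwh * usd_per_kwh

sdxl = cost_per_image(27.5)   # SDXL 1024x1024 timing from above
print(f"~${sdxl:.5f} per SDXL image")
```

Under these assumptions each image costs a small fraction of a cent, which is why high-volume work is where local generation pays off.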

I'm keeping both. Midjourney for quick creative needs. Local SD for projects where I need fine control or high volume.

Tips for Getting Started

  • Start with Automatic1111 if ComfyUI looks overwhelming. You can always switch later.
  • Download one good model to start; don't hoard models before you know what you need.
  • Join the Stable Diffusion subreddit and the ComfyUI Discord for workflow examples.
  • Most importantly, just start generating. You'll learn more from 100 generations than from reading 100 guides.