If you’ve been working with sd-scripts and your GPU feels like it’s carrying the weight of the world, you’re not alone! VRAM consumption can be a headache, especially when you’re trying to squeeze every bit of performance out of your setup. Luckily, there are some handy tricks that reduce VRAM usage without sacrificing too much speed or quality. Let’s break them down, step by step, in plain English.
Base Model Size Matters
The size of the model you train has a huge impact on how much VRAM you use. Here’s the rough order of model sizes:
- FLUX.1 > SDXL > SD1
For example, SDXL relies on large transformer layers, which account for about 80% of its memory use. FLUX.1 is more demanding still. So, picking a smaller model can save you a lot of memory.
Training Method: Less is More
Fine-tuning your model tends to use more VRAM compared to simpler methods like LoRA. Here’s the general order from the most VRAM-hungry to the least:
- Fine tuning ≥ Original Weight Method
- Native Fine Tuning > BOFT > OFT > LoCon (C3Lier, LoRA+), LoKr
If you’re not dead set on fine-tuning, consider a lighter method like LoRA. It’s less memory-intensive but still gets the job done.
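To see why LoRA is lighter, it helps to count trainable parameters for a single layer. The sketch below is a back-of-the-envelope estimate, not a measurement from sd-scripts: the 1280 x 1280 layer shape and the rank of 16 are illustrative assumptions, and the function names are made up for this example.

```python
def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable weights when the whole d_in x d_out matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights for a LoRA pair: down (d_in x rank) + up (rank x d_out)."""
    return rank * (d_in + d_out)


# Illustrative layer shape and rank (assumptions, not taken from a real model).
d_in, d_out, rank = 1280, 1280, 16

full = full_finetune_params(d_in, d_out)   # 1,638,400
lora = lora_params(d_in, d_out, rank)      # 40,960

print(f"full fine-tuning: {full:,} trainable weights")
print(f"LoRA (rank {rank}): {lora:,} trainable weights")
print(f"LoRA trains {lora / full:.1%} of the layer")
```

Fewer trainable weights also means fewer gradients and less optimizer state to keep in VRAM, which is why lowering the rank (dim) shrinks memory use further.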
Optimizers and Data Types
Optimizers keep extra state for every trainable parameter, and the data type decides how many bytes each value takes, so both affect VRAM. Here’s a quick breakdown, from most to least VRAM:
- Data types: FP32 > FP16/BF16 > FP8
- Optimizers: Prodigy > Lion > AdamW > AdaFactor
For lower VRAM usage, stick to FP16 or BF16 rather than FP32, and choose a light optimizer like AdaFactor.
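A rough way to reason about these rankings is to count bytes. The sketch below assumes a hypothetical model with one billion parameters and a hypothetical 4096 x 4096 weight matrix. It compares AdamW, which keeps two moment values per parameter, with the factored second moment AdaFactor uses for 2D weights when momentum is off. Real usage also includes gradients, activations, and framework overhead, so treat the numbers as lower bounds.

```python
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def weight_bytes(num_params: int, dtype: str) -> int:
    """Memory needed just to hold the weights in the given data type."""
    return num_params * BYTES_PER_ELEMENT[dtype]


def adamw_state_elements(rows: int, cols: int) -> int:
    """AdamW stores two moment estimates for every weight in the matrix."""
    return 2 * rows * cols


def adafactor_state_elements(rows: int, cols: int) -> int:
    """Factored AdaFactor (no momentum) stores one row vector and one column vector."""
    return rows + cols


num_params = 1_000_000_000  # hypothetical model size
for dtype in ("fp32", "fp16", "bf16", "fp8"):
    gib = weight_bytes(num_params, dtype) / 2**30
    print(f"{dtype}: {gib:.2f} GiB for weights alone")

rows, cols = 4096, 4096  # hypothetical weight matrix
print("AdamW state elements:    ", adamw_state_elements(rows, cols))
print("AdaFactor state elements:", adafactor_state_elements(rows, cols))
```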
Enable Gradient Checkpointing
This is a simple trick to dramatically cut down on VRAM use. When you enable gradient_checkpointing, the trainer keeps only some of the intermediate activations from the forward pass and recomputes the rest during the backward pass, which reduces the memory load. The cool thing is, it doesn’t affect accuracy. The recomputation makes each step a bit slower, but because the freed memory lets you increase the batch size, the total training time could actually be shorter. You can enable it by adding this flag to your training command:
--gradient_checkpointing
For those using derrian-distro, the batch size can be changed in the settings folder.
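If the idea behind gradient checkpointing feels abstract, here is a toy version in plain Python. It is a conceptual sketch only, not how sd-scripts or PyTorch implement it: the "network" is a chain of scalar tanh layers, and the weights, input, and segment length are arbitrary assumptions. It shows that recomputing activations from a few checkpoints gives the same gradient while holding fewer values at once.

```python
import math


def layer(w: float, x: float) -> float:
    return math.tanh(w * x)


def layer_grad(w: float, x: float) -> float:
    # d/dx tanh(w * x) = w * (1 - tanh(w * x)^2)
    return w * (1.0 - math.tanh(w * x) ** 2)


def backprop_full(weights, x0):
    """Keep every layer input, then apply the chain rule from last layer to first."""
    inputs, x = [], x0
    for w in weights:
        inputs.append(x)
        x = layer(w, x)
    grad = 1.0
    for w, a in zip(reversed(weights), reversed(inputs)):
        grad *= layer_grad(w, a)
    return grad, len(inputs)  # gradient, number of stored activations


def backprop_checkpointed(weights, x0, segment):
    """Keep one input per segment and recompute the rest during the backward pass."""
    checkpoints, x = {}, x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x
        x = layer(w, x)
    grad, peak = 1.0, 0
    for start in sorted(checkpoints, reverse=True):
        seg_weights = weights[start:start + segment]
        seg_inputs, x = [], checkpoints[start]
        for w in seg_weights:  # recompute this segment's activations
            seg_inputs.append(x)
            x = layer(w, x)
        peak = max(peak, len(checkpoints) + len(seg_inputs))
        for w, a in zip(reversed(seg_weights), reversed(seg_inputs)):
            grad *= layer_grad(w, a)
    return grad, peak  # gradient, upper bound on values held at once


weights = [0.5 + 0.1 * i for i in range(16)]  # arbitrary toy weights
g_full, stored_full = backprop_full(weights, 0.3)
g_ckpt, stored_ckpt = backprop_checkpointed(weights, 0.3, segment=4)
print("same gradient:", math.isclose(g_full, g_ckpt))
print("stored activations:", stored_full, "vs", stored_ckpt)
```

The price is one extra forward computation per segment, which is where the slowdown comes from.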
Simplify the Optimizer’s Backward Pass (Experimental)
This option is still in development, but it’s another way to reduce VRAM consumption by simplifying the backward calculations of your optimizer. It only works with fine-tuning (not LoRA), and depending on your method, it might save you some VRAM.
Try the Fused Optimizer (Experimental)
In the dev branch of sd-scripts, the fused optimizer is another experimental feature. It folds the optimizer step into the backward pass, so each gradient can be applied and freed as soon as it is computed instead of all gradients being held until the end. That saves a chunk of VRAM, especially with SDXL. However, this option is a bit slower, and it only works with AdaFactor.
Here’s how to set it up:
--gradient_checkpointing
--optimizer_type="AdaFactor"
--fused_backward_pass
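To get a feel for where the saving comes from, here is a toy model of gradient memory. It is a conceptual sketch, not the sd-scripts implementation: it assumes a fused step applies and frees each gradient as soon as it is produced, and the parameter sizes are made-up numbers.

```python
def peak_live_gradients(param_sizes, fused):
    """Return the largest amount of gradient data alive at any one time."""
    live = 0
    peak = 0
    # The backward pass produces gradients from the last parameter to the first.
    for size in reversed(param_sizes):
        live += size              # gradient for this parameter now exists
        peak = max(peak, live)
        if fused:
            live -= size          # updated immediately, gradient freed
    # Without fusing, the optimizer step runs only here, after the whole pass,
    # so every gradient stayed alive until this point.
    return peak


param_sizes = [4, 8, 8, 2]  # hypothetical sizes, e.g. millions of elements

print("regular step, peak gradients:", peak_live_gradients(param_sizes, fused=False))
print("fused step, peak gradients:  ", peak_live_gradients(param_sizes, fused=True))
```

In this toy setup the regular step peaks at the sum of all gradients, while the fused step peaks at the single largest one.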
Group Parameters (Experimental)
This is another experimental option that groups parameters together, reducing the load on your VRAM. It works with fine-tuning and optimizers like PagedLion8bit, but it isn’t compatible with LoRA or with gradient accumulation. To use this feature, here’s an example config:
--gradient_checkpointing
--optimizer_type="PagedLion8bit"
--fused_optimizer_groups 10
Adjust Basic Settings
If you’re still hitting memory limits, adjusting your basic settings can help. Here are some ideas:
- Reduce train_batch_size to 1. This cuts parallel calculations and VRAM load.
- Lower network dimensions like dim or conv dim.
- Use lightweight models with fp16 or even fp8.
- Enable xformers to optimize memory use.
For example, here’s a possible setup for a lightweight environment:
--mixed_precision="fp16"
This requires an fp16-compatible graphics card, and fp16 must also be specified in your accelerate config.
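If you launch training from a script, the basic settings above could be collected in one place, roughly like this. Treat it as a sketch under assumptions: it targets the LoRA script train_network.py, the helper name low_vram_args and the dim value of 8 are made up for illustration, and the model, dataset, and network module arguments you would normally pass are left out.

```python
import shlex


def low_vram_args(batch_size=1, network_dim=8, precision="fp16"):
    """Return only the memory-related flags; model and dataset flags are omitted."""
    if precision not in ("no", "fp16", "bf16"):
        raise ValueError(f"unsupported mixed precision: {precision}")
    return [
        f"--train_batch_size={batch_size}",   # 1 keeps parallel work minimal
        f"--network_dim={network_dim}",       # illustrative value, lower uses less VRAM
        f"--mixed_precision={precision}",     # must match the accelerate config
        "--xformers",
        "--gradient_checkpointing",
    ]


# The script name is an assumption; add your own model and dataset arguments.
command = ["accelerate", "launch", "train_network.py", *low_vram_args()]
print(shlex.join(command))
```

Keeping the flags in a list makes it easy to change one value at a time and see which setting actually gets you under your VRAM limit.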
Memory Fallback for NVIDIA Cards
If nothing else works, you can use NVIDIA’s memory fallback feature. This allows your system to use main memory when VRAM runs out. It’s slower, but if you’re on a low VRAM card, it might be your best option.
Here’s how to set it up:
- Open the NVIDIA Control Panel.
- Go to 3D Settings → Manage 3D Settings.
- Change CUDA - Sysmem Fallback Policy to Driver Default or Prefer Sysmem Fallback.
This will let your system use both VRAM and your main memory, though it will be slower, especially with larger batches.
Reducing VRAM consumption in SD-scripts doesn’t have to be a mystery. With these steps, you can make sure your GPU doesn’t break a sweat while still achieving solid results. Give some of these methods a try, and see how much smoother your workflow becomes!