Hi there! Today, I’m excited to walk you through setting up Wan 2.1, the latest video generation model from Alibaba, using ComfyUI. The best part? It’s free, runs even on modest PCs, and can be paired with 4K upscaling! Whether your system has 8GB RAM or 16GB+ VRAM, I’ll help you pick the right model, optimize settings, and create smooth, high-quality videos. Let’s get started!
Understanding Wan 2.1 Models: Which One Should You Pick?
The model names might look confusing at first, but here’s a quick breakdown:
- Example: wan2.1-i2v-14b-480p-q2_k.gguf
- i2v: Image-to-video model.
- 14b: 14 billion parameters (higher = better quality but needs more resources).
- 480p/720p: Output resolution.
- q2_k/q5_1: Quantization level (affects file size and performance).
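If the naming scheme still looks dense, here’s a tiny Python sketch (my own helper, not part of ComfyUI) that splits a filename into those fields:

```python
import re

# Hypothetical helper: split a Wan 2.1 GGUF filename into labeled parts.
# "i2v|t2v" covers the image-to-video and text-to-video variants.
NAME = re.compile(
    r"wan2\.1-(?P<task>i2v|t2v)-(?P<params>\d+b)-(?P<res>\d+p)-(?P<quant>q\d\w*)\.gguf"
)

def parse_model_name(filename: str) -> dict:
    match = NAME.match(filename.lower())
    if match is None:
        raise ValueError(f"Unrecognized model name: {filename}")
    return match.groupdict()

print(parse_model_name("wan2.1-i2v-14b-480p-q4_0.gguf"))
# {'task': 'i2v', 'params': '14b', 'res': '480p', 'quant': 'q4_0'}
```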
What’s Quantization?
Quantization compresses the model for easier use on systems with limited RAM or VRAM. Here’s what to know:
- Lower Quantization (q2_k, q3_k_m): Smaller (7-9GB), faster, ideal for 8-12GB RAM. Quality might dip slightly.
- Higher Quantization (q5_1, q6_k): Larger (12-18GB), slower, best for 16GB+ VRAM. Near-original quality.
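To see why fewer bits means smaller files with a slight quality dip, here’s a toy round-trip in Python. This is plain uniform quantization for illustration only; the actual GGUF k-quant formats are more sophisticated:

```python
import numpy as np

# Toy illustration: map float32 weights onto a small integer grid,
# then reconstruct them and measure the error.
weights = np.random.randn(1000).astype(np.float32)

def quantize(w: np.ndarray, bits: int):
    levels = 2 ** bits - 1
    scale = (w.max() - w.min()) / levels
    q = np.round((w - w.min()) / scale).astype(np.uint8)  # compact storage
    return q, scale, w.min()

def dequantize(q, scale, offset):
    return q * scale + offset

for bits in (2, 5):
    q, scale, offset = quantize(weights, bits)
    error = np.abs(weights - dequantize(q, scale, offset)).mean()
    print(f"{bits}-bit: mean reconstruction error = {error:.4f}")
```

Fewer bits means a coarser grid and a larger error, which is exactly the q2_k-versus-q5_1 trade-off above.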
My Recommendations:
- Low RAM (6-12GB): Start with wan2.1-i2v-14b-480p-q4_0.gguf. It balances speed and quality.
- High VRAM (16GB+): Use wan2.1-i2v-14b-720p-q5_1.gguf for crisp 720p output.
Step-by-Step Setup Guide
- Download Models:
  Visit the Hugging Face page (https://huggingface.co/calcuis/wan-gguf/tree/main) and grab:
  - The GGUF model (e.g., wan2.1-i2v-14b-480p-q4_0.gguf).
  - The text encoder (T5XXL-UM), the VAE, and the CLIP Vision encoder.
- Organize Files:
  - Place the GGUF model in ./ComfyUI/models/diffusion_models.
  - Move the text encoder to ./ComfyUI/models/text_encoders.
  - Add the VAE to ./ComfyUI/models/vae.
  - Drop the CLIP Vision encoder into ./ComfyUI/models/clip_vision.
  If you’d rather script these moves, see the sketch just below this step.
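Here’s a minimal sketch of that step using only the standard library. The download folder and the exact VAE/CLIP Vision filenames are assumptions; substitute whatever you actually downloaded:

```python
from pathlib import Path
import shutil

downloads = Path.home() / "Downloads"       # assumed download location
comfy = Path("./ComfyUI/models")

placements = {
    "wan2.1-i2v-14b-480p-q4_0.gguf": "diffusion_models",
    "t5xxl_um_fp8_e4m3fn_scaled.safetensors": "text_encoders",
    "wan_2.1_vae.safetensors": "vae",            # example VAE filename
    "clip_vision_h.safetensors": "clip_vision",  # example CLIP Vision filename
}

for filename, subdir in placements.items():
    src = downloads / filename
    dst = comfy / subdir
    dst.mkdir(parents=True, exist_ok=True)
    if src.exists():
        shutil.move(str(src), str(dst / filename))
        print(f"Moved {filename} -> {dst}")
    else:
        print(f"Skipping {filename} (not found in {downloads})")
```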
- Run ComfyUI:
  Launch the .bat file from the ComfyUI directory. If nodes are missing, install them via the ComfyUI Manager.
- Load the Workflow:
  Download the workflow from comfyuiblog.com and drag it into ComfyUI (or queue it over the API, as sketched below).
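As an aside on that last step, you don’t have to drag and drop: ComfyUI exposes an HTTP API on its default port (8188), and a workflow exported via “Save (API Format)” can be queued from a short script. The JSON filename below is hypothetical:

```python
import json
import urllib.request

# Queue a workflow through ComfyUI's HTTP API. Assumes ComfyUI is
# running locally and the workflow was exported in API format.
with open("wan21_i2v_workflow_api.json") as f:  # hypothetical filename
    workflow = json.load(f)

payload = json.dumps({"prompt": workflow}).encode("utf-8")
request = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read()))  # returns a prompt_id on success
```

This comes in handy for batch runs once your settings are dialed in.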
Configuring Nodes for Optimal Results
- CLIP Loader: Select the t5xxl_um_fp8_e4m3fn_scaled.safetensors text encoder.
- VAE Loader: Use the GGUF loader for the VAE.
- WAN Image-to-Video Node: Set resolution to 848×480 (480p) or 1280×720 (720p).
Enhance Your Videos: Smoother Playback & 4K Upscaling
1. Use RIFE VFI for Smoother Motion
- Interpolation Model: Pick rife47.pth.
- Factor: Set to 2 to double the frame rate (e.g., 16 FPS → 32 FPS); see the quick math after this list.
- UHD Mode: Enable for 720p/1080p videos.
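Here’s the quick math on what that factor does. The 81-frame clip length is just an assumed example:

```python
# What a 2x interpolation factor does to a clip.
source_fps = 16
frames = 81            # assumed clip length, not a Wan 2.1 constant
factor = 2

out_fps = source_fps * factor
out_frames = frames * factor  # RIFE roughly doubles the frame count
print(f"{frames} frames @ {source_fps} FPS -> {out_frames} frames @ {out_fps} FPS")
print(f"Duration stays ~{frames / source_fps:.1f}s; playback just gets smoother")
```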
2. Upscale to 4K
- Upscale Model: Try 4x_foolhardy_Remacri.pth.
- Scale Factor: 4x for 4K.
- Adjust Tile Size: Smaller tiles (e.g., 256) reduce VRAM pressure, which can speed up processing on weaker GPUs; see the tiling sketch below.
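To make the tile-size trade-off concrete, here’s a minimal sketch of tiled upscaling with Pillow. upscale_tile() is a hypothetical stand-in for the actual Remacri model call (it just does Lanczos resizing), but the loop shows why only one small patch needs to be in memory at a time:

```python
from PIL import Image

def upscale_tile(tile: Image.Image, scale: int = 4) -> Image.Image:
    # Stand-in for the real upscale model; swap in your model call here.
    return tile.resize((tile.width * scale, tile.height * scale), Image.LANCZOS)

def tiled_upscale(frame: Image.Image, tile_size: int = 256, scale: int = 4) -> Image.Image:
    out = Image.new("RGB", (frame.width * scale, frame.height * scale))
    for top in range(0, frame.height, tile_size):
        for left in range(0, frame.width, tile_size):
            box = (left, top,
                   min(left + tile_size, frame.width),
                   min(top + tile_size, frame.height))
            out.paste(upscale_tile(frame.crop(box), scale), (left * scale, top * scale))
    return out

frame = Image.new("RGB", (848, 480), "gray")  # stand-in for a video frame
print(tiled_upscale(frame).size)              # (3392, 1920)
```

Real tiled upscalers also overlap neighboring tiles slightly to blend away seams.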
Sampler Settings: Balance Speed and Quality
- Sampler Type:
- UniPC: Fastest, decent quality.
- Euler A: Balanced.
- DDIM: Slower but sharper details.
- Steps: 20-50 (lower = faster, higher = better detail).
- CFG Scale: 7-12 (lower = creative, higher = precise).
- Seed: Random for unique outputs; fixed to recreate results.
Generate Your First Video!
- Upload Image: Use the LoadImage node. Resize if needed.
- Set Prompts:
- Positive: Describe the scene (e.g., “Woman in black coat removes her coat”).
- Negative: Exclude flaws (e.g., “deformed, blurry”).
- Click Queue and wait. Your video saves automatically in the output folder!
Final Tips
- Start with shorter clips (5-10 seconds) to test settings.
- Experiment with prompts and CFG values for varied styles.
- Join forums like Reddit’s r/StableDiffusion for community support.