I’ve been testing Alibaba’s Wan 2.1 Fun ControlNet in ComfyUI for the past week, and let’s be real—it’s one of the more interesting video models I’ve tried lately. Unlike some other tools that struggle with flickering or inconsistent details, this one actually keeps styles coherent across frames. Characters and backgrounds stay uniform, even with camera moves, which is a big step up from what I’ve seen before.
Here’s the thing: it’s not magic. There are still quirks, and longer clips need some extra work. But for short animations, music videos, or even social media ads, it’s surprisingly solid.
What You’ll Need
First, make sure your ComfyUI is updated. I ran into a few issues at first because I was on an older version, and some of the nodes didn’t play nice. After updating, things smoothed out.
You’ll also need the Wan 2.1 Fun ControlNet model itself. I grabbed mine from Hugging Face: just download it and drop it into `ComfyUI/models/diffusion/`. Renaming it something clear (like `Wan_2.1_Fun_Control_1.3B.safetensors`) saves trouble later.
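If you’d rather script the download, the `huggingface_hub` library can fetch it straight into the right folder. This is a minimal sketch; the repo ID and filename below are my assumptions, so check the actual model page for the exact names.

```python
# Minimal download sketch (repo ID and filename are assumptions; verify on Hugging Face).
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="alibaba-pai/Wan2.1-Fun-1.3B-Control",   # assumption: check the real repo name
    filename="diffusion_pytorch_model.safetensors",  # assumption: check the real filename
    local_dir="ComfyUI/models/diffusion",            # matches the folder used above
)
print(f"Saved to {model_path}")
```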
If you’re new to ComfyUI, the Manager plugin is a lifesaver for handling dependencies. I didn’t have it installed at first, and yeah, that was a mistake.
Setting Up the Workflow
I started with a short reference video—about 5 seconds—just to see how the model handled motion. The Load Video node pulled it in, and I set the dimensions to 832×480, which seems to be the sweet spot for Wan 2.1.
From there, I extracted the first frame using Get Image from Batch. This became my style reference. I ran it through Flux Turbo and a LoRA to tweak the look (I was going for an anime style, but you could do anything). The key was keeping the pose intact while adjusting colors and textures.
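If you’d rather pull the style frame outside ComfyUI (handy for iterating on the look separately), a few lines of OpenCV do the same job as the Get Image from Batch step. The file names here are placeholders, and the resize matches the 832×480 dimensions from the Load Video step.

```python
# Grab and resize the first frame of a reference clip (file names are placeholders).
import cv2

cap = cv2.VideoCapture("reference_clip.mp4")
ok, frame = cap.read()  # the first frame becomes the style reference
cap.release()

if not ok:
    raise RuntimeError("Could not read the first frame")

frame = cv2.resize(frame, (832, 480))      # match the workflow dimensions
cv2.imwrite("style_reference.png", frame)  # feed this into the style pass
```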
Finally, I connected everything to the Wan Fun Control to Video node. For settings, I stuck with the values below (there’s a small scripting sketch after the list if you want to automate repeat runs):
- CFG Scale: 7–9
- Sampler: Euler a or DPM++ 2M
- Frames per Batch: 8 (this kept things stable on my 8GB VRAM setup)
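About that scripting sketch: ComfyUI ships with an HTTP API, so once the graph works you can queue runs from Python instead of clicking through the UI. This is a minimal sketch, assuming a default local server on port 8188 and a workflow exported with “Save (API Format)”; the node ID mentioned in the comment is a placeholder you’d look up in your own JSON.

```python
# Queue a saved API-format workflow against a local ComfyUI server (default port 8188).
import json
import urllib.request

with open("wan_fun_control_workflow.json") as f:  # exported via "Save (API Format)"
    workflow = json.load(f)

# Example tweak before queuing: bump the sampler CFG. "3" is a placeholder node ID;
# find the KSampler node's actual ID in your own exported JSON.
# workflow["3"]["inputs"]["cfg"] = 8

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())
```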
What Worked (and What Didn’t)
The biggest win here is consistency. No more weird flickering or sudden style shifts between frames. Motion stays smooth, and the ControlNet integration actually does what it’s supposed to—no janky surprises.
That said, longer clips still need segmentation. I tried a 15-second video, and the model started struggling around the 10-second mark. Breaking it into smaller chunks fixed the issue, but it’s something to keep in mind.
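If you want to script that segmentation, the chunking math is simple: split the frame range into overlapping windows and render each one separately, then stitch the results. Here’s a rough sketch assuming 81-frame chunks (roughly 5 seconds at 16 fps) with a small overlap to keep motion continuous; both numbers are my own guesses, not anything the model prescribes.

```python
# Split a long clip's frame range into overlapping chunks for separate renders.
def chunk_frames(total_frames: int, chunk_len: int = 81, overlap: int = 8):
    """Yield (start, end) frame indices; the overlap helps motion stay continuous."""
    start = 0
    while start < total_frames:
        end = min(start + chunk_len, total_frames)
        yield (start, end)
        if end == total_frames:
            break
        start = end - overlap  # back up so adjacent chunks share a few frames

# A 15-second clip at 16 fps is 240 frames:
for start, end in chunk_frames(240):
    print(f"render frames {start}-{end}")
```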
Final Thoughts
Wan 2.1 Fun ControlNet isn’t perfect, but it’s a legit step forward for AI video in ComfyUI. If you’re working on short, high-quality animations, it’s worth a shot. For more details, check out the model’s Hugging Face page.