So I was messing around with Alibaba’s Wan 2.1 InPaint in ComfyUI last week, and honestly? It’s wild how well it handles those start-to-end frame animations. Like, you give it two images—say, a car from the front and back—and it just figures out the motion in between. No fancy rigging, no manual tweaking.
I ran this on my kinda-old GTX 1650 (only 4GB VRAM, lol) and it actually worked without melting my PC. Here’s how I got it running.
Getting Wan 2.1 InPaint Set Up
First, I grabbed the Wan 2.1 InPaint model from their repo (here’s the link) and dumped it into ComfyUI/models/diffusion_models.
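If you’d rather script the download than click around, here’s a minimal sketch using huggingface_hub. The repo id is an assumption on my part (double-check it on Hugging Face first), and you’ll need pip install huggingface_hub:

```python
# Minimal sketch: fetch the 1.3B InP weights and drop them where ComfyUI
# looks for diffusion models. The repo id is an assumption -- verify it
# on Hugging Face before running.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="alibaba-pai/Wan2.1-Fun-1.3B-InP",  # assumed repo id
    local_dir="ComfyUI/models/diffusion_models/Wan2.1-Fun-1.3B-InP",
)
```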
Then I just dragged in the “Wan Fun InPaint to Video” node (WanFunInpaintToVideo in the node search). Super straightforward.
If you’re on a weaker GPU like me, skip the 14B version and go straight for the 1.3B model. Also, keep the res at 512×512 or lower unless you’ve got a beefy setup. I even added --lowvram to my ComfyUI startup command (python main.py --lowvram), and it ran without crashing.
Running the First Test
I hooked up my start and end frames to the CLIP Vision Encode node and set CFG to 0 first—just to check the raw motion before committing to a full render. Once that looked decent, I bumped it to CFG 4-5 for the final pass.
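If you want to script that CFG bump instead of clicking through the UI, here’s a minimal sketch that patches a workflow saved via “Save (API Format)” and queues it against a local ComfyUI server on the default port. The filename and the KSampler lookup are my assumptions, so adapt them to your own export:

```python
# Minimal sketch: load an API-format workflow export, patch the sampler's
# CFG, and queue it on a local ComfyUI instance (default port 8188).
import json
import urllib.request

with open("wan_inpaint_workflow_api.json") as f:  # assumed filename
    workflow = json.load(f)

# Bump CFG for the final pass (I preview at 0, then render at 4-5).
for node in workflow.values():
    if node.get("class_type") == "KSampler":
        node["inputs"]["cfg"] = 4.5

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode(),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)
```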
Pro tip: Keep your start/end frames similar (same lighting, same subject) unless you want some weird morphing artifacts. Learned that the hard way.
Fixing the Usual Problems
- Blurry output? Upscale with Tile LoRA after generating.
- Warped faces? Either drop the CFG or add a midpoint frame to guide it.
- VRAM crashes? Drop to --novram if --lowvram isn’t enough, or just chop your video into shorter clips (see the sketch after this list).
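On the “shorter clips” fix: one way to chain chunks is to reuse the last frame of one render as the start frame of the next pass. A rough sketch, assuming opencv-python and placeholder filenames:

```python
# Rough sketch: grab the last frame of a rendered chunk so it can seed
# the start frame of the next chunk. Filenames are placeholders.
import cv2

cap = cv2.VideoCapture("chunk_01.mp4")  # placeholder: previous render
last = cap.get(cv2.CAP_PROP_FRAME_COUNT) - 1
cap.set(cv2.CAP_PROP_POS_FRAMES, last)  # seek to the final frame
ok, frame = cap.read()
cap.release()

if ok:
    # Feed this into the next pass as the new start frame.
    cv2.imwrite("chunk_02_start.png", frame)
```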
For anyone trying this, the Wan2.1-Fun-1.3B-InP model’s the way to go if you’re not running a 4090 or something.
Anyway, that’s it. No fancy steps, no over-explaining—just how I got it working. If you’ve tried this with different settings, hit me up. Curious if anyone’s pushed it further.