So I was messing around with Alibaba’s Wan 2.1 InPaint in ComfyUI last week, and honestly? It’s wild how well it handles those start-to-end frame animations. Like, you give it two images—say, a car from the front and back—and it just figures out the motion in between. No fancy rigging, no manual tweaking.
I ran this on my kinda-old GTX 1650 (only 4GB VRAM, lol) and it actually worked without melting my PC. Here’s how I got it running.
Getting Wan 2.1 InPaint Set Up
First, I grabbed the Wan 2.1 InPaint model from their repo (here’s the link) and dumped it into ComfyUI/models/diffusion_models.
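If you’d rather script the download than click around, here’s a minimal sketch using huggingface_hub. The repo id is an assumption on my part (double-check it on Hugging Face first), and you’ll need pip install huggingface_hub:

```python
# Minimal sketch: fetch the 1.3B InP weights and drop them where ComfyUI
# looks for diffusion models. The repo id is an assumption -- verify it
# on Hugging Face before running.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="alibaba-pai/Wan2.1-Fun-1.3B-InP",  # assumed repo id
    local_dir="ComfyUI/models/diffusion_models/Wan2.1-Fun-1.3B-InP",
)
```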
Then I just dragged in the “Wan Fun InPaint to Video” node (WanFunInpaintToVideo in the node search). Super straightforward.
If you’re on a weaker GPU like me, skip the 14B version and go straight for the 1.3B model. Also, keep the res at 512×512 or lower unless you’ve got a beefy setup. I even added --lowvram to my ComfyUI startup command (python main.py --lowvram), and it ran without crashing.
Running the First Test
I hooked up my start and end frames to the CLIP Vision Encode node and set CFG to 0 first—just to check the raw motion before committing to a full render. Once that looked decent, I bumped it to CFG 4-5 for the final pass.
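If you want to script that CFG bump instead of clicking through the UI, here’s a minimal sketch that patches a workflow saved via “Save (API Format)” and queues it against a local ComfyUI server on the default port. The filename and the KSampler lookup are my assumptions, so adapt them to your own export:

```python
# Minimal sketch: load an API-format workflow export, patch the sampler's
# CFG, and queue it on a local ComfyUI instance (default port 8188).
import json
import urllib.request

with open("wan_inpaint_workflow_api.json") as f:  # assumed filename
    workflow = json.load(f)

# Bump CFG for the final pass (I preview at 0, then render at 4-5).
for node in workflow.values():
    if node.get("class_type") == "KSampler":
        node["inputs"]["cfg"] = 4.5

req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=json.dumps({"prompt": workflow}).encode(),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)
```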
Pro tip: Keep your start/end frames similar (same lighting, same subject) unless you want some weird morphing artifacts. Learned that the hard way.
Fixing the Usual Problems
- Blurry output? Upscale with Tile LoRA after generating.
- Warped faces? Either drop the CFG or add a midpoint frame to guide it.
- VRAM crashes? Drop to --novram if --lowvram isn’t enough, or just chop your video into shorter clips (see the sketch after this list).
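On the “shorter clips” fix: one way to chain chunks is to reuse the last frame of one render as the start frame of the next pass. A rough sketch, assuming opencv-python and placeholder filenames:

```python
# Rough sketch: grab the last frame of a rendered chunk so it can seed
# the start frame of the next chunk. Filenames are placeholders.
import cv2

cap = cv2.VideoCapture("chunk_01.mp4")  # placeholder: previous render
last = cap.get(cv2.CAP_PROP_FRAME_COUNT) - 1
cap.set(cv2.CAP_PROP_POS_FRAMES, last)  # seek to the final frame
ok, frame = cap.read()
cap.release()

if ok:
    # Feed this into the next pass as the new start frame.
    cv2.imwrite("chunk_02_start.png", frame)
```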
For anyone trying this, the Wan2.1-Fun-1.3B-InP model’s the way to go if you’re not running a 4090 or something.
Anyway, that’s it. No fancy steps, no over-explaining—just how I got it working. If you’ve tried this with different settings, hit me up. Curious if anyone’s pushed it further.