Workflows

ComfyUI WAN 2.1 Depth Lora Workflow


Let’s be real—getting consistent motion in AI video has always been hit or miss. I’ve messed with AnimateDiff and other tools before, but the flickering drove me nuts. So when I heard about WAN 2.1’s depth control, I figured it was worth a shot.

Turns out, combining depth maps with LoRAs actually works better than I expected. No fancy hardware needed—just ComfyUI and a couple of safetensors files. Here’s how it went.

Grabbing the Right Models

First, you’ll need two Control LoRAs: one for depth maps and another for tile refinement.
✅ Control LoRA for Depth Maps (downloadable safetensors)
✅ Control LoRA for Tile (downloadable safetensors)
I grabbed both from the usual spots, Hugging Face and CivitAI, but the WAN team also publishes pre-configured workflows if you want to skip the manual setup.

Depth Maps as Motion Guides

I started with a reference video of a dancer and ran it through Depth Anything V2 to extract the depth map. If you’ve never used depth maps before, they’re just grayscale images where lighter areas are “closer” and darker areas are “farther” in 3D space. The Control LoRA uses this to replicate the original motion, which is way smoother than trying to describe movement in a text prompt.
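To make the grayscale convention concrete, here's a minimal sketch (not part of the actual workflow, which handles this inside the Depth Anything V2 node) of how raw per-pixel distances get normalized into the kind of 8-bit map the Control LoRA consumes. The function name and the toy array are mine:

```python
import numpy as np

def depth_to_grayscale(depth: np.ndarray) -> np.ndarray:
    """Normalize raw distances to an 8-bit grayscale map where
    nearer points come out lighter and farther points darker."""
    near, far = depth.min(), depth.max()
    # Invert so small distances (close objects) map to high (light) values.
    normalized = (far - depth) / max(far - near, 1e-8)
    return (normalized * 255).astype(np.uint8)

# A toy 2x2 "scene": per-pixel distances in arbitrary units.
depth = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
gray = depth_to_grayscale(depth)
# gray[0, 0] (nearest) == 255, gray[1, 1] (farthest) == 0
```

That inversion is the whole trick: the model never sees real 3D geometry, just a brightness gradient it learns to treat as depth ordering.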

For styling, I threw in a basic prompt like “cyberpunk dancer under neon lights” and let the model handle the rest. If you want tighter control, Florence 2 can auto-generate prompts from a reference image—handy for matching specific aesthetics.

Running the Workflow

I loaded the 1.3B WAN 2.1 model (the smaller one, since it’s faster) and connected the depth map to the Control LoRA node. The dual-sampler setup is key here: the first pass sketches out the motion in 10 steps, and the second pass refines details in another 10. On my 4090, a 3-second clip took under 2 minutes to render, with the dancer’s moves intact but now in a cyberpunk outfit.
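If the dual-sampler idea is new to you, it's essentially one 20-step denoising schedule split into two ranges, the way ComfyUI's KSampler (Advanced) node does with its start/end step inputs. A minimal sketch of just the step bookkeeping (the sampler itself is the node, not this code):

```python
def split_schedule(total_steps: int, first_pass: int):
    """Split one denoising schedule into two consecutive step ranges,
    mirroring KSampler (Advanced)'s start_at_step / end_at_step inputs."""
    return (0, first_pass), (first_pass, total_steps)

motion_pass, detail_pass = split_schedule(20, 10)
# motion_pass == (0, 10): rough motion locked in from the depth map
# detail_pass == (10, 20): refines texture and detail on top of it
```

The point of the split is that the depth control matters most early, while the latents are still noisy; the second range just polishes.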

Here’s the thing—it’s not perfect. Some frames still get jittery, especially with fast motion. But compared to raw AnimateDiff outputs, the depth control makes a noticeable difference.

Upscaling and Tile LoRA

To clean up artifacts, I upscaled the output 1.5x and ran it through the Tile Control LoRA. This sharpens edges and reduces noise without over-smoothing. You can also use Segment Anything to tweak specific elements (like changing just the dancer’s clothes), but I’ll save that for another test.
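For reference, the 1.5x upscale step is just a plain resize before the Tile LoRA re-sharpens things. A sketch with Pillow (the workflow uses an upscale node instead; the helper name and the 832x480 starting size, a common WAN 2.1 480p output, are my assumptions):

```python
from PIL import Image

def upscale_frame(frame: Image.Image, factor: float = 1.5) -> Image.Image:
    """Upscale a frame ahead of Tile Control LoRA refinement.
    Lanczos is a reasonable resampling default here."""
    w, h = frame.size
    return frame.resize((round(w * factor), round(h * factor)), Image.LANCZOS)

frame = Image.new("RGB", (832, 480))  # e.g. a WAN 2.1 480p frame
up = upscale_frame(frame)             # 832x480 -> 1248x720
```

The resize alone would look soft; the Tile LoRA pass afterward is what recovers edge detail without the over-smoothed look.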

Download Workflows
Wan 2.1 Fun ControlNet ComfyUI Workflow
Fix BLURRY AI Videos Wan 2.1 Video to Video ComfyUI Workflow
