The WAN 2.1 release has spawned two interesting video models, CausVid and MoviiGen, each built on the same base but taking a completely different approach to AI video generation. I spent the last week running both through ComfyUI to see how they stack up for different use cases. Here’s what actually happened when I tested them locally.
Getting Started with CausVid
CausVid is the speed demon of the two. It uses an autoregressive approach that generates frames sequentially, which cuts render times significantly compared to standard diffusion models. The first thing I noticed was how fast it churned out clips: my initial 5-second test at 8 steps finished in just over a minute on a 4090.
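To make the distinction concrete, here’s a conceptual Python sketch of the autoregressive idea. This is not CausVid’s actual implementation; `model` and its `next_frame` method are hypothetical stand-ins.

```python
# Conceptual sketch only: `model.next_frame` is a hypothetical method,
# not a real CausVid or ComfyUI API. An autoregressive generator builds
# the clip one frame at a time, each conditioned on the frames so far.
def generate_autoregressive(model, prompt, num_frames):
    frames = []
    for _ in range(num_frames):
        frames.append(model.next_frame(prompt, frames))
    return frames

# A standard video diffusion model instead denoises all frames jointly
# over many steps, which is why step count dominates its render time.
```

This sequential setup may also explain the looping quirk below: each new frame only sees what came before it.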
The model files come in two sizes: a 28.6GB full version and a 14GB FP8 quantized version. I tried both and saw minimal quality difference, so I stuck with the smaller file to save space. You’ll need the standard WAN 2.1 VAE and text encoder files too—same ones used in other WAN workflows.
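If you’d rather script the downloads than click through, here’s a minimal sketch using huggingface_hub. The repo and filename come from the download links at the end of this post; the `local_dir` assumes ComfyUI’s default folder layout, so adjust it to match your install.

```python
# Minimal download sketch. Assumes huggingface_hub is installed
# (pip install huggingface_hub) and a default ComfyUI folder layout.
from huggingface_hub import hf_hub_download

# The FP8 CausVid checkpoint linked in the downloads section below.
hf_hub_download(
    repo_id="Kijai/WanVideo_comfy",
    filename="Wan2_1-T2V-14B_CausVid_fp8_e4m3fn.safetensors",
    local_dir="ComfyUI/models/diffusion_models",
)
```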
One quirk: CausVid really wants to make moving videos. Even with bland prompts, it insists on adding camera pans or subtle motion. For quick social media clips, this works great. But when I tried extending sequences beyond 24 frames, the movement became repetitive, like a GIF on loop. Stick to shorter clips unless you’re okay with that effect.
MoviiGen’s Cinematic Approach
MoviiGen flips the script entirely. Where CausVid prioritizes speed, this model goes all-in on film-quality output. The first test render stopped me cold: the lighting, depth, and consistency looked like something from a professional animatic. No more regenerating prompts and hoping for usable frames.
There’s a tradeoff though. Those polished results take longer to generate and demand more VRAM. My 4090 handled the FP16 version, but I wouldn’t attempt it on anything weaker. The model also seems optimized for specific shot types—close-ups and medium shots work better than wide establishing shots in my tests.
I ran into the same motion limitation as with CausVid when pushing past 97 frames. The model would either freeze the subject or introduce minor loops. For now, keeping clips under 5 seconds avoids this. Alibaba’s documentation hints at longer sequence support coming soon, though.
Side-by-Side Workflow Tests
Running both models in ComfyUI revealed some interesting differences:
- Prompt adherence: MoviiGen nails specific details better. If you request “a woman in a red dress dancing,” you get exactly that. CausVid sometimes takes creative liberties.
- Render times: CausVid finished an 8-step render in 1:11, while MoviiGen took nearly 4 minutes for comparable quality.
- VRAM use: MoviiGen’s FP16 version needed about 18GB, while CausVid’s FP8 ran comfortably in 12GB (a quick way to check your own headroom is sketched below).
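If you’re not sure which precision your card can handle, a quick check with PyTorch (which ComfyUI already requires) reports your free VRAM. This assumes a CUDA GPU.

```python
import torch

# torch.cuda.mem_get_info() returns (free_bytes, total_bytes) for the
# current CUDA device; handy for choosing between FP16 and FP8 builds.
free, total = torch.cuda.mem_get_info()
print(f"{free / 2**30:.1f} GiB free of {total / 2**30:.1f} GiB total")
```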
For quick content, CausVid is my new go-to. When quality matters (client work or portfolio pieces), I’ll eat the render time and use MoviiGen. Both models share the same basic WAN 2.1 workflow structure, so switching between them only requires swapping the model file.
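If you save your workflow in ComfyUI’s API (JSON) format, that swap can even be scripted. Here’s a sketch under two assumptions: the model is loaded through a UNETLoader node, and the workflow filenames below are hypothetical placeholders for your own.

```python
import json

# Sketch: repoint a saved ComfyUI workflow (API format) at a different
# model file. Assumes the model loads via a UNETLoader node; the target
# filename is the FP8 CausVid checkpoint linked at the end of the post.
with open("wan21_t2v_api.json") as f:  # hypothetical filename
    workflow = json.load(f)

for node in workflow.values():
    if node.get("class_type") == "UNETLoader":
        node["inputs"]["unet_name"] = "Wan2_1-T2V-14B_CausVid_fp8_e4m3fn.safetensors"

with open("wan21_t2v_causvid.json", "w") as f:  # hypothetical output name
    json.dump(workflow, f, indent=2)
```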
One pro tip: skip the LoRA files unless you’re fine-tuning. I saw no noticeable improvement with them, and they just add another layer of complexity to the workflow. The base models work great on their own.
Required Files
LoRA (optional; see the tip above)
- https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan21_CausVid_14B_T2V_lora_rank32.safetensors
- https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan21_CausVid_bidirect2_T2V_1_3B_lora_rank32.safetensors
CausVid models
- https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Wan2_1-T2V-14B_CausVid_fp8_e4m3fn.safetensors
- https://huggingface.co/lightx2v/Wan2.1-T2V-14B-CausVid/blob/main/causal_model.pt
MoviiGen