I’ve been playing with Skyreels V2 AI for the past week, and let me just say—this update is wild. The big headline? Infinite-length video generation. Yeah, you read that right. Whether you’re starting from text, an image, or even an existing clip, Skyreels V2 can stretch it into something way longer without turning into a glitchy mess.
But here’s the thing: I didn’t just take their word for it. I tested it myself in ComfyUI, comparing the 1.3B and 14B models side by side. And spoiler: the differences are bigger than you might think.
How Skyreels V2’s Diffusion Forcing Works
Skyreels calls their magic trick “diffusion forcing”—a fancy way of saying the AI can now generate ultra-long videos without losing consistency. No frozen frames, no warped faces, just smooth motion from start to finish.
The key is how it handles frame overlap. Instead of generating a video in one go (which would melt your GPU), it works in chunks. Each new segment takes the last 17 frames of the previous one as a starting point, blending everything seamlessly.
I tried this with a 97-frame wolf animation. The 14B model nailed it, with natural paw movements and no background flickering. The 1.3B version? Still smooth, but the fur details got fuzzy, and there were minor glitches.
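If you want to see the overlap arithmetic for yourself, here is a rough Python sketch of how a long video could be carved into overlapping chunks. To be clear, this is my own illustration and not Skyreels' actual code: the function name `segment_ranges` is made up, and I'm assuming each chunk is 97 frames long (borrowed from my wolf test) with the 17-frame overlap described above.

```python
def segment_ranges(total_frames, segment_len=97, overlap=17):
    """Return (start, end) frame ranges, end exclusive, one per chunk.

    Hypothetical helper: segment_len=97 and overlap=17 are assumptions
    taken from the numbers in this post, not from the Skyreels code.
    """
    if not 0 <= overlap < segment_len:
        raise ValueError("overlap must be >= 0 and smaller than segment_len")
    if total_frames <= segment_len:
        return [(0, total_frames)]

    stride = segment_len - overlap  # new frames contributed by each later chunk
    ranges = []
    start = 0
    while True:
        end = min(start + segment_len, total_frames)
        ranges.append((start, end))
        if end == total_frames:
            break
        # The next chunk begins on the last `overlap` frames of this one.
        start += stride
    return ranges


for start, end in segment_ranges(257):
    print(f"frames {start}..{end - 1}")
```

With those numbers, every chunk after the first adds 80 new frames (97 minus 17), so three chunks cover 257 frames and each one starts on the final 17 frames of the chunk before it.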
Setting Up Skyreels V2 in ComfyUI: A Few Hiccups
First, the install isn’t plug-and-play. You’ll need to:
- Manually install the ComfyUI-WanVideoWrapper custom node via command prompt (it’s not in the default ComfyUI manager). Run these from your ComfyUI custom_nodes folder:
git clone https://github.com/kijai/ComfyUI-WanVideoWrapper.git
cd ComfyUI-WanVideoWrapper
pip install -r requirements.txt
- Grab the model files from the WanVideo_comfy repo on Hugging Face and look for “Wan 2.1 Skyreels V2 DF.” I used the 1.3B FP32 version because it runs on 5-6GB VRAM (hello, mid-range GPUs).
https://huggingface.co/Kijai/WanVideo_comfy/tree/main/Skyreels
- Load the workflow. The example files include a text-to-video template, but I tweaked mine to focus on video extensions.
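On the VRAM point in the model step above, a quick back-of-the-envelope check helps when picking a file. The sketch below only counts the raw model weights (parameter count times bytes per parameter). It ignores activations, the text encoder, the VAE, and any offloading, so treat it as a lower bound and not a measurement. The helper name `weight_gb` and the precision table are my own, and the precisions are listed for illustration only.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8": 1}


def weight_gb(params_billion, precision):
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes).

    Hypothetical helper: counts weights only, nothing else in VRAM.
    """
    total_bytes = params_billion * 1e9 * BYTES_PER_PARAM[precision]
    return round(total_bytes / 1e9, 2)


for size, precision in [(1.3, "fp32"), (14, "fp16"), (14, "fp8")]:
    print(f"{size}B @ {precision}: ~{weight_gb(size, precision)} GB of weights")
```

By this estimate the 1.3B model at FP32 comes to about 5.2 GB of weights, which lines up with the 5-6GB I saw, while the 14B model needs several times that even at lower precision.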
Pro tip: The Diffusion Forcing Sampler is your best friend here. It’s optimized for long videos and supports prefix sampling, which is what keeps motions fluid between segments.
The Verdict (So Far)
If you’ve got the VRAM, the 14B model is worth the extra horsepower. Sharper details, fewer artifacts. But the 1.3B version holds up surprisingly well for simpler projects. Just avoid close-ups of faces or intricate textures.