Trying Out PyramidFlow for Quick Video Generation
I’ve been testing PyramidFlow in ComfyUI for the past week, and it’s surprisingly straightforward for whipping up short videos. You can generate 10-second clips at 768p and 24 FPS using just text prompts or an image input. What stood out to me is how little setup it requires compared to other video models—no need to tweak a dozen parameters just to get something usable.
Here’s the thing: PyramidFlow doesn’t brute-force full resolution from the first step. Its pyramidal flow matching starts the denoising at low resolution and progressively upscales, so most of the compute goes into fine detail only in the later stages. I didn’t expect this approach to work as well as it does, but the results are smooth, especially for how little VRAM it eats up. Even on my mid-range GPU, CPU offloading keeps things from crashing when I push the resolution.
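If you’re curious what the ComfyUI nodes are actually wrapping, here’s roughly how the standalone PyramidFlow repo loads the model in plain Python. I’m reconstructing this from memory of its README, so treat the class name, the checkpoint path, and especially the offload helper as assumptions to verify against the current repo rather than gospel:

```python
# Rough reconstruction of how the standalone PyramidFlow repo loads the model.
# Class name, argument order, and the offload helper are from my reading of its
# README -- verify against the current repo before relying on them.
import torch
from pyramid_dit import PyramidDiTForVideoGeneration

model = PyramidDiTForVideoGeneration(
    "path/to/pyramid-flow",                      # placeholder: local checkpoint directory
    "bf16",                                      # bf16 weights keep VRAM usage manageable
    model_variant="diffusion_transformer_768p",  # the 768p checkpoint discussed here
)

# Memory savers: tile the VAE decode and push idle submodules to system RAM,
# which is what keeps mid-range cards from running out of VRAM.
model.vae.enable_tiling()
model.enable_sequential_cpu_offload()  # assumed helper name; check the README if it errors
```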
Dialing In the Settings
The `guidance_scale` parameter is what really controls visual sharpness. For the 768p checkpoint, I found values between 7 and 9 work best for text-to-video. Drop it to 7 if you’re using the 384p model; anything higher starts introducing artifacts.
Motion is handled separately by `video_guidance_scale`. Crank it up for more dynamic movement, but keep it around 5 for 10-second clips unless you want things getting chaotic. I tried pushing it to 8 on a test run, and let’s just say the results were… abstract.
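To make both knobs concrete, here’s a sketch of the text-to-video call, continuing from the loading snippet above. Parameter names and the per-stage step lists follow the repo’s example script as I remember it; the prompt, the temp-to-duration mapping, and the output path are my placeholders. In ComfyUI the same two guidance values show up as node widgets:

```python
# Continuing from the loading snippet: a text-to-video call for a ~10-second
# 768p clip. Step lists and the temp value are assumed from the example script.
import torch
from diffusers.utils import export_to_video

prompt = "a slow dolly shot down a neon-lit alley at night, light rain, cinematic"

with torch.no_grad(), torch.autocast("cuda", dtype=torch.bfloat16):
    frames = model.generate(                  # `model` is the instance loaded earlier
        prompt=prompt,
        num_inference_steps=[20, 20, 20],     # per-pyramid-stage steps (assumed defaults)
        video_num_inference_steps=[10, 10, 10],
        height=768,
        width=1280,
        temp=31,                              # temporal length; roughly 10 s at 24 FPS (assumed mapping)
        guidance_scale=9.0,                   # sharpness: 7-9 for 768p, drop to ~7 for 384p
        video_guidance_scale=5.0,             # motion: ~5 is stable, 8 got "abstract" for me
        output_type="pil",
    )

export_to_video(frames, "pyramidflow_t2v.mp4", fps=24)
```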
Getting It Running
Installation is the usual ComfyUI routine: grab the nodes from PyramidFlow’s GitHub and drop them into your `custom_nodes` folder. If you’re new to this, I wrote a quick walkthrough on the ComfyUI Blog that covers the basics.
One thing I noticed: The workflow doesn’t need fancy control nets or LoRAs to produce decent output. Just plug in a prompt, set your durations, and hit queue. For image-to-video, drag your reference into a Load Image node and connect it to the PyramidFlow input. No complicated preprocessing needed, which is refreshing after wrestling with some other video models.
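For reference, the image-to-video path boils down to something like this in the standalone API. The `generate_i2v` name and its arguments are my best recollection rather than a guarantee, the filename is obviously a placeholder, and the ComfyUI nodes handle all of this wiring for you:

```python
# The image-to-video equivalent of dropping a reference into a Load Image node.
# generate_i2v and its arguments are assumed from the standalone repo's example.
import torch
from PIL import Image
from diffusers.utils import export_to_video

image = Image.open("reference.jpg").convert("RGB").resize((1280, 768))  # placeholder file

with torch.no_grad(), torch.autocast("cuda", dtype=torch.bfloat16):
    frames = model.generate_i2v(              # `model` loaded as in the earlier snippet
        prompt="the camera slowly pushes in while fog rolls through the scene",
        input_image=image,
        temp=16,                              # shorter clip; value assumed from the example
        video_guidance_scale=4.0,             # i2v seems to like slightly gentler motion guidance
        output_type="pil",
    )

export_to_video(frames, "pyramidflow_i2v.mp4", fps=24)
```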
The first time I ran it, the render took about 4 minutes for a 768p clip on an RTX 3080. Not instant, but way faster than waiting for a 14B parameter model to churn through frames. If you’re short on VRAM, the team provides a 384p checkpoint that runs comfortably on 8GB cards.
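If you do go the 384p route, the only things that change in the sketches above are the variant string, the canvas size, and the guidance value. Reusing the imports from the earlier snippets, and treating the exact variant name and 640x384 resolution as assumptions to check against the model card:

```python
# Swapping to the 384p checkpoint for 8GB cards: smaller canvas, lower guidance.
model = PyramidDiTForVideoGeneration(
    "path/to/pyramid-flow",
    "bf16",
    model_variant="diffusion_transformer_384p",  # assumed variant string
)
model.vae.enable_tiling()

with torch.no_grad(), torch.autocast("cuda", dtype=torch.bfloat16):
    frames = model.generate(
        prompt="a paper boat drifting down a rain-soaked street",
        height=384,
        width=640,
        temp=16,
        guidance_scale=7.0,           # lower guidance suits the 384p model
        video_guidance_scale=5.0,
        output_type="pil",
    )
```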
For anyone tweaking the workflow, I’ve been experimenting with a custom node setup that speeds up batch generations. You can grab it from comfyuiblog’s GitHub if you want to try the same configs I’m using.