NVIDIA Cosmos Model in ComfyUI Workflow

NVIDIA just dropped their Cosmos series, and if you’re into AI video generation, this one’s worth checking out. I spent the last few days testing it in ComfyUI, and here’s how it went.

First things first: you’ll need a few files. The text encoder and VAE are over on Hugging Face. Save oldt5_xxl_fp8_e4m3fn_scaled.safetensors in ComfyUI/models/text_encoders and cosmos_cv8x8x8_1.0.safetensors in ComfyUI/models/vae. Fair warning: this text encoder is T5-XXL v1.0, not the newer v1.1 you might’ve seen in models like Flux, so don’t expect the two files to be interchangeable.

For the diffusion models, you’ve got options. The repackaged safetensors files are easier to work with: just drop them into ComfyUI/models/diffusion_models. If you want the original .pt files, NVIDIA’s got the 7B and 14B versions for both text-to-video and image-to-video.
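
If you want to double-check the file placement before launching, here is a minimal Python sketch. The folder and file names for the text encoder and VAE come from the paragraphs above; the helper names (`EXPECTED_FILES`, `find_missing`, `list_diffusion_models`) are made up for illustration, and the `ComfyUI` root path is an assumption about where you run it from.

```python
from pathlib import Path

# Folder (relative to the ComfyUI root) -> files expected inside it.
# Text encoder and VAE names are the ones mentioned in this post. Diffusion
# models are handled separately because the filename depends on your download.
EXPECTED_FILES = {
    "models/text_encoders": ["oldt5_xxl_fp8_e4m3fn_scaled.safetensors"],
    "models/vae": ["cosmos_cv8x8x8_1.0.safetensors"],
}


def find_missing(comfy_root):
    """Return the relative paths of expected files that are not on disk."""
    root = Path(comfy_root)
    missing = []
    for folder, names in EXPECTED_FILES.items():
        for name in names:
            if not (root / folder / name).is_file():
                missing.append(f"{folder}/{name}")
    return missing


def list_diffusion_models(comfy_root):
    """List the .safetensors files sitting in models/diffusion_models."""
    folder = Path(comfy_root) / "models" / "diffusion_models"
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob("*.safetensors"))


if __name__ == "__main__":
    root = "ComfyUI"  # assumption: run from the directory that contains ComfyUI/
    for path in find_missing(root):
        print("missing:", path)
    print("diffusion models found:", list_diffusion_models(root))
```

An empty "missing" list plus at least one diffusion model means the loaders in the workflow should have something to pick from.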

Setting It Up

I just updated ComfyUI to the latest version (always do this first—skipping it causes half the issues people complain about). Then I dragged in the workflow JSON files. No fancy steps, just load and go.

For text-to-video, the key node is the diffusion model loader. Pick either the 7B or 14B safetensors file, depending on your VRAM. The 7B runs fine on my 24 GB GPU, but the 14B needs a bit more breathing room.
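
To get a rough feel for why the 14B is tighter, here is some back-of-the-envelope arithmetic in Python. It only counts the weights, and it assumes 16-bit weights (2 bytes per parameter) and an arbitrary 4 GiB of headroom for activations, the VAE, and the text encoder. Both numbers are my assumptions, not measurements, and ComfyUI can offload parts of a model, so "doesn't fit" here does not mean "won't run".

```python
def weight_gib(params_billion, bytes_per_param):
    """Approximate size of the weights alone, in GiB."""
    return params_billion * 1e9 * bytes_per_param / 2**30


def fits(params_billion, bytes_per_param, vram_gib, headroom_gib=4.0):
    """True if the weights plus a guessed headroom fit in VRAM.

    headroom_gib is a placeholder guess, not a measured figure.
    """
    return weight_gib(params_billion, bytes_per_param) + headroom_gib <= vram_gib


if __name__ == "__main__":
    for size in (7, 14):
        gib = weight_gib(size, 2)  # 2 bytes per parameter = 16-bit weights (assumed)
        verdict = "fits" if fits(size, 2, 24) else "does not fit"
        print(f"{size}B: ~{gib:.1f} GiB of weights, {verdict} in 24 GiB with headroom")
```

Under those assumptions the 7B weights come to roughly 13 GiB and the 14B weights to roughly 26 GiB, which lines up with the 7B being comfortable on a 24 GB card and the 14B not.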

Here’s the thing: Cosmos works best with the new res_multistep sampler. It’s the one NVIDIA used in their paper, and yeah, it makes a difference. The default Karras scheduler works, but don’t be afraid to tweak it.

Image-to-video is where things get interesting. Load your image, resize it to match Cosmos’s defaults (no guesswork, just use the Resize Image node), and pipe it through the VAE encoder. The output’s a 5-second clip by default, but you can adjust the frame count if you’re feeling experimental.
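
If you do play with the frame count or feed in odd-sized images, the arithmetic looks roughly like the sketch below. Everything numeric here is an assumption on my part rather than something stated above: 24 fps, a 1280x704 target, and frame counts of the form 8k + 1 (guessed from the 8x8x8 in the VAE filename). Check the values in the workflow you actually loaded. The resize helper shows one common approach (scale to cover, then center-crop), which may not be exactly what the Resize Image node does.

```python
def frames_for(seconds, fps=24, temporal_stride=8):
    """Frame count for a target duration, snapped to temporal_stride * k + 1.

    fps and temporal_stride are assumed defaults, not values from the post.
    """
    raw = round(seconds * fps)
    k = max(1, round((raw - 1) / temporal_stride))
    return temporal_stride * k + 1


def cover_resize(src_w, src_h, dst_w=1280, dst_h=704):
    """Scaled size that fully covers the target, plus center-crop offsets.

    Returns (new_w, new_h, left, top); crop a dst_w x dst_h box at (left, top).
    """
    scale = max(dst_w / src_w, dst_h / src_h)
    new_w = max(dst_w, round(src_w * scale))
    new_h = max(dst_h, round(src_h * scale))
    left = (new_w - dst_w) // 2
    top = (new_h - dst_h) // 2
    return new_w, new_h, left, top


if __name__ == "__main__":
    n = frames_for(5)
    print(f"5 s at 24 fps -> {n} frames ({n / 24:.2f} s)")
    print("1920x1080 input ->", cover_resize(1920, 1080))
```

With those assumed defaults, a 5-second target works out to 121 frames, and a 1920x1080 image is scaled to 1280x720 and then loses 8 pixels off the top and bottom.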

Quick Notes

  • Negative prompts actually work here, unlike some other models. Use them.
  • The 7B model’s a safer bet if your GPU’s not top-tier.
  • Torch 2.5.1 helps if you’re using torch.compile().
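
For that last note, here is a small Python sketch to check which PyTorch you’re on before reaching for torch.compile(). I’m reading "2.5.1" as "2.5.1 or newer", which is my interpretation, and the helper names (`version_tuple`, `at_least`) are just illustrative.

```python
import re


def version_tuple(version):
    """Turn a string like '2.5.1+cu124' into (2, 5, 1), ignoring build suffixes."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def at_least(version, minimum="2.5.1"):
    """True if version is the same as or newer than minimum."""
    return version_tuple(version) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        status = "ok" if at_least(torch.__version__) else "older than 2.5.1"
        print(torch.__version__, status)
```

Run it with the same Python interpreter ComfyUI uses, otherwise you may be checking a different install.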