Workflows

NVIDIA Cosmos Models in ComfyUI Workflow


Welcome to 2025! NVIDIA kicked off the year with a big announcement: the Cosmos series of diffusion models. If you’re into AI, this is an exciting moment! Today, I’ll guide you through testing the NVIDIA Cosmos models in ComfyUI in the simplest way possible. Let’s jump in.


What You’ll Need

1. Text Encoder and VAE

Download the files here:

Files and where they go:

  • oldt5_xxl_fp8_e4m3fn_scaled.safetensors -> ComfyUI/models/text_encoders
  • cosmos_cv8x8x8_1.0.safetensors -> ComfyUI/models/vae
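If you want to double-check the placement before launching ComfyUI, a small script like the one below can verify that each file sits in its expected folder. This is just a convenience sketch; the `ComfyUI` root path is an assumption you should adjust to your own install.

```python
import os

# Root of your ComfyUI install (assumed; adjust to your actual location).
COMFYUI_ROOT = "ComfyUI"

# Expected subfolder for each Cosmos support file, per the list above.
EXPECTED_FILES = {
    "oldt5_xxl_fp8_e4m3fn_scaled.safetensors": "models/text_encoders",
    "cosmos_cv8x8x8_1.0.safetensors": "models/vae",
}

def expected_path(filename: str) -> str:
    """Full path where a given support file should live."""
    return os.path.join(COMFYUI_ROOT, EXPECTED_FILES[filename], filename)

def missing_files() -> list:
    """Names of expected files not found on disk."""
    return [f for f in EXPECTED_FILES if not os.path.isfile(expected_path(f))]
```

Run `missing_files()` after downloading; an empty list means everything is where ComfyUI expects it.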

Note: The oldt5_xxl encoder is based on T5 version 1.0, which differs from the version 1.1 encoder used by other models such as Flux.

2. Diffusion Models

Get them here:

Place them in: ComfyUI/models/diffusion_models

Want the original .pt files? Official links:

Key Terms:

  • Text to World = Text to Video
  • Video to World = Image/Video to Video

How to Set It Up

1. Download Files

Make sure you download all the required files:

  • Text encoder and VAE files for Cosmos.
  • Diffusion model safetensors (7B and/or 14B versions).

2. Save Files in the Right Folders

Put each file in its specific folder as outlined above. This step is crucial for ComfyUI to recognize the models.

3. Update ComfyUI

Before running the workflows, update ComfyUI to the latest version. Many issues happen because people forget this step!

4. Load the Workflows

You’ll need two workflows:

  • Text-to-Video
  • Image-to-Video

Download the JSON workflow files and save them locally. Open ComfyUI and load the workflows by dragging the JSON files into the interface.
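Dragging the JSON file into the interface is all you need; but if you want to sanity-check a workflow file before loading it (for example, to confirm it isn't a truncated download), a quick parse like this sketch will catch malformed JSON early. The function name here is hypothetical, not part of ComfyUI itself.

```python
import json

def load_workflow(path: str) -> dict:
    """Parse a ComfyUI workflow saved as JSON (the same file you would
    drag into the interface). Raises json.JSONDecodeError if the file
    is corrupted or incomplete."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```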


Running Text-to-Video

  1. Load the Model
    Start by loading the Cosmos 7B or 14B Text-to-World safetensors model in the diffusion model loader node.
  2. Input Text
    Enter your prompt in the text encoder. Cosmos supports both positive and negative prompts for more precise control.
  3. Sampler Settings
    Use the new “res_multistep” sampler for the best results. The default scheduler is set to Karras, but feel free to experiment with others.
  4. Output Settings
    Set the output to “Video Combine” and save as an MP4. You can also adjust the frame rate and resolution as needed.
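When adjusting the frame rate, keep in mind how it interacts with the frame count set in the latent node: the clip's length is simply frames divided by fps. The example values below are assumptions based on common Cosmos workflow defaults; check the numbers in your own latent and Video Combine nodes.

```python
def clip_seconds(num_frames: int, fps: int) -> float:
    """Duration of the rendered clip in seconds."""
    return num_frames / fps

# Assumed defaults: 121 frames at 24 fps gives roughly a 5-second clip.
print(round(clip_seconds(121, 24), 2))  # → 5.04
```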

Running Image-to-Video

  1. Load Your Image
    Use the “Load Image” node to input an image. Resize or crop it with the “Resize Image” node to match Cosmos’s default dimensions.
  2. Configure Nodes
    Pass the image through the VAE encoder and connect it to the “Image to Video Latent” node. Adjust parameters like frame count and batch size as needed.
  3. Generate Video
    Run the workflow and save the output as an MP4. You’ll get a short video (default: 5 seconds) based on your input image.
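The resize-and-crop in step 1 boils down to scaling the image so it covers the model's target resolution, then center-cropping the overhang. The helper below computes those numbers; 1280×704 is an assumed default for Cosmos, so verify the dimensions your workflow actually uses.

```python
def fit_and_crop_size(w, h, target_w=1280, target_h=704):
    """Compute the scaled size and center-crop box needed to cover the
    target resolution (1280x704 assumed here) without distortion.
    Returns ((scaled_w, scaled_h), (left, top, right, bottom))."""
    scale = max(target_w / w, target_h / h)  # cover, don't letterbox
    sw, sh = round(w * scale), round(h * scale)
    left = (sw - target_w) // 2
    top = (sh - target_h) // 2
    return (sw, sh), (left, top, left + target_w, top + target_h)
```

For a 1920×1080 input, this scales to 1280×720 and crops 8 pixels from the top and bottom to reach 1280×704.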

Quick Tips

  • Hardware Requirements: The 7B model is easier on your PC’s VRAM. The 14B model requires more resources.
  • Negative Prompts: Unlike some other models, Cosmos supports negative prompts. Use them for finer control over outputs.
  • Tiling: Cosmos uses a default tile size of 240 for VAE decoding. No need to configure this manually.
  • Torch Compatibility: Install Torch 2.5.1 for better performance when using “Torch Compile” settings.
