Workflows

ComfyUI Perfect Lip Syncs Movements in Any Language Workflow


Hi there! Imagine having a video in English and effortlessly syncing its lip movements to audio in another language. With Latent Sync, that's exactly what you can do.

In this blog post, we’ll explore how to use the Latent Sync workflow to achieve seamless lip-syncing for your videos.

What is Latent Sync?

Latent Sync is an advanced AI-based framework developed by researchers at ByteDance and Beijing Jiaotong University. It’s designed to map phonemes (the smallest units of sound in speech) to accurate lip movements, ensuring flawless synchronization.

Key Features:

  1. Unmatched Accuracy: Incorporates TREPA for superior temporal consistency.
  2. Virtual Avatars: Create human-like speech patterns for digital avatars.
  3. Flexibility: Works with various video lengths and audio files.

Setting Up the Workflow

Requirements:

  • Python 3.8 to 3.11 (avoid 3.12, as Mediapipe isn’t compatible with it yet).
  • FFmpeg: download it from here and add it to your system PATH.

Location to save the FFmpeg files:

C:\ffmpeg
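Before going further, it can save time to confirm both requirements are met. Here is a minimal check script (the `C:\ffmpeg\bin` hint assumes the install location suggested above):

```python
import shutil
import sys

# The wrapper supports Python 3.8 through 3.11 (Mediapipe lacks 3.12 support).
major, minor = sys.version_info[:2]
if (3, 8) <= (major, minor) <= (3, 11):
    print(f"Python {major}.{minor}: supported")
else:
    print(f"Python {major}.{minor}: unsupported, use 3.8-3.11")

# FFmpeg must be discoverable on the PATH for video conversion to work.
ffmpeg = shutil.which("ffmpeg")
msg = ffmpeg or "not found -- add C:\\ffmpeg\\bin to your PATH"
print("FFmpeg:", msg)
```

If either line reports a problem, fix it before installing the custom node.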

Install Dependencies

  • Save the necessary files in your custom node directory.
  • Clone the Latent Sync Wrapper repository via the command line:
cd ComfyUI/custom_nodes
git clone https://github.com/ShmuelRonen/ComfyUI-LatentSyncWrapper.git
cd ComfyUI-LatentSyncWrapper
pip install -r requirements.txt
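If `pip install` finishes but the node still fails to load, you can spot-check that the packages actually landed in the Python environment ComfyUI uses. The module names below are illustrative (`decord` is a common culprit in missing-module errors); adjust the list to match `requirements.txt`:

```python
import importlib.util

# Check whether each package is importable without actually importing it.
for name in ("decord", "torch"):
    found = importlib.util.find_spec(name) is not None
    status = "installed" if found else "missing -- try: pip install " + name
    print(f"{name}: {status}")
```

Run this with the same Python interpreter that launches ComfyUI, otherwise the result may not reflect what the node sees.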

Add the Model

  • Download latentsync_unet.pt and place it in the checkpoints folder of the Latent Sync Wrapper.
  • Create a whisper folder within checkpoints and save the tiny.pt file there.

Place them in the following structure:

ComfyUI/custom_nodes/ComfyUI-LatentSyncWrapper/checkpoints/
├── latentsync_unet.pt
└── whisper/
    └── tiny.pt
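A quick way to confirm the files ended up in the right place is to check the paths from the structure above (adjust the ComfyUI root to wherever your installation lives):

```python
from pathlib import Path

# Layout expected by the Latent Sync Wrapper, relative to the ComfyUI root.
ckpt_dir = Path("ComfyUI/custom_nodes/ComfyUI-LatentSyncWrapper/checkpoints")
expected = ["latentsync_unet.pt", "whisper/tiny.pt"]

for rel in expected:
    path = ckpt_dir / rel
    print(f"{rel}: {'found' if path.is_file() else 'missing'}")
```

Both files should report "found" before you restart ComfyUI.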

Run ComfyUI as Administrator
If you encounter PYTHONPATH errors, running ComfyUI with admin privileges should resolve the issue.

Known Limitations

  • Works best with clear, frontal face videos.
  • Doesn’t support anime/cartoon faces yet.
  • Input video must be 25 FPS (automatically converted if needed).
  • Ensure the face is visible throughout the video.
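If you prefer to convert your clip to 25 FPS yourself rather than rely on the automatic conversion, the FFmpeg command the step amounts to looks like this (file names are placeholders):

```python
# Build the FFmpeg command that re-encodes a clip at 25 FPS.
# -y overwrites the output file if it already exists; -r sets the frame rate.
cmd = ["ffmpeg", "-y", "-i", "input.mp4", "-r", "25", "output_25fps.mp4"]
print(" ".join(cmd))
```

Copy the printed command into a terminal (with your real file names) to produce a 25 FPS version of the video.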

Results That Speak for Themselves

Once the process is complete, you’ll notice how naturally the video’s lip movements align with the new audio. The tool analyzes the audio, breaks it into phonemes, and ensures each phoneme matches the correct lip shape.

For example:

  • The “p” in “perfect” has a precise visual representation.
  • Complex sounds and subtle expressions are seamlessly matched.

Download Workflows
