LatentSync

LatentSync by ByteDance is a tool for creating lifelike lip-synced videos directly from audio input.

Overview

LatentSync generates lifelike lip-synced video directly from audio input. Unlike older methods, it skips intermediate representations such as 3D models or facial landmarks. Instead, it uses latent diffusion models and aims for high visual quality and temporal consistency from frame to frame.

The system was developed by ByteDance, the company behind TikTok, in partnership with researchers at Beijing Jiaotong University. 

The tool is available for download on GitHub, for testing on FAL.ai, and within ComfyUI (see the wrapper links below).

The first tests (see the examples further down this page) look quite promising.

LatentSync now also comes with a Gradio UI.

You can try adjusting the following inference parameters to achieve better results:

  • inference_steps [20-50]: A higher value improves visual quality but slows down generation.
  • guidance_scale [1.0-3.0]: A higher value improves lip-sync accuracy but may cause video distortion or jitter.
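
As a rough illustration of tuning these two parameters, the Python sketch below checks candidate values against the ranges listed above and builds a small grid of combinations to try. The names `InferenceSettings` and `sweep` are hypothetical helpers made up for this example and are not part of LatentSync. How the values are actually passed to the tool depends on the interface you use (Gradio UI, ComfyUI node, or script).

```python
from dataclasses import dataclass
from itertools import product

# Ranges quoted in the parameter list above.
STEPS_RANGE = (20, 50)
GUIDANCE_RANGE = (1.0, 3.0)


@dataclass(frozen=True)
class InferenceSettings:
    """Hypothetical container for one combination of the two parameters."""
    inference_steps: int
    guidance_scale: float

    def __post_init__(self):
        # Reject values outside the suggested ranges instead of silently clamping.
        lo, hi = STEPS_RANGE
        if not lo <= self.inference_steps <= hi:
            raise ValueError(f"inference_steps must be in [{lo}, {hi}]")
        lo, hi = GUIDANCE_RANGE
        if not lo <= self.guidance_scale <= hi:
            raise ValueError(f"guidance_scale must be in [{lo}, {hi}]")


def sweep(steps_values, guidance_values):
    """Return every combination of the given values as validated settings."""
    return [InferenceSettings(s, g) for s, g in product(steps_values, guidance_values)]


if __name__ == "__main__":
    # Try low, middle and high values of each parameter.
    for settings in sweep([20, 35, 50], [1.0, 2.0, 3.0]):
        print(settings)
```

Running one generation per combination and comparing the outputs side by side makes the trade-off visible: more steps cost time, and a higher guidance scale trades stability for lip-sync accuracy.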

Tags

Freeware, Apache License 2.0, PC-based, #Video & Animation

Educators and Trainers, Creative Professionals, Content Creators, Media and Film Makers, Marketing and Branding Specialists, Voice and Audio Professionals, Developers and Tech Creators, Nonprofit and Advocacy Creators, Small Business Owners, Entertainment and Performance Artists, Professional Content Creators

This tool is free to use and is offered under the Apache License 2.0.

Prompt: none

Generated on April 22, 2025:

LatentSync Version 1.5

Prompt: Static, motionless video lip synced.

Generated on January 6, 2025:

You DO need animation in your source video prior to using this tool. This is what happens if you upload a motionless video.

Prompt: none

Generated on January 5, 2025:

My usual test. See it in higher resolution at https://youtube.com/shorts/X-GAREfcdao

Prompt: Take a bite of these muffins...

Generated on January 5, 2025:

Impressive work on the smiling face input video. Still looks quite natural.

Prompt: Jazz blues song + digital illustration style video

Generated on January 5, 2025:

The model doesn't quite handle this half-profile view of the face. See the larger output at https://youtube.com/shorts/nnoxH3D4uyg

Prompt: Kling's video + Udio's vocals.

Generated on January 5, 2025:

Singing through the mist. See the larger output at https://youtube.com/shorts/J6Qkhdhvfjw

Useful Links
ComfyUI LatentSync Wrapper

Other

This node provides lip-sync capabilities in ComfyUI using ByteDance's LatentSync model. It lets you synchronize the lip movements in a video with an audio input.

This page was last updated on April 19, 2025 at 8:01 PM