OmniHuman-1 was released in early 2025 as a research preview and was later integrated into ByteDance’s Dreamina platform. It takes a single image (portrait, half-body, or full-body) and combines it with a motion input, such as audio or video, to generate a realistic animated video.
It supports audio-driven talking and singing, video-driven motion mimicking, or both together for finer control. The model animates lips, facial expressions, hands, full bodies, and even backgrounds, and it works with photorealistic photos as well as stylized characters and animals.
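As a loose illustration of those input combinations, the pseudo-payloads below show how a single reference image pairs with audio, a driving video, or both. The field names are assumptions made for this sketch; the section does not document an actual API schema.

```python
# Hypothetical request payloads for the three driving modes.
# All field names are illustrative assumptions, not a real API.

audio_driven = {
    "reference_image": "portrait.png",  # single still image
    "audio": "speech.wav",              # drives talking or singing
}

video_driven = {
    "reference_image": "portrait.png",
    "driving_video": "dance.mp4",       # motion is copied from the video
}

combined = {
    "reference_image": "portrait.png",
    "audio": "speech.wav",              # controls lip-sync
    "driving_video": "gesture.mp4",     # controls body motion
}
```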
The tool accepts images of any size or aspect ratio and subjects of any body proportion. It keeps the subject’s appearance consistent across frames, produces smooth motion, and delivers accurate lip-sync; in benchmark tests and lab demos it outperforms earlier human-animation models.
Training used a novel multi-condition setup that mixed audio, video, and body-pose inputs to improve motion quality, combining classifier-free guidance with a diffusion-transformer architecture; a rough sketch of multi-condition guidance follows.
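The sketch below illustrates, under stated assumptions, how classifier-free guidance can mix two conditions at inference time. The `denoise` function is a trivial stand-in for a diffusion transformer, and the guidance weights are placeholder values, not OmniHuman-1’s published settings.

```python
import torch

def denoise(x, audio=None, pose=None):
    # Stand-in for a diffusion-transformer forward pass; a real model
    # would attend over video tokens and encoded condition tokens.
    out = x.clone()
    if audio is not None:
        out = out + 0.1 * audio
    if pose is not None:
        out = out + 0.1 * pose
    return out

x = torch.randn(1, 16, 64)      # noisy latent video tokens (assumed shape)
audio = torch.randn(1, 16, 64)  # encoded audio features
pose = torch.randn(1, 16, 64)   # encoded body-pose features

# Classifier-free guidance: run the model with and without conditions,
# then push the prediction away from the less-conditioned branches.
eps_uncond = denoise(x)
eps_audio = denoise(x, audio=audio)
eps_full = denoise(x, audio=audio, pose=pose)

w_audio, w_pose = 3.0, 2.0      # guidance strengths (assumed values)
eps = (eps_uncond
       + w_audio * (eps_audio - eps_uncond)
       + w_pose * (eps_full - eps_audio))
```

During training, the same conditions would be randomly dropped so the unconditional branch (`denoise(x)`) is well defined; that condition dropout is what makes classifier-free guidance possible at inference.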
If you'd like to access this model, you can explore the following options:
Use our video cost calculator to compare prices across platforms offering the OmniHuman-1 model.
For locally hosted models, see the description and additional links at the bottom for versions, repos, and tutorials.