HunyuanVideo is an open-source text-to-video model with 13 billion parameters, released by Tencent in December 2024.
It turns text prompts into smooth, high-quality videos (up to 720p, approximately 5 seconds long) with strong motion quality, visual quality, and prompt alignment. In the professional human evaluations reported by its authors, it outperforms some popular proprietary models.
The code and model weights are available on Hugging Face and GitHub. You can run the model locally (on Linux with a CUDA GPU that has around 60–80 GB of VRAM) or use it through a Gradio demo, the Diffusers library, ComfyUI, the Replicate API, and other integrations.
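For the Diffusers route, a minimal sketch might look like the following. It assumes the `diffusers`, `torch`, and `accelerate` packages are installed and that the community-converted checkpoint `hunyuanvideo-community/HunyuanVideo` is the one you want; the resolution, frame count, step count, and the `clip_seconds` and `generate` helper names are illustrative choices, not values taken from this page.

```python
def clip_seconds(num_frames: int, fps: int) -> float:
    """Playback length in seconds of a clip with num_frames frames at fps."""
    return num_frames / fps


def generate(prompt: str, out_path: str = "output.mp4",
             num_frames: int = 61, fps: int = 15) -> str:
    """Generate a short clip from a text prompt and save it as an MP4 file."""
    # Heavy imports stay inside the function so clip_seconds() can be used
    # on a machine without the GPU stack installed.
    import torch
    from diffusers import HunyuanVideoPipeline, HunyuanVideoTransformer3DModel
    from diffusers.utils import export_to_video

    # Assumption: community-converted Diffusers checkpoint on Hugging Face.
    model_id = "hunyuanvideo-community/HunyuanVideo"

    transformer = HunyuanVideoTransformer3DModel.from_pretrained(
        model_id, subfolder="transformer", torch_dtype=torch.bfloat16
    )
    pipe = HunyuanVideoPipeline.from_pretrained(
        model_id, transformer=transformer, torch_dtype=torch.float16
    )
    pipe.vae.enable_tiling()         # decode the video in tiles to lower peak VRAM
    pipe.enable_model_cpu_offload()  # keep idle submodules on the CPU (needs accelerate)

    # Assumption: a small resolution and few frames to keep memory use modest.
    frames = pipe(
        prompt=prompt,
        height=320,
        width=512,
        num_frames=num_frames,
        num_inference_steps=30,
    ).frames[0]

    export_to_video(frames, out_path, fps=fps)
    return out_path
```

Calling `generate("A cat walks on the grass, realistic style.")` would download the weights on first use and write `output.mp4`. With the default values above, `clip_seconds(61, 15)` gives a clip of roughly four seconds; larger resolutions and frame counts need considerably more VRAM.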
If you'd like to access this model, you have the following options:
Use our video cost calculator to compare prices between platforms offering the HunyuanVideo model.
For locally hosted models, see the description and additional links at the bottom for versions, repos, and tutorials.