SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory
SoulX-LiveAct presents a novel framework that enables lifelike, multimodal-controlled, high-fidelity human animation video generation for real-time streaming interactions.
(I) We identify diffusion-step-aligned neighbor latents as a key inductive bias for AR diffusion, providing a principled and theoretically grounded Neighbor Forcing for step-consistent AR video generation.
(II) We introduce ConvKV Memory, a lightweight plug-in compression mechanism that enables constant-memory hour-scale video generation with negligible overhead.
(III) We develop an optimized real-time system that achieves 20 FPS using only two H100/H200 GPUs with end-to-end adaptive FP8 precision, sequence parallelism, and operator fusion at 720×416 or 512×512 resolution.
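As a rough back-of-the-envelope check on what "real-time" and "hour-scale" imply at the 20 FPS quoted above, the short Python sketch below computes the per-frame time budget and the number of frames in one hour of video. The helper names are illustrative and are not part of the SoulX-LiveAct code.

```python
def frame_budget_ms(fps: float) -> float:
    """Wall-clock time available per frame (in ms) to keep up with real time."""
    return 1000.0 / fps


def frames_for_duration(fps: int, seconds: int) -> int:
    """Number of frames needed to cover `seconds` of video at `fps`."""
    return fps * seconds


if __name__ == "__main__":
    print(frame_budget_ms(20))            # 50.0
    print(frames_for_duration(20, 3600))  # 72000
```

At 20 FPS, the whole pipeline therefore has about 50 ms per frame on average, and one hour of output corresponds to 72,000 frames.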
🔥🔥🔥 News
- 📢 Mar 18, 2026: We now support consumer GPUs (e.g., RTX 4090, RTX 5090) with FP8 KV cache and CPU model offloading. In our tests, the 18B model (14B Wan2.1 + 4B audio module) achieves a throughput of 6 FPS on a single RTX 5090.
- 👋 Mar 16, 2026: We release the inference code and model weights of SoulX-LiveAct.
🎥 Demo
👫 Podcast
🎤 Music & Talk Show
📱 FaceTime
📑 Open-source Plan
- Release inference code and checkpoints
- GUI demo support
- End-to-end adaptive FP8 precision
- Support model offloading for consumer GPUs (e.g., RTX 4090, RTX 5090) to reduce memory usage
- Support FP4 precision for B-series GPUs (e.g., RTX 5090, B100, B200)
- Release training code
▶️ Quick Start
🛠️ Dependencies and Installation
Step 1: Install Basic Dependencies
conda create -n liveact python=3.10
conda activate liveact
pip install -r requirements.txt
conda install conda-forge::sox -y
Step 2: Install SageAttention
To enable the FP8 attention kernel, install SageAttention:
git clone https://github.com/thu-ml/SageAttention.git
cd SageAttention
git checkout v2.2.0
python setup.py install
(Optional) Install the modified version of SageAttention. To enable SageAttention with QKV operator fusion, install it with the following commands:
git clone https://github.com/ZhiqiJiang/SageAttentionFusion.git
cd SageAttentionFusion
python setup.py install
Step 3: Install vLLM
To enable the FP8 GEMM kernel, install vLLM:
pip install vllm==0.11.0
Step 4: Install LightVAE
git clone https://github.com/ModelTC/LightX2V
cd LightX2V
python setup_vae.py install
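After the installation steps, it can be useful to confirm that the optional packages are visible to the Python environment before launching inference. The following is a minimal sketch using only the standard library. The import names `sageattention` and `vllm` are assumptions about how these packages are exposed, and the helper `check_modules` is hypothetical, not part of this repository.

```python
import importlib.util


def check_modules(names):
    """Return {module_name: True/False} depending on whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed import names; adjust them if your installation exposes
    # different top-level module names.
    for name, ok in check_modules(["torch", "sageattention", "vllm"]).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

The check only looks up top-level modules without importing them, so it runs quickly and does not initialize CUDA.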
🤗 Download Checkpoints
Model Cards
| ModelName | Download |
|---|---|
| SoulX-LiveAct | 🤗 Huggingface |
| chinese-wav2vec2-base | 🤗 Huggingface |
🔑 Inference
Usage of LiveAct
1. Run real-time streaming inference on two H100/H200 GPUs
USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0,1 \
torchrun --nproc_per_node=2 --master_port=$(shuf -n 1 -i 10000-65535) \
generate.py \
--size 416*720 \
--ckpt_dir MODEL_PATH \
--wav2vec_dir chinese-wav2vec2-base \
--fps 20 \
--dura_print \
--input_json examples/example.json \
--steam_audio
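The `--master_port=$(shuf -n 1 -i 10000-65535)` expression used in these commands picks a random port, which can occasionally collide with a port that is already in use. One possible alternative, sketched here with Python's standard library, is to ask the operating system for a currently free port and pass the result to `--master_port`. The function name is hypothetical, and another process could in principle claim the port between this check and the actual launch.

```python
import socket


def find_free_port() -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
        return s.getsockname()[1]


if __name__ == "__main__":
    print(find_free_port())
```

Saved as a script (for example `find_free_port.py`, a name chosen here for illustration), it could be used as `--master_port=$(python find_free_port.py)`.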
2. Run with the best performance settings
USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0,1 \
torchrun --nproc_per_node=2 --master_port=$(shuf -n 1 -i 10000-65535) \
generate.py \
--size 480*832 \
--ckpt_dir MODEL_PATH \
--wav2vec_dir chinese-wav2vec2-base \
--fps 24 \
--input_json examples/example.json
3. Run with action or emotion editing
USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0,1 \
torchrun --nproc_per_node=2 --master_port=$(shuf -n 1 -i 10000-65535) \
generate.py \
--size 512*512 \
--ckpt_dir MODEL_PATH \
--wav2vec_dir chinese-wav2vec2-base \
--fps 24 \
--input_json examples/example_edit.json
4. Run on RTX 4090/RTX 5090 GPUs
Note: FP8 KV cache may slightly affect generation quality.
USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0 \
python generate.py \
--size 416*720 \
--ckpt_dir MODEL_PATH \
--wav2vec_dir chinese-wav2vec2-base \
--fps 24 \
--input_json examples/example.json \
--fp8_kv_cache \
--block_offload \
--t5_cpu
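To give a feel for why an FP8 KV cache helps on consumer GPUs, the sketch below estimates KV cache size at 2 bytes per element (BF16) versus 1 byte per element (FP8). The layer count, hidden size, and token count are placeholder values chosen for illustration, not figures from the SoulX-LiveAct release. The point is only the 2:1 ratio between the two precisions.

```python
def kv_cache_bytes(num_layers: int, num_tokens: int, hidden_dim: int,
                   bytes_per_elem: int) -> int:
    """Size of a KV cache with one K and one V tensor per layer."""
    return 2 * num_layers * num_tokens * hidden_dim * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical dimensions, for illustration only.
    layers, tokens, dim = 40, 10_000, 5120
    bf16 = kv_cache_bytes(layers, tokens, dim, bytes_per_elem=2)
    fp8 = kv_cache_bytes(layers, tokens, dim, bytes_per_elem=1)
    print(f"BF16: {bf16 / 2**30:.2f} GiB, FP8: {fp8 / 2**30:.2f} GiB")
```

Whatever the real dimensions are, storing the cache in FP8 halves its footprint relative to BF16, at the cost of the quantization error mentioned in the note above.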
5. Run on a single GPU for evaluation
USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0 \
python generate.py \
--size 480*832 \
--ckpt_dir MODEL_PATH \
--wav2vec_dir chinese-wav2vec2-base \
--fps 24 \
--input_json examples/example.json \
--audio_cfg 1.7 \
--t5_cpu
Command Line Arguments
| Argument | Type | Required | Default | Description |
|---|---|---|---|---|
| --size | str | Yes | - | The width and height of the generated video, joined by `*` (e.g., 416*720). |
| --t5_cpu | bool | No | false | Whether to place the T5 model on the CPU. |
| --offload_cache | bool | No | - | Whether to place the KV cache on the CPU. |
| --fps | int | Yes | - | The target FPS of the generated video. |
| --audio_cfg | float | No | 1.0 | Classifier-free guidance scale for audio control. |
| --dura_print | bool | No | false | Whether to print the duration of every block. |
| --input_json | str | Yes | - | Path to the condition JSON file used to generate the video. |
| --seed | int | No | 42 | The random seed used for generating the image or video. |
| --steam_audio | bool | No | false | Whether to run inference with streaming audio. |
| --mean_memory | bool | No | false | Whether to use the mean memory strategy during inference for further performance improvement. |
| --fp8_kv_cache | bool | No | false | Whether to store the KV cache in FP8 and dequantize to BF16 on use. FP8 KV cache may slightly affect generation quality. |
| --block_offload | bool | No | false | Whether to offload WanModel blocks to the CPU between block forwards. |
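For readers who want to script runs, the arguments in the table can be mirrored with a small `argparse` parser. This is an illustrative sketch and not the parser in `generate.py`. It assumes the boolean options are plain on/off switches, which matches how they are passed in the example commands, and it omits `--ckpt_dir` and `--wav2vec_dir` because they are not listed in the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented arguments (illustrative only)."""
    p = argparse.ArgumentParser(description="Sketch of the documented CLI")
    p.add_argument("--size", type=str, required=True)
    p.add_argument("--fps", type=int, required=True)
    p.add_argument("--input_json", type=str, required=True)
    p.add_argument("--audio_cfg", type=float, default=1.0)
    p.add_argument("--seed", type=int, default=42)
    # Assumption: boolean options are switches that default to off.
    for flag in ("t5_cpu", "offload_cache", "dura_print", "steam_audio",
                 "mean_memory", "fp8_kv_cache", "block_offload"):
        p.add_argument(f"--{flag}", action="store_true")
    return p


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--size", "416*720", "--fps", "24",
         "--input_json", "examples/example.json", "--fp8_kv_cache"]
    )
    print(args.size, args.fps, args.audio_cfg, args.seed, args.fp8_kv_cache)
```

A parser like this can validate a set of options in a launcher script before the actual command is assembled.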
💻 GUI demo
Run SoulX-LiveAct inference on the GUI demo and evaluate real-time performance.
Note: The first few blocks during the initial run require warm-up. Normal performance will be observed from the second run onward.
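Because of this warm-up, throughput measurements are more representative when the first few calls are excluded from timing. The sketch below shows one way to do that in Python. `measure_fps` and its parameters are hypothetical helpers for illustration, and `step` stands in for whatever call generates one block of frames.

```python
import time


def measure_fps(step, frames_per_step: int, warmup: int = 2, iters: int = 5) -> float:
    """Average frames per second of `step`, ignoring the first `warmup` calls."""
    for _ in range(warmup):
        step()  # warm-up calls are executed but not timed
    start = time.perf_counter()
    for _ in range(iters):
        step()
    elapsed = time.perf_counter() - start
    return frames_per_step * iters / elapsed


if __name__ == "__main__":
    # Stand-in workload; replace with a call that generates one block.
    fps = measure_fps(lambda: time.sleep(0.01), frames_per_step=4)
    print(f"{fps:.1f} FPS")
```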
1. Run real-time streaming inference on two H100/H200 GPUs
USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0,1 \
torchrun --nproc_per_node=2 --master_port=$(shuf -n 1 -i 10000-65535) \
demo.py \
--ckpt_dir MODEL_PATH \
--wav2vec_dir chinese-wav2vec2-base \
--size 416*720 \
--video_save_path ./generated_videos
2. Run on RTX 4090/RTX 5090 GPUs
USE_CHANNELS_LAST_3D=1 CUDA_VISIBLE_DEVICES=0 \
torchrun --nproc_per_node=1 --master_port=$(shuf -n 1 -i 10000-65535) \
demo.py \
--ckpt_dir MODEL_PATH \
--wav2vec_dir chinese-wav2vec2-base \
--size 416*720 \
--fp8_kv_cache \
--block_offload \
--t5_cpu \
--video_save_path ./generated_videos
📚 Citation
@misc{zhen2026soulxliveacthourscalerealtimehuman,
title={SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory},
author={Dingcheng Zhen and Xu Zheng and Ruixin Zhang and Zhiqi Jiang and Yichao Yan and Ming Tao and Shunshun Yin},
year={2026},
eprint={2603.11746},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2603.11746},
}
📮 Contact Us
If you would like to leave a message about our work, feel free to email dingchengzhen@soulapp.cn.
You’re welcome to join our WeChat group or Soul group for technical discussions.
Model tree for xmuhtt/LiveAct
Base model: Wan-AI/Wan2.1-I2V-14B-480P