PhysicsDrivenWorld (PDW)

Physics-Corrected Video Generation via Warp-Guided LoRA Fine-Tuning

CogVideoX-2b + LoRA (r=16) · NVIDIA Warp Physics · Single H100 NVL


Key Result

| Metric | Base CogVideoX-2b | PDW (Ours) | Improvement |
|---|---|---|---|
| Diffusion MSE, test_medium | 2.2676 | 0.3861 | +83.0% |
| Diffusion MSE, test_very_high | 2.2763 | 0.3790 | +83.4% |
| Average | 2.272 | 0.383 | +83.2% |

On physics-correct reference frames, the fine-tuned model's noise-prediction MSE is 83.2% lower than the base model's, which indicates that the Warp physics prior was injected into the denoising weights.
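The Improvement column is the relative reduction in MSE. A minimal sketch that recomputes it from the reported values; the variable and function names are illustrative and not taken from the project code:

```python
# Diffusion MSE values copied from the Key Result table: (base, PDW).
results = {
    "test_medium": (2.2676, 0.3861),
    "test_very_high": (2.2763, 0.3790),
}


def relative_reduction(base: float, tuned: float) -> float:
    """Percentage by which `tuned` is lower than `base`."""
    return 100.0 * (base - tuned) / base


per_split = {name: relative_reduction(b, t) for name, (b, t) in results.items()}

# The Average row averages the MSE values first, then takes the reduction.
avg_base = sum(b for b, _ in results.values()) / len(results)
avg_tuned = sum(t for _, t in results.values()) / len(results)
average = relative_reduction(avg_base, avg_tuned)

for name, pct in per_split.items():
    print(f"{name}: {pct:.1f}% lower MSE")
print(f"average: {average:.1f}% lower MSE")
```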


Model Description

PhysicsDrivenWorld (PDW) fine-tunes CogVideoX-2b using Low-Rank Adaptation (LoRA) supervised by an NVIDIA Warp rigid-body physics simulator.

Modern video diffusion models generate visually plausible but physically inconsistent results: objects float, bounce unrealistically, or violate Newton's laws. PDW injects a physics prior into the model's denoising weights by training on Warp-simulated ground-truth trajectories.

The training objective is the standard diffusion denoising MSE, applied exclusively to frames that are physically correct by construction because they come from the Warp simulator. The model therefore learns to denoise physics-consistent content better than physics-inconsistent content.
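The objective can be sketched in plain Python. This is an illustration under stated assumptions, not the project's training code: it assumes noise (epsilon) prediction and a linear beta schedule with typical DDPM endpoints, and it replaces the video latent and the model with small stand-ins. All names here are hypothetical.

```python
import math
import random


def alpha_bar(t: int, num_steps: int = 1000,
              beta_start: float = 1e-4, beta_end: float = 0.02) -> float:
    """Cumulative product of (1 - beta) up to step t, linear schedule (assumed)."""
    prod = 1.0
    for s in range(t + 1):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def add_noise(x0, noise, t):
    """Forward process: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise."""
    a = alpha_bar(t)
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]


def denoising_mse(predicted_noise, noise):
    """Mean squared error between predicted and true noise."""
    return sum((p - n) ** 2 for p, n in zip(predicted_noise, noise)) / len(noise)


rng = random.Random(0)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(16)]   # stand-in for a clean Warp latent
noise = [rng.gauss(0.0, 1.0) for _ in range(16)]
t = rng.randint(50, 950)                            # timestep range used in training
x_t = add_noise(x0, noise, t)                       # what the model would receive

# Stand-in predictors: a perfect one scores 0, an all-zero one scores mean(noise^2).
perfect = denoising_mse(noise, noise)
zero_guess = denoising_mse([0.0] * len(noise), noise)
```

In training, the prediction would come from the LoRA-adapted transformer given `x_t`, `t`, and the text prompt; only the clean input `x0` is specific to PDW, since it is always a Warp-simulated clip.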


Architecture

| Component | Details |
|---|---|
| Base Model | CogVideoX-2b (2B parameter text-to-video diffusion transformer) |
| Adapter | LoRA, rank r=16, alpha=32 |
| Target Modules | to_q, to_k, to_v, to_out.0 (attention projections) |
| Trainable Params | ~3.7M of 2B total (0.185%) |
| Physics Engine | NVIDIA Warp 1.11.1, GPU-accelerated rigid body simulator |
| Simulation | Semi-implicit Euler, 60 Hz, ground collision with restitution |
| Training Loss | Diffusion MSE on Warp-generated physics-correct frames |
| LR Schedule | 10-step linear warmup (1e-6 → 1e-4), then cosine decay to 1e-6 |
| Hardware | Single NVIDIA H100 NVL (99.9 GB VRAM), 13.9 GB peak usage |
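The adapter settings map naturally onto a LoRA configuration. The sketch below assumes the Hugging Face PEFT library, which this card does not name explicitly; the constant and function names are hypothetical. The rank, alpha, and target modules come from this card, and the dropout value is the one listed under Hyperparameters.

```python
# Adapter settings as reported in this model card.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["to_q", "to_k", "to_v", "to_out.0"],
}

# Standard LoRA scales the low-rank update by alpha / r before adding it
# to the frozen weight, so these settings give a scale of 2.
LORA_SCALE = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]


def build_lora_config():
    """Build a PEFT LoraConfig (assumes `peft` is installed; imported lazily)."""
    from peft import LoraConfig

    return LoraConfig(**LORA_KWARGS)
```

Targeting only the attention projections keeps the base weights frozen and limits training to the small low-rank matrices.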

Training

Hyperparameters

| Hyperparameter | Value |
|---|---|
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Peak learning rate | 1e-4 |
| Optimiser | AdamW (β=(0.9, 0.999), ε=1e-8, weight_decay=0.01) |
| Training steps | 200 (5 epochs × 40 steps) |
| Batch size | 1 |
| Diffusion timesteps | DDPMScheduler (1000 steps), random t ∈ [50, 950] |
| Precision | bfloat16 |
| Gradient clipping | 1.0 |
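The learning-rate schedule (10-step linear warmup from 1e-6 to 1e-4, then cosine decay to 1e-6 over the 200 training steps) can be written as a small function. This is a sketch with hypothetical names; it assumes the peak is reached exactly at step 10 and the floor exactly at step 200, which the card does not spell out.

```python
import math


def learning_rate(step: int, total_steps: int = 200, warmup_steps: int = 10,
                  lr_min: float = 1e-6, lr_peak: float = 1e-4) -> float:
    """Linear warmup from lr_min to lr_peak, then cosine decay back to lr_min."""
    if step < warmup_steps:
        return lr_min + (lr_peak - lr_min) * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return lr_min + 0.5 * (lr_peak - lr_min) * (1.0 + math.cos(math.pi * progress))


# One value per optimiser step, including the final step.
schedule = [learning_rate(s) for s in range(201)]
```

The same shape could be passed to an optimiser through a lambda-based scheduler by dividing each value by the peak learning rate.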

Training Data β€” Warp Physics Scenarios

Training uses synthetic videos rendered from NVIDIA Warp rigid-body simulations, not real-world video. This avoids the biases of real-world video datasets and provides ground-truth, physically correct trajectories as supervision.

| Scenario | Drop Height | Restitution | Physics Behaviour |
|---|---|---|---|
| ball_drop_low | 2 m | 0.70 | Low-energy drop, high bounce |
| ball_drop_high | 5 m | 0.60 | Standard gravity, moderate bounce |
| ball_elastic | 3 m | 0.85 | Very elastic, multiple high bounces |
| ball_heavy | 4 m | 0.30 | Inelastic, dead stop after first bounce |
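To make the scenario parameters concrete, here is a plain-Python sketch of a semi-implicit Euler drop at 60 Hz with a restitution-based ground bounce. It does not use the Warp API and is not the project's simulator: it treats the ball as a point mass, assumes gravity of 9.81 m/s², and uses hypothetical names throughout.

```python
GRAVITY = -9.81          # m/s^2, assumed standard gravity
STEPS_PER_SECOND = 60    # 60 Hz simulation rate from this card
DT = 1.0 / STEPS_PER_SECOND

SCENARIOS = {            # name: (drop height in m, restitution)
    "ball_drop_low": (2.0, 0.70),
    "ball_drop_high": (5.0, 0.60),
    "ball_elastic": (3.0, 0.85),
    "ball_heavy": (4.0, 0.30),
}


def simulate_drop(height: float, restitution: float, seconds: float = 4.0):
    """Return the ball height at every step of a semi-implicit Euler rollout."""
    y, v = height, 0.0
    heights = [y]
    for _ in range(int(seconds * STEPS_PER_SECOND)):
        v += GRAVITY * DT          # update velocity first ...
        y += v * DT                # ... then position, using the new velocity
        if y < 0.0 and v < 0.0:    # ground contact: clamp to the floor and reflect
            y = 0.0
            v = -restitution * v
        heights.append(y)
    return heights


trajectories = {name: simulate_drop(h, e) for name, (h, e) in SCENARIOS.items()}
```

In this idealised model the first rebound peaks near restitution² × drop height, so `ball_elastic` returns to roughly 2.1 m of its original 3 m.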

Convergence

| Epoch | Avg Loss | Notes |
|---|---|---|
| 1 | 1.512 | Warmup spike, expected |
| 2 | ~0.45 | Fast learning |
| 5 | 0.341 | Converged, 77% drop from epoch 1 |

How to Use

Load the Model
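A minimal loading sketch, assuming the adapter is stored in PEFT format in the `athul020/PhysicsDrivenWorld` repository and that the base weights are available as `THUDM/CogVideoX-2b`. Neither assumption is confirmed by this card. The function names, prompt, and output path are illustrative, and the sampling settings are common CogVideoX defaults rather than values reported here.

```python
BASE_MODEL_ID = "THUDM/CogVideoX-2b"         # assumed Hub id of the base model
ADAPTER_ID = "athul020/PhysicsDrivenWorld"   # repository id of this adapter


def load_pdw_pipeline(device: str = "cuda"):
    """Load CogVideoX-2b and attach the PDW LoRA adapter (assumes PEFT format)."""
    # Imported lazily so the module can be imported without the GPU stack.
    import torch
    from diffusers import CogVideoXPipeline
    from peft import PeftModel

    pipe = CogVideoXPipeline.from_pretrained(BASE_MODEL_ID, torch_dtype=torch.bfloat16)
    pipe.transformer = PeftModel.from_pretrained(pipe.transformer, ADAPTER_ID)
    return pipe.to(device)


def generate_sample(prompt: str, out_path: str = "pdw_sample.mp4"):
    """Generate one clip and write it to disk (illustrative settings)."""
    from diffusers.utils import export_to_video

    pipe = load_pdw_pipeline()
    frames = pipe(prompt=prompt, num_inference_steps=50, guidance_scale=6.0).frames[0]
    export_to_video(frames, out_path, fps=8)
    return out_path
```

For example, `generate_sample("A rubber ball dropped from 3 metres bounces on a flat floor")` would download the weights on first use and write a short clip.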

