PhysicsDrivenWorld (PDW)

Physics-Corrected Video Generation via Warp-Guided LoRA Fine-Tuning

CogVideoX-2b + LoRA (r=16) · NVIDIA Warp Physics · Single H100 NVL

Key Result

Metric (lower is better)	Base CogVideoX-2b	PDW (Ours)	MSE Reduction
Diffusion MSE, test_medium	2.2676	0.3861	83.0%
Diffusion MSE, test_very_high	2.2763	0.3790	83.4%
Average	2.272	0.383	83.2%

On physics-correct reference frames, the fine-tuned model's noise-prediction MSE is 83.2% lower than the base model's (0.383 vs. 2.272 on average). This is consistent with the Warp physics prior having been injected into the denoising weights.
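The percentages in the table above can be reproduced from the raw MSE values. The snippet below is a minimal sketch that assumes the percentage is the relative MSE reduction, `(base - tuned) / base`; the function and variable names are illustrative and do not come from the PDW code.

```python
def mse_reduction_pct(base_mse: float, tuned_mse: float) -> float:
    """Relative MSE reduction of the tuned model versus the base model, in percent."""
    return 100.0 * (base_mse - tuned_mse) / base_mse


# (base MSE, PDW MSE) per evaluation split, copied from the Key Result table.
RESULTS = {
    "test_medium": (2.2676, 0.3861),
    "test_very_high": (2.2763, 0.3790),
}

for split, (base, tuned) in RESULTS.items():
    print(f"{split}: {mse_reduction_pct(base, tuned):.1f}% lower MSE")

# Average row: mean of the per-split MSE values, then the same ratio.
avg_base = sum(b for b, _ in RESULTS.values()) / len(RESULTS)
avg_tuned = sum(t for _, t in RESULTS.values()) / len(RESULTS)
print(f"average: {mse_reduction_pct(avg_base, avg_tuned):.1f}% lower MSE")
```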

Model Description

PhysicsDrivenWorld (PDW) fine-tunes CogVideoX-2b using Low-Rank Adaptation (LoRA) supervised by an NVIDIA Warp rigid-body physics simulator.

Modern video diffusion models generate visually plausible but physically inconsistent results — objects float, bounce unrealistically, or violate Newton's laws. PDW injects a physics prior into the model's denoising weights by training on Warp-simulated ground-truth trajectories.

The training objective is standard diffusion denoising MSE, but applied exclusively to frames that are physically correct by construction from the Warp simulator — so the model learns to denoise physics-consistent content better than physics-inconsistent content.
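Written out, the objective described above takes the usual noise-prediction form. The block below is a sketch that assumes ε-prediction with a DDPM forward process. The notation is introduced here for illustration and is not taken from the training code: $x_0$ is a clean latent of a Warp-rendered clip, $c$ the text conditioning, $\bar\alpha_t$ the cumulative noise schedule, and $\theta$ the trainable LoRA parameters.

```latex
x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

\mathcal{L}(\theta) =
\mathbb{E}_{x_0 \sim \mathcal{D}_{\mathrm{Warp}},\ \epsilon,\ t}
\left[ \left\lVert \epsilon - \epsilon_\theta(x_t, t, c) \right\rVert_2^2 \right]
```

Here $\mathcal{D}_{\mathrm{Warp}}$ contains only simulator-rendered clips, so every $x_0$ the model is asked to denoise is physically correct by construction. The reported MSE is this squared error averaged over latent elements.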

Architecture

Component	Details
Base Model	CogVideoX-2b (2B parameter text-to-video diffusion transformer)
Adapter	LoRA — rank r=16, alpha=32
Target Modules	`to_q`, `to_k`, `to_v`, `to_out.0` (attention projections)
Trainable Params	~3.7M of 2B total (0.185%)
Physics Engine	NVIDIA Warp 1.11.1 — GPU-accelerated rigid body simulator
Simulation	Semi-implicit Euler, 60 Hz, ground collision with restitution
Training Loss	Diffusion MSE on Warp-generated physics-correct frames
LR Schedule	10-step linear warmup (1e-6 → 1e-4) then cosine decay to 1e-6
Hardware	Single NVIDIA H100 NVL (99.9 GB VRAM) — 13.9 GB peak usage
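The learning-rate schedule listed in the table can be sketched in a few lines. This is an illustrative reconstruction, not the training script: it assumes a 0-indexed step counter, the 200 total steps given under Hyperparameters, and that the cosine phase spans every step after warmup. All names are hypothetical.

```python
import math

WARMUP_STEPS = 10    # linear warmup length from the table
TOTAL_STEPS = 200    # 5 epochs x 40 steps
LR_MIN = 1e-6        # start of warmup and floor of the cosine decay
LR_PEAK = 1e-4       # peak learning rate


def lr_at(step: int) -> float:
    """Learning rate at a 0-indexed optimiser step."""
    if step < WARMUP_STEPS:
        # Linear ramp: LR_MIN at step 0, LR_PEAK at step WARMUP_STEPS.
        frac = step / WARMUP_STEPS
        return LR_MIN + frac * (LR_PEAK - LR_MIN)
    # Cosine decay from LR_PEAK down to LR_MIN over the remaining steps.
    progress = min((step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return LR_MIN + cosine * (LR_PEAK - LR_MIN)


for s in (0, 5, 10, 100, 200):
    print(s, f"{lr_at(s):.2e}")
```

With PyTorch, the same function could be wrapped in `torch.optim.lr_scheduler.LambdaLR` by returning `lr_at(step) / LR_PEAK` as the multiplier.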

Training

Hyperparameters

Hyperparameter	Value
LoRA rank (r)	16
LoRA alpha	32
LoRA dropout	0.05
Peak learning rate	1e-4
Optimiser	AdamW (β=(0.9, 0.999), ε=1e-8, weight_decay=0.01)
Training steps	200 (5 epochs × 40 steps)
Batch size	1
Diffusion timesteps	DDPMScheduler (1000 steps), random t ∈ [50, 950]
Precision	bfloat16
Gradient clipping	1.0
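The LoRA settings above map directly onto a PEFT `LoraConfig`. The card does not state which adapter library was used, so treat the PEFT usage below as an assumption. `LORA_KWARGS` and `make_lora_config` are illustrative names.

```python
# LoRA hyperparameters copied from the tables above.
LORA_KWARGS = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["to_q", "to_k", "to_v", "to_out.0"],
}


def make_lora_config():
    """Build a PEFT LoraConfig from the card's hyperparameters.

    peft is imported lazily so the constants above can be inspected
    without the library installed.
    """
    from peft import LoraConfig

    return LoraConfig(**LORA_KWARGS)


# Standard LoRA scales the low-rank update by lora_alpha / r.
print("LoRA scaling factor:", LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"])
```

Under this assumption, the resulting config would be attached to the CogVideoX transformer with `peft.get_peft_model` before training.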

Training Data — Warp Physics Scenarios

Training uses synthetic videos rendered from NVIDIA Warp rigid-body simulations rather than real-world video. This avoids the biases of real-world video datasets and provides ground-truth, physically correct trajectories as supervision.

Scenario	Drop Height (m)	Restitution	Physics Behaviour
ball_drop_low	2	0.70	Low-energy drop, high bounce
ball_drop_high	5	0.60	Standard gravity, moderate bounce
ball_elastic	3	0.85	Very elastic: multiple high bounces
ball_heavy	4	0.30	Inelastic: low first rebound, settles quickly
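To make the scenario parameters concrete, the sketch below integrates a 1-D ball drop with semi-implicit Euler at 60 Hz and a restitution-based ground bounce, matching the simulation settings listed under Architecture. It is plain Python, not the Warp kernel used for training. It assumes gravity of 9.81 m/s², a point mass, and no drag, and all function names are illustrative.

```python
GRAVITY = -9.81      # m/s^2 (assumed; not stated in the card)
DT = 1.0 / 60.0      # 60 Hz simulation step

# (drop height in m, restitution) per scenario, from the table above.
SCENARIOS = {
    "ball_drop_low": (2.0, 0.70),
    "ball_drop_high": (5.0, 0.60),
    "ball_elastic": (3.0, 0.85),
    "ball_heavy": (4.0, 0.30),
}


def simulate_drop(height_m: float, restitution: float, n_steps: int = 240) -> list:
    """Return the ball height after each step of a vertical drop."""
    y, vy = height_m, 0.0
    heights = []
    for _ in range(n_steps):
        vy += GRAVITY * DT          # semi-implicit Euler: velocity first
        y += vy * DT                # then position, using the new velocity
        if y < 0.0 and vy < 0.0:    # ground contact while moving down
            y = 0.0                 # project back onto the ground plane
            vy = -restitution * vy  # reflect and damp the velocity
        heights.append(y)
    return heights


def first_rebound_peak(heights: list) -> float:
    """Maximum height reached between the first and second ground contacts."""
    first_hit = heights.index(0.0)
    peak = 0.0
    for y in heights[first_hit + 1:]:
        if y == 0.0:
            break
        peak = max(peak, y)
    return peak


for name, (h, e) in SCENARIOS.items():
    peak = first_rebound_peak(simulate_drop(h, e))
    print(f"{name}: simulated rebound {peak:.2f} m, ideal e^2*h = {e * e * h:.2f} m")
```

For restitution e and drop height h, the ideal first rebound height is e²h. The discretised result differs slightly because contact is resolved on a 1/60 s grid.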

Convergence

Epoch	Avg Loss	Notes
1	1.512	Warmup spike — expected
2	~0.45	Fast learning
5	0.341	Converged — 77% drop from epoch 1

How to Use

Load the Model
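The loading snippet is not present in this copy of the card, so the following is only a minimal sketch under stated assumptions. It assumes the adapter is stored in PEFT format and can be attached to the pipeline's transformer with `PeftModel.from_pretrained`. The repository IDs are the ones named in this card, and `load_pdw_pipeline` is a hypothetical helper name.

```python
BASE_MODEL_ID = "zai-org/CogVideoX-2b"       # base model named in this card
ADAPTER_ID = "athul020/PhysicsDrivenWorld"   # this repository


def load_pdw_pipeline(device: str = "cuda"):
    """Build a CogVideoX pipeline with the PDW LoRA weights merged in.

    Assumption: the adapter repository is in PEFT format
    (adapter_config.json plus adapter weights).
    """
    # Heavy dependencies are imported lazily, only when the model is loaded.
    import torch
    from diffusers import CogVideoXPipeline
    from peft import PeftModel

    pipe = CogVideoXPipeline.from_pretrained(BASE_MODEL_ID, torch_dtype=torch.bfloat16)
    # Attach the LoRA adapter, then fold it into the base weights so the
    # pipeline holds a plain transformer module again.
    lora_transformer = PeftModel.from_pretrained(pipe.transformer, ADAPTER_ID)
    pipe.transformer = lora_transformer.merge_and_unload()
    return pipe.to(device)
```

Calling `load_pdw_pipeline()` downloads both repositories and returns a pipeline that can be invoked with a text prompt like any other `CogVideoXPipeline`. If the adapter is instead stored in the diffusers LoRA format, `pipe.load_lora_weights(ADAPTER_ID)` would be the analogous call.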

Model tree for athul020/PhysicsDrivenWorld

Base model: zai-org/CogVideoX-2b
Adapter: this model