PhysicsDrivenWorld (PDW)
Physics-Corrected Video Generation via Warp-Guided LoRA Fine-Tuning
CogVideoX-2b + LoRA (r=16) · NVIDIA Warp Physics · Single H100 NVL
Key Result
| Metric | Base CogVideoX-2b | PDW (Ours) | Improvement |
|---|---|---|---|
| Diffusion MSE (test_medium) | 2.2676 | 0.3861 | +83.0% |
| Diffusion MSE (test_very_high) | 2.2763 | 0.3790 | +83.4% |
| Average | 2.272 | 0.383 | +83.2% |
On physics-correct reference frames, the fine-tuned model's noise-prediction MSE is 83.2% lower than the base model's, which indicates that the Warp physics prior was injected into the denoising weights.
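The improvement column is the relative reduction in MSE, (base - PDW) / base. A minimal sketch that recomputes it from the table values follows; the dictionary and function names are illustrative and not part of the PDW code.

```python
# MSE values copied from the Key Result table: (base CogVideoX-2b, PDW).
RESULTS = {
    "test_medium": (2.2676, 0.3861),
    "test_very_high": (2.2763, 0.3790),
}


def relative_reduction(base_mse: float, tuned_mse: float) -> float:
    """Percentage by which the fine-tuned MSE is lower than the base MSE."""
    return 100.0 * (base_mse - tuned_mse) / base_mse


for split, (base, tuned) in RESULTS.items():
    print(f"{split}: {relative_reduction(base, tuned):.1f}% lower MSE")

# Average the raw MSEs first, then compute the reduction on the averages.
avg_base = sum(b for b, _ in RESULTS.values()) / len(RESULTS)
avg_tuned = sum(t for _, t in RESULTS.values()) / len(RESULTS)
print(f"average: {relative_reduction(avg_base, avg_tuned):.1f}% lower MSE")
```

The average row is computed from the unrounded per-split MSEs, so it can differ slightly in the last digit from a value computed on the rounded averages.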
Model Description
PhysicsDrivenWorld (PDW) fine-tunes CogVideoX-2b using Low-Rank Adaptation (LoRA) supervised by an NVIDIA Warp rigid-body physics simulator.
Modern video diffusion models generate visually plausible but physically inconsistent results: objects float, bounce unrealistically, or violate Newton's laws. PDW injects a physics prior into the model's denoising weights by training on Warp-simulated ground-truth trajectories.
The training objective is standard diffusion denoising MSE, but applied exclusively to frames that are physically correct by construction from the Warp simulator, so the model learns to denoise physics-consistent content better than physics-inconsistent content.
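For reference, the standard DDPM denoising objective can be written as below. This is the textbook noise-prediction form, given here as an assumption: the symbols ($x_0$ for the clean latent of a Warp-rendered clip, $c$ for the text conditioning, $\bar{\alpha}_t$ for the cumulative noise schedule) are notation introduced for illustration, and the exact prediction target used in PDW training may differ.

```latex
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

\mathcal{L}(\theta) = \mathbb{E}_{x_0,\, \epsilon,\, t}
\left[ \left\lVert \epsilon - \epsilon_\theta(x_t, t, c) \right\rVert_2^2 \right]
```

Because every $x_0$ comes from the simulator, the expectation is taken only over physically correct clips; the loss itself is unchanged.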
Architecture
| Component | Details |
|---|---|
| Base Model | CogVideoX-2b (2B parameter text-to-video diffusion transformer) |
| Adapter | LoRA, rank r=16, alpha=32 |
| Target Modules | to_q, to_k, to_v, to_out.0 (attention projections) |
| Trainable Params | ~3.7M of 2B total (0.185%) |
| Physics Engine | NVIDIA Warp 1.11.1, GPU-accelerated rigid-body simulator |
| Simulation | Semi-implicit Euler, 60 Hz, ground collision with restitution |
| Training Loss | Diffusion MSE on Warp-generated physics-correct frames |
| LR Schedule | 10-step linear warmup (1e-6 → 1e-4), then cosine decay to 1e-6 |
| Hardware | Single NVIDIA H100 NVL (99.9 GB VRAM), 13.9 GB peak usage |
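
As a reminder of what the adapter row means, a LoRA-adapted projection can be written as follows. This sketch assumes the conventional $\alpha / r$ scaling; $W_0$, $A$, $B$, $d_{\text{in}}$ and $d_{\text{out}}$ are generic notation, not values taken from the PDW code.

```latex
W' = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d_{\text{out}} \times r},
\quad A \in \mathbb{R}^{r \times d_{\text{in}}}

\frac{\alpha}{r} = \frac{32}{16} = 2,
\qquad \text{trainable parameters per adapted layer} = r\,(d_{\text{in}} + d_{\text{out}})
```

Only $A$ and $B$ are trained for each of the targeted attention projections; the base weights $W_0$ stay frozen.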
Training
Hyperparameters
| Hyperparameter | Value |
|---|---|
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Peak learning rate | 1e-4 |
| Optimiser | AdamW (β=(0.9, 0.999), ε=1e-8, weight_decay=0.01) |
| Training steps | 200 (5 epochs × 40 steps) |
| Batch size | 1 |
| Diffusion timesteps | DDPMScheduler (1000 steps), random t ∈ [50, 950] |
| Precision | bfloat16 |
| Gradient clipping | 1.0 |
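
The learning-rate schedule (10-step linear warmup from 1e-6 to 1e-4, then cosine decay to 1e-6 over the 200 training steps) can be sketched in plain Python. The function name `lr_at_step` and the exact endpoint conventions (0-indexed steps, peak reached at step 10) are assumptions made for illustration, not taken from the training script.

```python
import math


def lr_at_step(step: int, total_steps: int = 200, warmup_steps: int = 10,
               min_lr: float = 1e-6, peak_lr: float = 1e-4) -> float:
    """Learning rate at a 0-indexed optimiser step (endpoint conventions assumed)."""
    if step < warmup_steps:
        # Linear ramp: min_lr at step 0, reaching peak_lr at step == warmup_steps.
        return min_lr + (peak_lr - min_lr) * step / warmup_steps
    # Cosine decay from peak_lr (progress 0) down to min_lr (progress 1).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for s in (0, 5, 10, 105, 200):
    print(s, f"{lr_at_step(s):.2e}")
```

In a training loop, a function like this would typically set the optimiser's learning rate once per step before the update.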
Training Data: Warp Physics Scenarios
Training uses synthetic videos rendered from NVIDIA Warp rigid-body simulations rather than real-world video. This avoids the biases of collected real-world footage and provides trajectories that are physically correct by construction as supervision.
| Scenario | Drop Height | Restitution | Physics Behaviour |
|---|---|---|---|
| ball_drop_low | 2 m | 0.70 | Low-energy drop, high bounce |
| ball_drop_high | 5 m | 0.60 | Standard gravity, moderate bounce |
| ball_elastic | 3 m | 0.85 | Very elastic: multiple high bounces |
| ball_heavy | 4 m | 0.30 | Inelastic: dead stop after first bounce |
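
To make the scenario parameters concrete, here is a minimal pure-Python stand-in for the kind of simulation described: semi-implicit Euler at 60 Hz with a ground plane and restitution. It does not use the Warp API. The gravity value, the point-mass assumption (no ball radius), the position clamp at the ground, and the names `simulate_drop` and `first_rebound_peak` are all assumptions made for illustration.

```python
GRAVITY = 9.81     # m/s^2, assumed value
DT = 1.0 / 60.0    # 60 Hz time step, as listed in the Architecture table

# (drop height in metres, restitution) copied from the scenario table.
SCENARIOS = {
    "ball_drop_low": (2.0, 0.70),
    "ball_drop_high": (5.0, 0.60),
    "ball_elastic": (3.0, 0.85),
    "ball_heavy": (4.0, 0.30),
}


def simulate_drop(height_m, restitution, duration_s=4.0):
    """Return the height (m) of a point mass at every step of a vertical drop."""
    y, v = height_m, 0.0
    heights = [y]
    for _ in range(round(duration_s / DT)):
        v -= GRAVITY * DT          # semi-implicit Euler: update velocity first,
        y += v * DT                # then position, using the updated velocity
        if y < 0.0:                # crossed the ground plane at y = 0
            y = 0.0                # clamp back onto the ground
            v = -restitution * v   # reverse and damp the (downward) velocity
        heights.append(y)
    return heights


def first_rebound_peak(heights):
    """Highest point reached from the first ground contact onwards."""
    first_contact = heights.index(0.0)
    return max(heights[first_contact:])


for name, (height, restitution) in SCENARIOS.items():
    peak = first_rebound_peak(simulate_drop(height, restitution))
    print(f"{name}: first rebound peak {peak:.2f} m")
```

For an ideal bounce the first rebound height is restitution² × drop height; the discrete 60 Hz integration lands somewhat below that because contact is only detected at step boundaries.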
Convergence
| Epoch | Avg Loss | Notes |
|---|---|---|
| 1 | 1.512 | Warmup spike (expected) |
| 2 | ~0.45 | Fast learning |
| 5 | 0.341 | Converged: 77% drop from epoch 1 |
How to Use
Load the Model
Model tree for athul020/PhysicsDrivenWorld
Base model
zai-org/CogVideoX-2b