	---
	base_model:
	- Lightricks/LTX-2.3
	language:
	- en
	license: other
	license_name: ltx-2-community-license
	license_link: https://github.com/Lightricks/LTX-2/blob/main/LICENSE
	pipeline_tag: any-to-any
	tags:
	- ltx-video
	- image-to-video
	pinned: true
	---

	# LTX-2.3 22B IC-LoRA Motion Track Control

	This is a motion track control IC-LoRA trained on top of LTX-2.3-22b, enabling the user to guide the motion of objects or regions in a generated video using sparse point trajectories.
	The user provides a reference video with colored spline overlays indicating desired motion paths, and the model generates a video that follows these trajectories.
	Tracks can be drawn manually or extracted from existing videos using point tracking methods such as SpatialTrackerV2.
	To draw tracks manually, use the trajectory drawing node provided in ComfyUI; see the [example workflow](https://github.com/Lightricks/ComfyUI-LTXVideo/blob/master/example_workflows/2.3/LTX-2.3_ICLoRA_Motion_Track_Distilled.json).
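
To make the idea of a track overlay video concrete, here is a minimal sketch that rasterizes sparse point trajectories into per-frame colored overlays with NumPy. It is an illustration only: the function names, the black background, the one-pixel line width, and the use of straight segments between per-frame points (rather than true splines) are all assumptions, and the exact rendering this LoRA was trained on is not specified here. For real inputs, prefer the ComfyUI trajectory drawing node.

```python
import numpy as np


def draw_segment(frame, p0, p1, color):
    """Rasterize a straight segment from p0 to p1, given as (x, y), by dense sampling."""
    h, w, _ = frame.shape
    # One sample per pixel step along the longer axis keeps the line gap-free.
    n = int(max(abs(p1[0] - p0[0]), abs(p1[1] - p0[1]))) + 1
    xs = np.rint(np.linspace(p0[0], p1[0], n)).astype(int)
    ys = np.rint(np.linspace(p0[1], p1[1], n)).astype(int)
    inside = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    frame[ys[inside], xs[inside]] = color


def render_tracks(tracks, colors, height, width):
    """Render tracks of shape (num_tracks, num_frames, 2) into a uint8 video.

    Frame t shows each track's path from frame 0 up to frame t.
    This layout is an assumption, not the documented training format.
    """
    tracks = np.asarray(tracks, dtype=float)
    num_tracks, num_frames, _ = tracks.shape
    video = np.zeros((num_frames, height, width, 3), dtype=np.uint8)
    for t in range(num_frames):
        for k in range(num_tracks):
            # Mark the current position, then the path travelled so far.
            draw_segment(video[t], tracks[k, t], tracks[k, t], colors[k])
            for s in range(t):
                draw_segment(video[t], tracks[k, s], tracks[k, s + 1], colors[k])
    return video


# Hypothetical example: one red track moving right, then down, over 3 frames.
example_tracks = [[(0, 0), (3, 0), (3, 3)]]
example_video = render_tracks(example_tracks, [(255, 0, 0)], height=8, width=8)
```

The resulting array has shape `(num_frames, height, width, 3)` and can be encoded to a video file with any video writer.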

	It is based on the [LTX-2.3](https://huggingface.co/Lightricks/LTX-2.3) foundation model.
	- Paper: [AVControl: Efficient Framework for Training Audio-Visual Controls](https://huggingface.co/papers/2603.24793)
	- Code: [GitHub Repository](https://github.com/Lightricks/LTX-2)
	- Project Page: [AVControl project page](https://matanby.github.io/AVControl/)

	## What is In-Context LoRA (IC LoRA)?

	IC LoRA conditions video generation on reference video frames at inference time, allowing fine-grained video-to-video control on top of a text-to-video base model.
	It also supports an initial image for image-to-video generation and can generate audio-visual output.

	## What is Reference Downscale Factor?

	IC LoRA uses a reference control signal, i.e. a video that is positionally aligned to the generated video and contains the reference for context.
	For efficiency, the reference video can be smaller than the generated video, so it consumes fewer tokens.
	The reference downscale factor specifies how much the reference video is expected to be downscaled relative to the generated resolution.
	To indicate the expected reference size, the checkpoint name carries a `ref` suffix followed by the scale relative to the output resolution (for example, `ref0.5`).
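
As a rough illustration of this naming convention, the sketch below reads the scale from a checkpoint file name and derives the reference resolution for a given output size. The helper names and the example output resolution are hypothetical, and the rounding is a simplification: the actual pipeline may impose additional constraints on valid resolutions.

```python
import re


def parse_ref_scale(checkpoint_name):
    """Extract the scale after the 'ref' suffix, e.g. 'ref0.5' -> 0.5."""
    match = re.search(r"-ref(\d+(?:\.\d+)?)", checkpoint_name)
    if match is None:
        raise ValueError(f"no 'ref' suffix found in {checkpoint_name!r}")
    return float(match.group(1))


def reference_size(output_width, output_height, scale):
    """Scale the output resolution to get the expected reference resolution."""
    return round(output_width * scale), round(output_height * scale)


name = "ltx-2.3-22b-ic-lora-motion-track-control-ref0.5.safetensors"
scale = parse_ref_scale(name)            # 0.5
downscale_factor = 1 / scale             # 2.0
# 1280x704 is an arbitrary example output size, not a documented default.
ref_w, ref_h = reference_size(1280, 704, scale)  # (640, 352)
```

Assuming the token count grows with the pixel area of the reference, a scale of 0.5 per spatial dimension would leave roughly a quarter of the reference tokens.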

	## Model Files

	`ltx-2.3-22b-ic-lora-motion-track-control-ref0.5.safetensors`

	## License

	See the [LTX-2 Community License](https://github.com/Lightricks/LTX-2/blob/main/LICENSE) for full terms.

	## Model Details

	- Base Model: LTX-2.3-22b Video
	- Training Type: IC LoRA
	- Control Type: Video with spline overlays marking the locations where motion is constrained by tracks
	- Reference Downscale Factor: 2 (reference resolution is 0.5x the output resolution)

	### 🔌 Using in ComfyUI
	1. Copy the LoRA weights into `models/loras`.
	2. Use the official IC-LoRA workflow from the [LTX-2 ComfyUI repository](https://github.com/Lightricks/ComfyUI-LTXVideo/).
	3. Use the nodes that support the reference downscale factor: `LTXICLoRALoaderModelOnly` loads the LoRA and extracts the downscale factor, and `LTXAddVideoICLoRAGuide` adds the downscaled reference latent as a guide.
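
Step 1 can also be scripted. The following is a small sketch, assuming the weights file has already been downloaded locally and that your ComfyUI installation uses the default `models/loras` layout; `install_lora` is a hypothetical helper, not part of ComfyUI or the LTX-2 tooling.

```python
import shutil
from pathlib import Path


def install_lora(weights_path, comfyui_root):
    """Copy a LoRA weights file into <comfyui_root>/models/loras and return the new path."""
    src = Path(weights_path)
    if not src.is_file():
        raise FileNotFoundError(f"weights file not found: {src}")
    dest_dir = Path(comfyui_root) / "models" / "loras"
    # Create the folder if this installation does not have it yet.
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)
    return dest
```

For example, `install_lora("ltx-2.3-22b-ic-lora-motion-track-control-ref0.5.safetensors", "ComfyUI")` would place the file under `ComfyUI/models/loras/`, where both paths are placeholders for your own locations.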


	## Dataset

	The model was trained using a proprietary dataset.

	## Citation

	```bibtex
	@article{hacohen2025ltx2,
	title={LTX-2: Efficient Joint Audio-Visual Foundation Model},
	author={HaCohen, Yoav and Brazowski, Benny and Chiprut, Nisan and Bitterman, Yaki and Kvochko, Andrew and Berkowitz, Avishai and Shalem, Daniel and Lifschitz, Daphna and Moshe, Dudu and Porat, Eitan and others},
	journal={arXiv preprint arXiv:2601.03233},
	year={2025}
	}
	@misc{LTXVideoTrainer2025,
	title={LTX-Video Community Trainer},
	author={Matan Ben Yosef and Naomi Ken Korem and Tavi Halperin},
	year={2025},
	publisher={GitHub},
	}
	```

	## Acknowledgments

	- Base model by Lightricks
	- Training infrastructure: LTX-2 Community Trainer