LLaVE - a zhibinlan Collection

LLaVE

updated Mar 10, 2025

LLaVE is a series of large language and vision embedding models trained on a variety of multimodal embedding datasets.

zhibinlan/LLaVE-0.5B

Image-Text-to-Text • 0.9B • Updated Mar 14, 2025 • 19 • 7

zhibinlan/LLaVE-2B

Image-Text-to-Text • 2B • Updated Mar 14, 2025 • 52 • 45

zhibinlan/LLaVE-7B

Image-Text-to-Text • 8B • Updated Mar 14, 2025 • 9 • 5

LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning

Paper • arXiv:2503.04812 • Published Mar 4, 2025 • 17