High-fidelity 3D generation from images
Demo for multimodal understanding and generation
VLMEvalKit Evaluation Results Collection