Multimodal
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training • arXiv 2403.09611
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents • arXiv 2306.16527
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models • arXiv 2404.12387
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension • arXiv 2404.16790
MambaVision: A Hybrid Mamba-Transformer Vision Backbone • arXiv 2407.08083
PaLI: A Jointly-Scaled Multilingual Language-Image Model • arXiv 2209.06794
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation • arXiv 2410.13848
Aria: An Open Multimodal Native Mixture-of-Experts Model • arXiv 2410.05993
Roadmap towards Superhuman Speech Understanding using Large Language Models • arXiv 2410.13268
Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities • arXiv 2410.11190
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming • arXiv 2408.16725
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation • arXiv 2411.07975
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment • arXiv 2502.10391
arXiv 2412.08905