-
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Paper • 2405.15223 • Published • 17 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 55 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 90 -
Matryoshka Multimodal Models
Paper • 2405.17430 • Published • 34
Collections
Discover the best community collections!
Collections including paper arxiv:2412.10704
-
On Domain-Specific Post-Training for Multimodal Large Language Models
Paper • 2411.19930 • Published • 31 -
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation
Paper • 2412.10704 • Published • 16 -
Multi-task retriever fine-tuning for domain-specific and efficient RAG
Paper • 2501.04652 • Published • 10 -
M-A-D/Mixed-Arabic-Datasets-Repo
Viewer • Updated • 209M • 750 • 38
-
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
Paper • 2411.02959 • Published • 71 -
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization
Paper • 2411.02355 • Published • 51 -
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation
Paper • 2410.23090 • Published • 55 -
RARe: Retrieval Augmented Retrieval with In-Context Examples
Paper • 2410.20088 • Published • 4
-
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation
Paper • 2412.10704 • Published • 16 -
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Paper • 2411.04952 • Published • 29 -
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
Paper • 2410.10594 • Published • 29
-
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation
Paper • 2412.10704 • Published • 16 -
How to Synthesize Text Data without Model Collapse?
Paper • 2412.14689 • Published • 53 -
Atla Selene Mini: A General Purpose Evaluation Model
Paper • 2501.17195 • Published • 35
-
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Paper • 2411.02337 • Published • 36 -
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
Paper • 2411.04996 • Published • 50 -
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper • 2411.03562 • Published • 69 -
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Paper • 2410.08815 • Published • 47
-
Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation
Paper • 2409.12941 • Published • 23 -
LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification
Paper • 2411.19638 • Published • 6 -
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation
Paper • 2412.02592 • Published • 24 -
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation
Paper • 2412.10704 • Published • 16
-
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Paper • 2405.15223 • Published • 17 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 55 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 90 -
Matryoshka Multimodal Models
Paper • 2405.17430 • Published • 34
-
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation
Paper • 2412.10704 • Published • 16 -
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding
Paper • 2411.04952 • Published • 29 -
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
Paper • 2410.10594 • Published • 29
-
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation
Paper • 2412.10704 • Published • 16 -
How to Synthesize Text Data without Model Collapse?
Paper • 2412.14689 • Published • 53 -
Atla Selene Mini: A General Purpose Evaluation Model
Paper • 2501.17195 • Published • 35
-
On Domain-Specific Post-Training for Multimodal Large Language Models
Paper • 2411.19930 • Published • 31 -
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation
Paper • 2412.10704 • Published • 16 -
Multi-task retriever fine-tuning for domain-specific and efficient RAG
Paper • 2501.04652 • Published • 10 -
M-A-D/Mixed-Arabic-Datasets-Repo
Viewer • Updated • 209M • 750 • 38
-
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Paper • 2411.02337 • Published • 36 -
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models
Paper • 2411.04996 • Published • 50 -
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
Paper • 2411.03562 • Published • 69 -
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization
Paper • 2410.08815 • Published • 47
-
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems
Paper • 2411.02959 • Published • 71 -
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization
Paper • 2411.02355 • Published • 51 -
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation
Paper • 2410.23090 • Published • 55 -
RARe: Retrieval Augmented Retrieval with In-Context Examples
Paper • 2410.20088 • Published • 4
-
Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation
Paper • 2409.12941 • Published • 23 -
LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification
Paper • 2411.19638 • Published • 6 -
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation
Paper • 2412.02592 • Published • 24 -
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation
Paper • 2412.10704 • Published • 16