BAREC Shared Task 2026 Collection Sentence-level and Document-level readability datasets for the BAREC Shared Task 2026 • 2 items • Updated 14 days ago • 1
MOSS-Audio Collection An open-source audio understanding model supporting speech recognition, environmental sound analysis, music understanding, time-aware QA, and complex • 7 items • Updated 29 days ago • 61
CLEF 2025 JOKER Track: No Pun Left Behind Collection Models developed for CLEF 2025 JOKER Track: No Pun Left Behind • 10 items • Updated Mar 2 • 1
Arab-Culture-Aligned Multimodal Embedding Models & Datasets Collection Where Visual Document Retrieval Goes Arabic • 4 items • Updated 24 days ago • 2
Arabic Semantic Embeddings Collection Find details for all models here: https://www.omarai.me/embeddings • 15 items • Updated Apr 30 • 2
Arabic Speech Datasets Collection Best Datasets for Arabic Speech Tasks • 21 items • Updated 5 days ago • 20
KITAB-Bench Collection A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding • 24 items • Updated Feb 24, 2025 • 19
SARD: Synthetic Arabic Recognition Dataset Collection A large-scale synthetic Arabic OCR dataset comprising 843,622 book-style document images across 10 fonts, designed to advance VLMs for Arabic text • 2 items • Updated May 19, 2025 • 7
YouTube-SL-25: A Large-Scale, Open-Domain Multilingual Sign Language Parallel Corpus Paper • 2407.11144 • Published Jul 15, 2024 • 10
BAREC Shared Task 2025 Collection Sentence-level and Document-level readability datasets for the BAREC Shared Task 2025 • 4 items • Updated 14 days ago • 2
BiMediX2 Collection BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities • 7 items • Updated Oct 24, 2025 • 10
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities Paper • 2412.07769 • Published Dec 10, 2024 • 30