Article: Tokenization in Transformers v5: Simpler, Clearer, and More Modular (17 days ago)
Collection: ARC-Encoders. Pretrained ARC-Encoders and a fine-tuning dataset for context compression with unmodified LLMs (7 items, updated 10 days ago)
Article: Luth: Efficient French Specialization for Small Language Models (Aug 11, 2025)
Article: Should We Still Pretrain Encoders with Masked Language Modeling? (Jul 2, 2025)
Paper: Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling (arXiv:2409.14683, Sep 23, 2024)