kotol

company
Activity Feed

AI & ML interests

None defined yet.

pcuenqย 
posted an update 4 days ago
view post
Post
2435
๐Ÿ‘‰ What happened in AI in 2025? ๐Ÿ‘ˆ

We prepared the 2025 version of the HF AI Timeline Grid, highlighting open vs API-based model releases, and allowing you to browse and filter by access, modality, and release type!

Play with it here:
2025-ai-timeline/2025-ai-timeline

Here's my personal quarterly TL;DR:

1๏ธโƒฃ Q1 โ€” Learning to Reason
Deepseek not only releases a top-notch reasoning model, but shows how to train them and compete with closed frontier models. OpenAI debuts Deep Research.

Significant milestones: DeepSeek R1 & R1-Zero, Qwen 2.5 VL, OpenAI Deep Research, Gemini 2.5 Pro (experimental)

2๏ธโƒฃ Q2 โ€” Multimodality and Coding
More LLMs embrace multimodality by default, and there's a surge in coding agents. Strong vision, audio, and generative models emerge.

Significant milestones: Llama 4, Qwen 3, Imagen 4, OpenAI Codex, Google Jules, Claude 4

3๏ธโƒฃ Q3 โ€” "Gold" rush, OpenAI opens up, the community goes bananas
Flagship models get gold in Math olympiads and hard benchmarks. OpenAI releases strong open source models and Google releases the much anticipated nano-banana for image generation and editing. Agentic workflows become commonplace.

Significant milestones: Gemini and OpenAI IMO Gold, gpt-oss, Gemini 2.5 Flash Image, Grok 4, Claude Sonnet 4.5

4๏ธโƒฃ Q4 โ€” Mistral returns, leaderboard hill-climbing
Mistral is back with updated model families. All labs release impressive models to wrap up the year!

Significant milestones: Claude Opus 4.5, DeepSeek Math V2, FLUX 2, GPT 5.1, Kimi K2 Thinking, Nano Banana Pro, GLM 4.7, Gemini 3, Mistral 3, MiniMax M2.1 ๐Ÿคฏ

Credits
๐Ÿ™ NHLOCAL for the source data https://github.com/NHLOCAL/AiTimeline

๐Ÿซก @reach-vb for the original idea, design and recipe

๐Ÿ™Œ @ariG23498 and yours truly for compiling and verifying the 2025 edition

๐Ÿฅณ Here's to 2026, wishing it becomes the best year ever for open releases and on-device-first use-cases! ๐Ÿฅ‚
  • 1 reply
ยท
merveย 
posted an update 3 months ago
view post
Post
7663
deepseek-ai/DeepSeek-OCR is out! ๐Ÿ”ฅ my take โคต๏ธ
> pretty insane it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient per vision tokens/performance ratio
> covers 100 languages
ยท
Molbapย 
posted an update 3 months ago
view post
Post
3320
๐Ÿš€ New blog: Maintain the unmaintainable โ€“ 1M+ Python LOC, 400+ models

How do you stop a million-line library built by thousands of contributors from collapsing under its own weight?
At ๐Ÿค— Transformers, we do it with explicit software-engineering tenets, principles that make the codebase hackable at scale.

๐Ÿ” Inside the post:
โ€“ One Model, One File: readability first โ€” you can still open a modeling file and see the full logic, top to bottom.
โ€“ Modular Transformers: visible inheritance that cuts maintenance cost by ~15ร— while keeping models readable.
โ€“ Config-Driven Performance: FlashAttention, tensor parallelism, and attention scheduling are config-level features, not rewrites.

Written with @lysandre ,@pcuenq and @yonigozlan , this is a deep dive into how Transformers stays fast, open, and maintainable.

Read it here โ†’ transformers-community/Transformers-tenets
merveย 
posted an update 4 months ago
view post
Post
6826
large AI labs open-sourced a ton of models last week ๐Ÿ”ฅ
here's few picks, find even more here merve/sep-16-releases-68d13ea4c547f02f95842f05 ๐Ÿค
> IBM released a new Docling model with 258M params based on Granite (A2.0) ๐Ÿ“ ibm-granite/granite-docling-258M
> Xiaomi released 7B audio LM with base and instruct variants (MIT) XiaomiMiMo/mimo-audio-68cc7202692c27dae881cce0
> DecartAI released Lucy Edit, open Nano Banana ๐ŸŒ (NC) decart-ai/Lucy-Edit-Dev
> OpenGVLab released a family of agentic computer use models (3B/7B/32B) with the dataset ๐Ÿ’ป OpenGVLab/scalecua-68c912cf56f7ff4c8e034003
> Meituan Longcat released thinking version of LongCat-Flash ๐Ÿ’ญ meituan-longcat/LongCat-Flash-Thinking
  • 2 replies
ยท
merveย 
posted an update 4 months ago
view post
Post
3405
IBM just released small swiss army knife for the document models: granite-docling-258M on Hugging Face ๐Ÿ”ฅ

> not only a document converter but also can do document question answering, understand multiple languages ๐Ÿคฏ
> best part: released with Apache 2.0 license ๐Ÿ‘ use it with your commercial projects!
> it supports transformers, vLLM and MLX from the get-go! ๐Ÿค—
> built on SigLIP2 & granite-165M

model: ibm-granite/granite-docling-258M
demo: ibm-granite/granite-docling-258m-demo ๐Ÿ’—
merveย 
posted an update 4 months ago
view post
Post
1204
a ton of image/video generation models and LLMs from big labs ๐Ÿ”ฅ

> Meta released facebook/mobilellm-r1-68c4597b104fac45f28f448e, smol LLMs for on-device use ๐Ÿ’ฌ
> Tencent released tencent/SRPO, high res image generation model and tencent/POINTS-Reader, cutting edge OCR ๐Ÿ“
> ByteDance released bytedance-research/HuMo, video generation from any input โฏ๏ธ

find more models, datasets, demos here merve/sep-11-releases-68c7dbfa26bea8cd921fa0ac
merveย 
posted an update 4 months ago
view post
Post
1011
fan-favorite vision LM Florence-2 is now officially supported in transformers ๐Ÿค—

find all the models in
florence-community
org ๐Ÿซก
ariG23498ย 
posted an update 4 months ago
view post
Post
1422
New post is live!

This time we cover some major updates to transformers.

๐Ÿค—
  • 2 replies
ยท
merveย 
posted an update 4 months ago
merveย 
posted an update 4 months ago
merveย 
posted an update 4 months ago
view post
Post
6289
large AI labs have dropped so many open models last week ๐Ÿ”ฅ don't miss out on them

โ†’ Apple released on-device vision LMs apple/fastvlm-68ac97b9cd5cacefdd04872e & apple/mobileclip2-68ac947dcb035c54bcd20c47
โ†’ OpenGVLab released InternVL3.5, 32 new vision LMs with one based on gpt-oss! (OS) OpenGVLab/internvl35-68ac87bd52ebe953485927fb
โ†’ MSFT released a killer small TTS model (OS) microsoft/VibeVoice-1.5B

find more herehttps://huggingface.co/collections/merve/august-29-releases-68b5a3754cfb8abf59e2b486
  • 1 reply
ยท
merveย 
posted an update 5 months ago
view post
Post
6074
first vision language model built off openai/gpt-oss-20b just dropped! ๐Ÿ”ฅ

InternVL3.5 comes with 32 models ๐Ÿคฏ pre-trained, fine-tuned, aligned in various sizes OpenGVLab/internvl35-68ac87bd52ebe953485927fb
comes with gpt-oss or Qwen3 for LLM part โคต๏ธ
  • 1 reply
ยท
Xenovaย 
posted an update 5 months ago
view post
Post
14414
Okay this is insane... WebGPU-accelerated semantic video tracking, powered by DINOv3 and Transformers.js! ๐Ÿคฏ
Demo (+ source code): webml-community/DINOv3-video-tracking

This will revolutionize AI-powered video editors... which can now run 100% locally in your browser, no server inference required (costs $0)! ๐Ÿ˜

How does it work? ๐Ÿค”
1๏ธโƒฃ Generate and cache image features for each frame
2๏ธโƒฃ Create a list of embeddings for selected patch(es)
3๏ธโƒฃ Compute cosine similarity between each patch and the selected patch(es)
4๏ธโƒฃ Highlight those whose score is above some threshold

... et voilร ! ๐Ÿฅณ

You can also make selections across frames to improve temporal consistency! This is super useful if the object changes its appearance slightly throughout the video.

Excited to see what the community builds with it!
ยท