WPO: Enhancing RLHF with Weighted Preference Optimization Paper • 2406.11827 • Published Jun 17, 2024 • 17
Tokenization in Transformers v5: Simpler, Clearer, and More Modular Article • Published 19 days ago • 96
Self-Play Preference Optimization for Language Model Alignment Paper • 2405.00675 • Published May 1, 2024 • 28
Noise Contrastive Alignment of Language Models with Explicit Rewards Paper • 2402.05369 • Published Feb 8, 2024 • 2
Towards Efficient and Exact Optimization of Language Model Alignment Paper • 2402.00856 • Published Feb 1, 2024 • 1
A General Theoretical Paradigm to Understand Learning from Human Preferences Paper • 2310.12036 • Published Oct 18, 2023 • 19
Provably Robust DPO: Aligning Language Models with Noisy Feedback Paper • 2403.00409 • Published Mar 1, 2024 • 2
Statistical Rejection Sampling Improves Preference Optimization Paper • 2309.06657 • Published Sep 13, 2023 • 14
Direct Preference Optimization: Your Language Model is Secretly a Reward Model Paper • 2305.18290 • Published May 29, 2023 • 64
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published Dec 2, 2025 • 244
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published Dec 1, 2025 • 96
Go-Explore: a New Approach for Hard-Exploration Problems Paper • 1901.10995 • Published Jan 30, 2019 • 1
KTO: Model Alignment as Prospect Theoretic Optimization Paper • 2402.01306 • Published Feb 2, 2024 • 21