RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics
Paper • 2506.04308 • Published • 43
Latent Chain-of-Thought as Planning: Decoupling Reasoning from Verbalization
The Sonar Moment: Benchmarking Audio-Language Models in Audio Geo-Localization