Sajib Acharjee Dip

Sajib-006
AI & ML interests

Multimodal Learning, Generative AI, Bioinformatics

Recent Activity

posted an update 1 day ago
Excited to share our paper: Think Again or Think Longer? Selective Verification for Budget-Aware Reasoning

A common assumption in test-time reasoning is that giving a model more chances to think or verify should improve performance. Our results show that this is only partly true.

We introduce SEVRA, a serving-layer controller that decides when a frozen reasoning model should keep its initial answer and when it should actively verify it. Instead of treating verification as always useful, SEVRA asks a more deployment-focused question: Is this specific attempt likely recoverable by verification? We evaluate this through helpful fixes, harmful flips, extra calls, and realized token cost.

Some key takeaways:

* Selective verification improves over always verifying on MATH500 while reducing harmful flips.
* On GSM8K, the controller verifies only a small fraction of examples but still improves accuracy.
* However, a longer initial solve can sometimes match selective verification with fewer realized tokens.
* Cheap serving-visible features, such as completion status, token count, and finalizer use, nearly match larger learned gates.
* On CommonsenseQA, always-on verification hurts, showing that the best test-time compute action is workload-dependent.

The main deployment lesson is simple: Tune the initial reasoning budget first. Then use selective recovery when explicit checks, bounded retries, auditability, or regression-risk control matter.

Paper: https://huggingface.co/papers/2606.19808
Code: https://github.com/Sajib-006/SEVRA
Replay dashboard: https://huggingface.co/spaces/sevra-space/sevra-replay

Would love feedback from the community, especially on broader test-time compute allocation, risk-aware verification, and practical serving policies for reasoning models.
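To make the idea of a gate over cheap serving-visible features concrete, here is a minimal Python sketch. It is not the SEVRA implementation: the `Attempt` fields, the `should_verify` rule, the 1500-token threshold, and the four toy records are all hypothetical, chosen only to show how helpful fixes, harmful flips, and extra calls could be tallied when replaying a policy over logged attempts. Realized token cost is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One logged attempt (hypothetical schema, not taken from the paper)."""
    completed: bool         # initial solve finished within its budget
    token_count: int        # tokens generated by the initial solve
    used_finalizer: bool    # a finalizer was needed to extract an answer
    initial_correct: bool   # correctness of the initial answer
    verified_correct: bool  # correctness of the answer if verification runs


def should_verify(a: Attempt, token_threshold: int = 1500) -> bool:
    """Illustrative hand-written gate; the threshold is an assumption."""
    return (not a.completed) or a.used_finalizer or a.token_count > token_threshold


def always_verify(a: Attempt) -> bool:
    """Baseline policy: verify every attempt."""
    return True


def replay(attempts, policy):
    """Replay a verification policy over logged attempts and tally outcomes."""
    stats = {"correct": 0, "helpful_fixes": 0, "harmful_flips": 0, "extra_calls": 0}
    for a in attempts:
        if policy(a):
            stats["extra_calls"] += 1
            if not a.initial_correct and a.verified_correct:
                stats["helpful_fixes"] += 1   # wrong answer repaired
            elif a.initial_correct and not a.verified_correct:
                stats["harmful_flips"] += 1   # right answer broken
            final_correct = a.verified_correct
        else:
            final_correct = a.initial_correct  # keep the initial answer
        stats["correct"] += int(final_correct)
    return stats


# Toy records, invented for illustration only.
ATTEMPTS = [
    Attempt(True, 400, False, True, False),
    Attempt(False, 2000, True, False, True),
    Attempt(True, 300, False, True, True),
    Attempt(True, 1800, False, False, False),
]

if __name__ == "__main__":
    print("selective:", replay(ATTEMPTS, should_verify))
    print("always:   ", replay(ATTEMPTS, always_verify))
```

On these invented records the selective gate skips the one attempt that verification would have broken, which is the kind of harmful flip the post describes. Real numbers depend entirely on the model, workload, and gate.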

Organizations

Hugging Face Discord Community