Sajib Acharjee Dip

Sajib-006

https://www.linkedin.com/in/sajib006/

AI & ML interests

Multimodal Learning, Generative AI, Bioinformatics

Recent Activity

upvoted a paper about 8 hours ago

Think Again or Think Longer? Selective Verification for Budget-Aware Reasoning

submitted a paper 1 day ago

Think Again or Think Longer? Selective Verification for Budget-Aware Reasoning

Posts 2

Post

Excited to share our paper: Think Again or Think Longer? Selective Verification for Budget-Aware Reasoning

A common assumption in test-time reasoning is that giving a model more chances to think or verify should improve performance. Our results show that this is only partly true.

We introduce SEVRA, a serving-layer controller that decides when a frozen reasoning model should keep its initial answer and when it should actively verify it. Instead of treating verification as always useful, SEVRA asks a more deployment-focused question:

Is this specific attempt likely recoverable by verification?

We evaluate this through helpful fixes, harmful flips, extra calls, and realized token cost.
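To make these four quantities concrete, here is a minimal Python sketch of how they could be computed from a replay log. It assumes a helpful fix means a verified attempt that went from wrong to right, and a harmful flip means a verified attempt that went from right to wrong; the `Attempt` record and `summarize` function are hypothetical names for illustration and are not taken from the SEVRA code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Attempt:
    """One replayed example (hypothetical schema, not SEVRA's)."""
    initial_correct: bool   # was the first answer right?
    final_correct: bool     # was the answer right after the policy acted?
    verified: bool          # did the controller spend a verification call?
    initial_tokens: int     # tokens generated by the initial solve
    verify_tokens: int = 0  # tokens generated by verification, if any


def summarize(attempts: Iterable[Attempt]) -> dict:
    """Aggregate fixes, flips, extra calls, and realized token cost."""
    attempts = list(attempts)
    if not attempts:
        raise ValueError("need at least one attempt")
    n = len(attempts)
    # Assumed definition: wrong -> right on a verified attempt.
    helpful = sum(
        a.verified and not a.initial_correct and a.final_correct
        for a in attempts
    )
    # Assumed definition: right -> wrong on a verified attempt.
    harmful = sum(
        a.verified and a.initial_correct and not a.final_correct
        for a in attempts
    )
    extra_calls = sum(a.verified for a in attempts)
    # Realized cost counts verification tokens only when a call was made.
    realized_tokens = sum(
        a.initial_tokens + (a.verify_tokens if a.verified else 0)
        for a in attempts
    )
    return {
        "accuracy_initial": sum(a.initial_correct for a in attempts) / n,
        "accuracy_final": sum(a.final_correct for a in attempts) / n,
        "helpful_fixes": helpful,
        "harmful_flips": harmful,
        "extra_calls": extra_calls,
        "realized_tokens": realized_tokens,
    }
```

Under these assumed definitions, a policy only gains accuracy when helpful fixes outnumber harmful flips, which is why the two are tracked separately from the extra calls and tokens spent.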

Some key takeaways:

* Selective verification improves over always verifying on MATH500 while reducing harmful flips.
* On GSM8K, the controller verifies only a small fraction of examples but still improves accuracy.
* However, a longer initial solve can sometimes match selective verification with fewer realized tokens.
* Cheap serving-visible features, such as completion status, token count, and finalizer use, nearly match larger learned gates.
* On CommonsenseQA, always-on verification hurts, showing that the best test-time compute action is workload-dependent.
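As a rough illustration of what a gate over cheap serving-visible features might look like, here is a small rule-based sketch in Python. It is not the SEVRA controller: the `ServingFeatures` record, the `should_verify` rule, and the 1024-token threshold are all hypothetical, and a real gate would be tuned per workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServingFeatures:
    """Signals visible at the serving layer (hypothetical schema)."""
    completed: bool       # initial solve finished instead of hitting the budget
    token_count: int      # tokens generated by the initial solve
    used_finalizer: bool  # a finalizer step was needed to extract an answer


def should_verify(features: ServingFeatures, token_threshold: int = 1024) -> bool:
    """Return True to spend a verification call, False to keep the answer.

    Illustrative rule only: treat truncated, finalizer-assisted, or
    unusually long attempts as the ones worth checking.
    """
    if not features.completed:
        return True
    if features.used_finalizer:
        return True
    return features.token_count >= token_threshold


# A short, cleanly completed attempt is kept as-is under this rule.
keep = should_verify(ServingFeatures(completed=True, token_count=180, used_finalizer=False))
```

The point of a rule this simple is that every input is already available when the initial response returns, so the decision adds no extra model call.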

The main deployment lesson is simple:

Tune the initial reasoning budget first. Then use selective recovery when explicit checks, bounded retries, auditability, or regression-risk control matter.

Paper: Think Again or Think Longer? Selective Verification for Budget-Aware Reasoning (https://huggingface.co/papers/2606.19808)
Code: https://github.com/Sajib-006/SEVRA
Replay dashboard: https://huggingface.co/spaces/sevra-space/sevra-replay

Would love feedback from the community, especially on broader test-time compute allocation, risk-aware verification, and practical serving policies for reasoning models.

Post

Excited to share BenSyc v1.1 — a Bengali conversational sycophancy benchmark with privacy-redacted binary and five-class releases. Explore the full dataset and project Space here:

Dataset: Sajib-006/bensyc
Space: Sajib-006/bensyc-project
#BengaliNLP #DatasetRelease #AIAlignment