| # Learnings - Architecture | |
| - Keep behavior-shaping reward logic inside `SQLEnvTRL` as additive trajectory-level state (`reward`, `_repeat_count`) so tool method signatures and TRL environment interfaces remain stable while internal semantics evolve. *(F015)* | |
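
The pattern above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: only the class name `SQLEnvTRL` and the attributes `reward` and `_repeat_count` come from the note. The `query` tool method, the `reset` method, the `_seen` set, the `REPEAT_PENALTY` value, and the in-memory SQLite table are all assumptions made to keep the example self-contained.

```python
import sqlite3


class SQLEnvTRL:
    """Hypothetical sketch; only `reward` and `_repeat_count` are from the note."""

    REPEAT_PENALTY = 0.1  # assumed value, for illustration only

    def __init__(self):
        # Assumed backing store: a tiny in-memory database.
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE t (x INTEGER)")
        self._conn.execute("INSERT INTO t VALUES (1), (2)")
        self.reset()

    def reset(self):
        # Trajectory-level state lives on the instance, not in tool signatures.
        self.reward = 0.0
        self._repeat_count = 0
        self._seen = set()  # hypothetical helper for detecting repeats

    def query(self, sql: str) -> str:
        """Tool method: the (sql) -> str signature stays fixed as shaping evolves."""
        key = " ".join(sql.lower().split())  # normalize case and whitespace
        if key in self._seen:
            # Additive shaping: adjust internal state only, never the return type.
            self._repeat_count += 1
            self.reward -= self.REPEAT_PENALTY
        self._seen.add(key)
        try:
            rows = self._conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            return f"Error: {exc}"
        return str(rows)
```

Because the shaping is accumulated on the instance, a new penalty or bonus can be added inside `query` (or a sibling tool method) without changing what the caller passes in or gets back; a reward function would read `env.reward` once the trajectory ends.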