# Learnings - Architecture
- Keep behavior-shaping reward logic inside `SQLEnvTRL` as additive, trajectory-level state (`reward`, `_repeat_count`). Tool method signatures and the TRL environment interface then stay stable while the internal reward semantics evolve. *(F015)*
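
A minimal sketch of this pattern, under stated assumptions. Only the names `SQLEnvTRL`, `reward`, and `_repeat_count` come from the note above. The `query` tool method, the `reset` hook, the `_seen` set, the `_shape_reward` helper, and the penalty value are hypothetical and stand in for whatever the real environment does.

```python
class SQLEnvTRL:
    """Illustrative stand-in, not the project's actual class."""

    REPEAT_PENALTY = 0.1  # assumed value, for illustration only

    def __init__(self):
        self.reset()

    def reset(self):
        # Trajectory-level state: cleared once per episode.
        self.reward = 0.0
        self._repeat_count = 0
        self._seen = set()  # hypothetical: normalized queries seen so far

    def query(self, sql: str) -> str:
        """Hypothetical tool method: SQL text in, observation string out.

        The signature never mentions rewards, so shaping rules can change
        without touching the tool interface.
        """
        self._shape_reward(sql)
        return f"executed: {sql}"  # placeholder; no database is involved

    def _shape_reward(self, sql: str) -> None:
        # Additive shaping: each repeated query subtracts a fixed penalty.
        key = " ".join(sql.lower().split())  # case/whitespace-insensitive
        if key in self._seen:
            self._repeat_count += 1
            self.reward -= self.REPEAT_PENALTY
        else:
            self._seen.add(key)
```

In this sketch, calling `query("SELECT 1")` and then `query("select  1")` on the same instance leaves `_repeat_count` at 1 and lowers `reward` by one penalty, while both calls return an ordinary observation string. A new shaping rule would only touch `_shape_reward` and the state set up in `reset`.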