sql_env / docs /learnings /gotchas.md
hjerpe's picture
Upload folder using huggingface_hub
9e64e71 verified

Learnings - Gotchas

  • For tool-calling SFT with assistant-only loss, trajectories that end on a tool turn do not teach stop behavior, so append a final content-only assistant turn after terminal tool confirmations to reduce post-answer extra tool calls. (F014)
  • Repeat penalties should key on (method, argument) over a short recent-call window (deque maxlen=3) so alternating reuse patterns like A→B→A are penalized while cross-method same-argument calls are not. (F015)