Learnings - Gotchas
- For tool-calling SFT with assistant-only loss, trajectories that end on a
toolturn do not teach stop behavior, so append a final content-only assistant turn after terminal tool confirmations to reduce post-answer extra tool calls. (F014) - Repeat penalties should key on
(method, argument)over a short recent-call window (dequemaxlen=3) so alternating reuse patterns likeA→B→Aare penalized while cross-method same-argument calls are not. (F015)