PIPer Stage 1 SFT - ShareGPT Checkpoint

Intermediate checkpoint from Stage 1 of the PIPer training pipeline.

Model Description

  • Base Model: Qwen3-8B-am
  • Training: Supervised Fine-Tuning on ShareGPT conversations
  • Purpose: Instruction-following baseline for Stage 2 RL training

Training Data

ShareGPT-style multi-turn conversations used for supervised fine-tuning, establishing the instruction-following behavior that Stage 2 RL builds on.
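
ShareGPT-format data stores each sample as a list of alternating human/assistant turns. A minimal illustrative record (the values below are invented, not taken from the actual dataset):

example = {
    "conversations": [
        {"from": "human", "value": "How do I create a Python virtual environment?"},
        {"from": "gpt", "value": "Run python -m venv .venv, then activate it with source .venv/bin/activate."},
    ]
}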

Training Configuration

  • Batch Size: 256 (8 GPUs × 32 samples)
  • Learning Rate: 2e-5 with cosine schedule
  • Warmup: 10% of steps
  • Sequence Length: 4096 tokens
  • Epochs: 3
  • Training Steps: 24
  • Training Time: ~15 minutes
  • Hardware: 8x NVIDIA H200 GPUs
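
The card does not name the training stack; as a rough sketch, here is how the hyperparameters above might map onto a TRL SFTTrainer run. The data file, output directory, and the "Qwen3-8B-am" hub path are placeholders, and argument names can differ across TRL versions:

from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder data file; the actual ShareGPT dataset used is not named on this card.
dataset = load_dataset("json", data_files="sharegpt.jsonl", split="train")

config = SFTConfig(
    output_dir="piper-stage1-sft",      # hypothetical output path
    per_device_train_batch_size=32,     # 8 GPUs x 32 = effective batch size 256
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                   # warmup over 10% of steps
    num_train_epochs=3,
    max_seq_length=4096,                # argument name varies across TRL versions
    bf16=True,
)

trainer = SFTTrainer(
    model="Qwen3-8B-am",                # base model; full hub path not given here
    args=config,
    train_dataset=dataset,
)
trainer.train()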

Performance

This checkpoint serves as the initialization for Stage 2 RL training, which achieves:

  • 100% pass@5 on EnvBench (20-problem evaluation)

See PIPer-Stage2-RL-Final for the final trained model.
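
For reference, pass@k is conventionally estimated with the unbiased formula from Chen et al. (2021); with exactly 5 samples per problem, pass@5 reduces to "at least one of the 5 samples passes". A minimal sketch (the exact evaluation protocol is an assumption, not specified by this card):

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: 1 - C(n - c, k) / C(n, k), for n samples with c passing."""
    if n - c < k:
        return 1.0  # fewer than k failing samples: every k-subset contains a pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# 100% pass@5 over the 20-problem set means every problem had at least one
# passing sample among its 5 generations.
print(pass_at_k(n=5, c=1, k=5))  # -> 1.0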

Usage

Use this checkpoint as a starting point for RL fine-tuning on environment setup tasks:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the checkpoint in bfloat16 and let Accelerate shard it across
# whatever GPUs are visible.
model = AutoModelForCausalLM.from_pretrained(
    "PIPer-Stage1-SFT-ShareGPT",
    trust_remote_code=True,
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("PIPer-Stage1-SFT-ShareGPT")
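
As a quick smoke test, and assuming the checkpoint ships a Qwen-style chat template, you can generate a response like this (the prompt is illustrative):

messages = [{"role": "user", "content": "Write a script that sets up a Python dev environment."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))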

Related Models

  • PIPer-Stage2-RL-Final – the final model after Stage 2 RL training, initialized from this checkpoint

License

Same as the base model (Qwen3-8B-am).
