YAML Metadata Warning:empty or missing yaml metadata in repo card
Check out the documentation for more information.
PIPer Stage 1 SFT - ShareGPT Checkpoint
Intermediate checkpoint from Stage 1 of the PIPer training pipeline.
Model Description
- Base Model: Qwen3-8B-am
- Training: Supervised Fine-Tuning on ShareGPT conversations
- Purpose: Instruction-following baseline for Stage 2 RL training
Training Data
- Dataset: PIPer-SFT-ShareGPT-Data
- Training Samples: 2,250 conversations
- Validation Samples: 250 conversations
Training Configuration
- Batch Size: 256 (8 GPUs × 32 samples)
- Learning Rate: 2e-5 with cosine schedule
- Warmup: 10% of steps
- Sequence Length: 4096 tokens
- Epochs: 3
- Training Steps: 24
- Training Time: ~15 minutes
- Hardware: 8x NVIDIA H200 GPUs
Performance
This checkpoint serves as the initialization for Stage 2 RL training, which achieves:
- 100% pass@5 on EnvBench (20-problem evaluation)
See PIPer-Stage2-RL-Final for the final trained model.
Usage
Use this checkpoint as a starting point for RL fine-tuning on environment setup tasks:
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
"PIPer-Stage1-SFT-ShareGPT",
trust_remote_code=True,
torch_dtype="bfloat16",
device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("PIPer-Stage1-SFT-ShareGPT")
Related Models
- Next Stage: PIPer-Stage2-RL-Final (RL-trained from this checkpoint)
License
Same as base model (Qwen3-8B-am)
- Downloads last month
- 2
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support