# SAGE-OSS-40B
SAGE-OSS-40B is an open-source research release from SAGEA – a 40B Mixture-of-Experts model fine-tuned and structurally extended from an open-source MoE base. It serves as an early testbed for two SAGEA research directions: the LoopCoder iterative inference mechanism and Inverse Reasoning (IR), the latter of which is formally described in our paper *Thinking About Thinking: SAGE-nano's Inverse Reasoning for Self-Aware Language Models*.
SAGE-OSS-40B also incorporates Multi-Context Heads (MCH), an attention-level modification developed alongside IR that allows the model to maintain independent context streams within a single forward pass.
This is not a production model. It predates the formalized IR training pipeline used in SAGE Actus 2.4 and Celer 2.6, and is released for transparency and community research.
## Model Details
| Property | Value |
|---|---|
| Architecture | SAGELoopCoder (MoE, fine-tuned) |
| Parameters | ~40B |
| Tensor Type | BF16 |
| Context Length | 131,072 tokens |
| Vocab Size | 76,800 |
| Hidden Size | 5,120 |
| Layers | 80 |
| Attention Heads | 40 (GQA: 8 KV heads) |
| Loop Iterations | 2 |
| Loop Window Size | 64 |
| RoPE Theta | 500,000 |
| License | Apache 2.0 |
## Architecture

### LoopCoder
Rather than a single linear forward pass, LoopCoder performs `loop_num` iterative passes over a sliding window of `loop_window_size` tokens, allowing the model to refine intermediate representations before committing to output. This is SAGEA's approach to building reasoning depth into the forward pass itself, without relying on external scaffolding or prompting techniques.
- `loop_num: 2` – two iterative passes per generation step
- `loop_window_size: 64` – sliding token window for loop computation
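As a rough illustration, the loop can be sketched as repeated refinement passes over the trailing token window. The `refine` function below is a stand-in for the model's transformer computation; the function names and shapes are illustrative assumptions, not the actual SAGELoopCoder implementation.

```python
# Illustrative sketch of the LoopCoder idea (not SAGEA's actual code):
# run `loop_num` refinement passes over the trailing `loop_window_size`
# tokens of the hidden states, leaving earlier positions untouched.
import torch

def loopcoder_step(hidden, refine, loop_num=2, loop_window_size=64):
    """hidden: (batch, seq, dim). Re-run `refine` loop_num times over
    the trailing window of tokens before committing to output."""
    window = hidden[:, -loop_window_size:, :]
    for _ in range(loop_num):
        window = refine(window)  # iterative refinement pass
    if hidden.size(1) <= loop_window_size:
        return window
    return torch.cat([hidden[:, :-loop_window_size, :], window], dim=1)

# toy refine: a single linear layer standing in for a transformer block
refine = torch.nn.Linear(16, 16)
h = torch.randn(1, 128, 16)
out = loopcoder_step(h, refine)
print(out.shape)  # torch.Size([1, 128, 16])
```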
### Inverse Reasoning (IR)
IR is a metacognitive mechanism, first introduced in SAGE-nano (arXiv:2507.00092), that reflects back through the model's attention processes post-generation to identify key decision points in a reasoning chain and surface explanations for why specific paths were taken. In SAGE-OSS-40B, IR is implemented at the attention level – the same formulation used in SAGE Actus 2.4 – rather than the deeper pre-training integration it received in Celer 2.6.
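The "reflect back through attention" step can be pictured as a post-hoc pass over saved attention maps that flags the most-attended positions as candidate decision points. The scoring rule below is an illustrative assumption for this sketch, not SAGEA's published IR formulation.

```python
# Hedged sketch of the post-generation reflection idea: walk back over
# saved attention maps and flag the positions that received the most
# attention mass across layers and heads. This scoring heuristic is an
# assumption for illustration only.
import torch

def decision_points(attn_maps, top_k=3):
    """attn_maps: list of (heads, seq, seq) attention tensors, one per layer.
    Returns indices of the top_k most-attended positions."""
    stacked = torch.stack([a.mean(dim=0) for a in attn_maps])  # (layers, seq, seq)
    received = stacked.mean(dim=0).sum(dim=0)                  # attention received per position
    return torch.topk(received, k=top_k).indices.tolist()

maps = [torch.softmax(torch.randn(8, 10, 10), dim=-1) for _ in range(4)]
print(decision_points(maps))  # indices of the three most-attended positions
```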
### Multi-Context Heads (MCH)

MCH is an attention-level modification developed alongside IR at SAGEA. A subset of attention heads is designated to maintain independent context representations in parallel during the forward pass, allowing the model to implicitly track divergent reasoning branches without requiring explicit beam search or sampling strategies. MCH was developed as a complement to IR: IR explains which path was taken; MCH helps the model hold multiple candidates before that decision is made.
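One minimal way to realize "independent context streams in one forward pass" is to give a subset of heads a different attention mask, so those heads see an alternate view of the context. The two-stream split and all names below are assumptions for illustration, not the SAGE-OSS-40B implementation.

```python
# Minimal sketch of the Multi-Context Heads idea: the last n_alt_heads
# heads attend under an alternate additive mask, giving them an
# independent context stream within the same attention call.
import torch
import torch.nn.functional as F

def mch_attention(q, k, v, alt_mask, n_alt_heads=2):
    """q, k, v: (heads, seq, dim); alt_mask: (seq, seq) additive mask
    applied only to the alternate-context heads."""
    scores = q @ k.transpose(-1, -2) / (q.size(-1) ** 0.5)     # (heads, seq, seq)
    scores[-n_alt_heads:] = scores[-n_alt_heads:] + alt_mask   # independent stream
    return F.softmax(scores, dim=-1) @ v

heads, seq, dim = 8, 6, 16
q, k, v = (torch.randn(heads, seq, dim) for _ in range(3))
# alternate stream here: a causal mask, while the other heads see everything
alt_mask = torch.triu(torch.full((seq, seq), float("-inf")), diagonal=1)
out = mch_attention(q, k, v, alt_mask)
print(out.shape)  # torch.Size([8, 6, 16])
```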
### Additional Properties
- GQA – 40 attention heads, 8 KV heads for inference efficiency
- SiLU activations, RMSNorm, no attention or MLP bias
- RoPE theta 500,000 for long-context stability
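The GQA configuration above means each KV head serves a group of 40 / 8 = 5 query heads, shrinking the KV cache fivefold relative to full multi-head attention. A minimal sketch of the head grouping (shapes are toy values, not the model's):

```python
# GQA head grouping as configured here: 40 query heads share 8 KV heads,
# so each KV head is broadcast to its group of 5 query heads.
import torch

n_q_heads, n_kv_heads, seq, dim = 40, 8, 4, 64
q = torch.randn(n_q_heads, seq, dim)
kv = torch.randn(n_kv_heads, seq, dim)   # only 8 heads are cached
k = kv.repeat_interleave(n_q_heads // n_kv_heads, dim=0)  # expand for attention
print(k.shape)  # torch.Size([40, 4, 64])
```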
## Usage

This model requires `trust_remote_code=True` due to the custom LoopCoder architecture class.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "sagea-ai/sage-oss-40b"

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

prompt = "Explain the concept of recursion in programming."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        eos_token_id=[2, 75864, 75869]
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Note: Standard pipeline inference will not work due to the custom `SAGELoopCoderForCausalLM` class. Use the snippet above directly.
## Limitations

- Research release – not instruction-tuned or RLHF-aligned
- IR is implemented at the attention level; the deeper pre-training integration from Celer 2.6 is not present here
- No public benchmark results; evaluated informally during development
- LoopCoder adds inference overhead proportional to `loop_num`
- Requires `trust_remote_code=True`
- Not recommended for production use
## Relation to SAGEA Model Families
SAGE-OSS-40B is not part of any active SAGEA product family. It is an earlier research artifact that informed how IR and MCH were developed and subsequently integrated into SAGE Actus 2.4 and SAGE Celer 2.6.
Current SAGEA families:
- SAGE Celer – general-purpose (low/mid/high)
- SAGE Actus – agentic and domain-specialized
## Citation
If you use IR-related work from this release, please cite the foundational paper:
```bibtex
@misc{sagea2025thinking,
  title={Thinking About Thinking: SAGE-nano's Inverse Reasoning for Self-Aware Language Models},
  author={Basab Jha and Firoj Paudel and Ujjwal Puri and Zhang Yuting and Choi Donghyuk and Wang Junhao},
  year={2025},
  url={https://arxiv.org/abs/2507.00092}
}

@misc{sagea2025sageoss,
  title={SAGE-OSS-40B: Open-Source LoopCoder Reasoning Research Model},
  author={SAGEA},
  year={2025},
  url={https://huggingface.co/sagea-ai/sage-oss-40b}
}
```
## About SAGEA
SAGEA is an AI research company based in Nepal, building foundation models and AI infrastructure for South Asia and beyond.