
SAGE-OSS-40B

SAGE-OSS-40B is an open-source research release from SAGEA. It is a 40B-parameter Mixture-of-Experts model, fine-tuned and structurally extended from an open-source MoE base. It serves as an early testbed for two SAGEA research directions: the LoopCoder iterative inference mechanism and Inverse Reasoning (IR), the latter of which is formally described in our paper "Thinking About Thinking: SAGE-nano's Inverse Reasoning for Self-Aware Language Models".

SAGE-OSS-40B also incorporates Multi-Context Heads (MCH), an attention-level modification developed alongside IR that allows the model to maintain independent context streams within a single forward pass.

This is not a production model. It predates the formalized IR training pipeline used in SAGE Actus 2.4 and Celer 2.6, and is released for transparency and community research.


Model Details

| Property | Value |
|---|---|
| Architecture | SAGELoopCoder (MoE, fine-tuned) |
| Parameters | ~40B |
| Tensor Type | BF16 |
| Context Length | 131,072 tokens |
| Vocab Size | 76,800 |
| Hidden Size | 5,120 |
| Layers | 80 |
| Attention Heads | 40 (GQA: 8 KV heads) |
| Loop Iterations | 2 |
| Loop Window Size | 64 |
| RoPE Theta | 500,000 |
| License | Apache 2.0 |

Architecture

LoopCoder

Rather than a single linear forward pass, LoopCoder performs loop_num iterative passes over a sliding window of loop_window_size tokens, allowing the model to refine intermediate representations before committing to output. This is SAGEA's approach to building reasoning depth into the forward pass itself, without relying on external scaffolding or prompting techniques.

  • loop_num: 2 – two iterative passes per generation step
  • loop_window_size: 64 – sliding token window for loop computation
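
As a toy illustration of the mechanism described above, iterative refinement over a sliding window can be sketched as follows. All names and shapes here are hypothetical stand-ins, not the model's actual implementation; `block` represents a transformer layer.

```python
import numpy as np

def loopcoder_step(block, hidden, loop_num=2, loop_window_size=64):
    """Toy sketch of LoopCoder-style iterative refinement (hypothetical
    helper). The trailing loop_window_size positions are re-processed
    loop_num times, letting `block` refine intermediate representations
    before the output is committed."""
    window = hidden[:, -loop_window_size:, :]
    for _ in range(loop_num):
        window = block(window)  # one extra pass over the sliding window
    return np.concatenate([hidden[:, :-loop_window_size, :], window], axis=1)

# Stand-in "layer": a fixed linear map applied position-wise.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8)) * 0.1
block = lambda x: np.tanh(x @ w)

hidden = rng.standard_normal((1, 128, 8))
out = loopcoder_step(block, hidden)
print(out.shape)  # (1, 128, 8)
```

Only the window is refined; earlier positions pass through unchanged, which is what keeps the extra cost proportional to loop_num rather than to sequence length.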

Inverse Reasoning (IR)

IR is a metacognitive mechanism, first introduced in SAGE-nano (arXiv:2507.00092), that reflects back through the model's attention processes post-generation to identify key decision points in a reasoning chain and surface explanations for why specific paths were taken. In SAGE-OSS-40B, IR is implemented at the attention level, using the same formulation as SAGE Actus 2.4, rather than the deeper pre-training integration it received in Celer 2.6.
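
One way to picture the post-generation pass is below. This is purely an assumed sketch for illustration (the function, the entropy heuristic, and the shapes are not from the paper): given attention distributions recorded at each generation step, flag the steps where attention was most sharply concentrated as candidate decision points.

```python
import numpy as np

def decision_points(step_attn, top_k=3):
    """Hypothetical IR-style sketch. step_attn: (num_steps, context_len)
    attention distribution recorded at each generation step. Steps with
    low attention entropy (sharply concentrated on a few context tokens)
    are returned as candidate decision points."""
    eps = 1e-9
    entropy = -(step_attn * np.log(step_attn + eps)).sum(axis=-1)
    return np.argsort(entropy)[:top_k]   # most concentrated steps first

rng = np.random.default_rng(0)
step_attn = rng.random((10, 32))
step_attn /= step_attn.sum(axis=-1, keepdims=True)  # rows are distributions
print(decision_points(step_attn))
```

The actual IR mechanism described in the paper is richer than an entropy ranking; this only conveys the "reflect back through attention after generating" shape of the idea.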

Multi-Context Heads (MCH)

MCH is an attention-level modification developed alongside IR at SAGEA. A subset of attention heads are designated to maintain independent context representations in parallel during the forward pass, allowing the model to implicitly track divergent reasoning branches without requiring explicit beam search or sampling strategies. MCH was developed as a complement to IR: IR explains which path was taken; MCH helps the model hold multiple candidates before that decision is made.
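The head-level independence can be sketched with a masked attention toy example. Everything here is an illustrative assumption (function name, stream assignment scheme, shapes), not the model's code: each head is assigned one stream and masks out every position belonging to a different stream, so the streams evolve independently within one forward pass.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_context_attention(q, k, v, pos_stream, head_stream):
    """Hypothetical MCH sketch. q, k, v: (heads, seq, dim).
    pos_stream: stream id per position; head_stream: stream id per head.
    A head attends only to positions in its own stream."""
    H, T, D = q.shape
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(D)            # (H, T, T)
    other = pos_stream[None, None, :] != head_stream[:, None, None]
    weights = softmax(np.where(other, -1e9, scores))          # masked attention
    return weights @ v, weights

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 6, 2)) for _ in range(3))
pos_stream = np.array([0, 0, 0, 1, 1, 1])  # first half stream 0, rest stream 1
head_stream = np.array([0, 0, 1, 1])       # heads 0-1 track stream 0; 2-3 stream 1
out, weights = multi_context_attention(q, k, v, pos_stream, head_stream)
print(out.shape)  # (4, 6, 2)
```

In this toy setup, heads 0 and 1 place zero attention mass on positions 3-5, which is the sense in which the two context streams stay independent.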


Additional Properties

  • GQA – 40 attention heads, 8 KV heads for inference efficiency
  • SiLU activations, RMS norm, no attention or MLP bias
  • RoPE theta 500,000 for long-context stability
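
The GQA arithmetic for this configuration can be shown with shapes alone (illustrative only, not the model's code): 40 query heads share 8 KV heads, so each group of 5 query heads reads the same cached K/V, shrinking the KV cache 5x versus full multi-head attention.

```python
import numpy as np

num_heads, num_kv_heads, head_dim, seq_len = 40, 8, 128, 16
k_cache = np.random.randn(num_kv_heads, seq_len, head_dim)  # only 8 KV heads cached

# At attention time the 8 KV heads are broadcast to all 40 query heads:
k_expanded = np.repeat(k_cache, num_heads // num_kv_heads, axis=0)
print(k_expanded.shape)          # (40, 16, 128)
print(num_heads / num_kv_heads)  # 5.0 -- KV-cache reduction factor
```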

Usage

This model requires trust_remote_code=True due to the custom LoopCoder architecture class.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "sagea-ai/sage-oss-40b"

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

prompt = "Explain the concept of recursion in programming."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        eos_token_id=[2, 75864, 75869]
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note: The transformers pipeline() helper will not work due to the custom SAGELoopCoderForCausalLM class. Use the snippet above directly.


Limitations

  • Research release โ€” not instruction-tuned or RLHF aligned
  • IR is implemented at the attention level; the deeper pre-training integration from Celer 2.6 is not present here
  • No public benchmark results; evaluated informally during development
  • LoopCoder adds inference overhead proportional to loop_num
  • Requires trust_remote_code=True
  • Not recommended for production use

Relation to SAGEA Model Families

SAGE-OSS-40B is not part of any active SAGEA product family. It is an earlier research artifact that informed how IR and MCH were developed and subsequently integrated into SAGE Actus 2.4 and SAGE Celer 2.6.

Current SAGEA families:

  • SAGE Celer โ€” general-purpose (low/mid/high)
  • SAGE Actus โ€” agentic and domain-specialized

Citation

If you use IR-related work from this release, please cite the foundational paper:

@misc{sagea2025thinking,
  title={Thinking About Thinking: SAGE-nano's Inverse Reasoning 
         for Self-Aware Language Models},
  author={Basab Jha and Firoj Paudel and Ujjwal Puri and Zhang Yuting 
          and Choi Donghyuk and Wang Junhao},
  year={2025},
  url={https://arxiv.org/abs/2507.00092}
}

@misc{sagea2025sageoss,
  title={SAGE-OSS-40B: Open-Source LoopCoder Reasoning Research Model},
  author={SAGEA},
  year={2025},
  url={https://huggingface.co/sagea-ai/sage-oss-40b}
}

About SAGEA

SAGEA is an AI research company based in Nepal, building foundation models and AI infrastructure for South Asia and beyond.
