
SAGE-OSS-40B

SAGE-OSS-40B is an open-source research release from SAGEA. It is a 40B-parameter Mixture-of-Experts model, fine-tuned and structurally extended from an open-source MoE base. It serves as an early testbed for two SAGEA research directions: the LoopCoder iterative inference mechanism and Inverse Reasoning (IR), the latter of which is formally described in our paper "Thinking About Thinking: SAGE-nano's Inverse Reasoning for Self-Aware Language Models".

SAGE-OSS-40B also incorporates Multi-Context Heads (MCH), an attention-level modification developed alongside IR that allows the model to maintain independent context streams within a single forward pass.

This is not a production model. It predates the formalized IR training pipeline used in SAGE Actus 2.4 and Celer 2.6, and is released for transparency and community research.


Model Details

| Property | Value |
|---|---|
| Architecture | SAGELoopCoder (MoE, fine-tuned) |
| Parameters | ~40B |
| Tensor Type | BF16 |
| Context Length | 131,072 tokens |
| Vocab Size | 76,800 |
| Hidden Size | 5,120 |
| Layers | 80 |
| Attention Heads | 40 (GQA: 8 KV heads) |
| Loop Iterations | 2 |
| Loop Window Size | 64 |
| RoPE Theta | 500,000 |
| License | Apache 2.0 |

Architecture

LoopCoder

Rather than a single linear forward pass, LoopCoder performs loop_num iterative passes over a sliding window of loop_window_size tokens, allowing the model to refine intermediate representations before committing to output. This is SAGEA's approach to building reasoning depth into the forward pass itself, without relying on external scaffolding or prompting techniques.

  • loop_num: 2 – two iterative passes per generation step
  • loop_window_size: 64 – sliding token window for loop computation
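
As a toy illustration of the mechanism described above, iterative refinement over a sliding window can be sketched as follows. All names and shapes here are hypothetical stand-ins, not the model's actual implementation; `block` represents a transformer layer.

```python
import numpy as np

def loopcoder_step(block, hidden, loop_num=2, loop_window_size=64):
    """Toy sketch of LoopCoder-style iterative refinement (hypothetical
    helper). The trailing loop_window_size positions are re-processed
    loop_num times, letting `block` refine intermediate representations
    before the output is committed."""
    window = hidden[:, -loop_window_size:, :]
    for _ in range(loop_num):
        window = block(window)  # one extra pass over the sliding window
    return np.concatenate([hidden[:, :-loop_window_size, :], window], axis=1)

# Stand-in "layer": a fixed linear map applied position-wise.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 8)) * 0.1
block = lambda x: np.tanh(x @ w)

hidden = rng.standard_normal((1, 128, 8))
out = loopcoder_step(block, hidden)
print(out.shape)  # (1, 128, 8)
```

Only the window is refined; earlier positions pass through unchanged, which is what keeps the extra cost proportional to loop_num rather than to sequence length.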

Inverse Reasoning (IR)

IR is a metacognitive mechanism, first introduced in SAGE-nano (arXiv:2507.00092), that reflects back through the model's attention processes post-generation to identify key decision points in a reasoning chain and surface explanations for why specific paths were taken. In SAGE-OSS-40B, IR is implemented at the attention level, using the same formulation as SAGE Actus 2.4, rather than the deeper pre-training integration it received in Celer 2.6.
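
One way to picture the post-generation pass is below. This is purely an assumed sketch for illustration (the function, the entropy heuristic, and the shapes are not from the paper): given attention distributions recorded at each generation step, flag the steps where attention was most sharply concentrated as candidate decision points.

```python
import numpy as np

def decision_points(step_attn, top_k=3):
    """Hypothetical IR-style sketch. step_attn: (num_steps, context_len)
    attention distribution recorded at each generation step. Steps with
    low attention entropy (sharply concentrated on a few context tokens)
    are returned as candidate decision points."""
    eps = 1e-9
    entropy = -(step_attn * np.log(step_attn + eps)).sum(axis=-1)
    return np.argsort(entropy)[:top_k]   # most concentrated steps first

rng = np.random.default_rng(0)
step_attn = rng.random((10, 32))
step_attn /= step_attn.sum(axis=-1, keepdims=True)  # rows are distributions
print(decision_points(step_attn))
```

The actual IR mechanism described in the paper is richer than an entropy ranking; this only conveys the "reflect back through attention after generating" shape of the idea.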

Multi-Context Heads (MCH)

MCH is an attention-level modification developed alongside IR at SAGEA. A subset of attention heads are designated to maintain independent context representations in parallel during the forward pass, allowing the model to implicitly track divergent reasoning branches without requiring explicit beam search or sampling strategies. MCH was developed as a complement to IR: IR explains which path was taken; MCH helps the model hold multiple candidates before that decision is made.
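The head-level independence can be sketched with a masked attention toy example. Everything here is an illustrative assumption (function name, stream assignment scheme, shapes), not the model's code: each head is assigned one stream and masks out every position belonging to a different stream, so the streams evolve independently within one forward pass.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_context_attention(q, k, v, pos_stream, head_stream):
    """Hypothetical MCH sketch. q, k, v: (heads, seq, dim).
    pos_stream: stream id per position; head_stream: stream id per head.
    A head attends only to positions in its own stream."""
    H, T, D = q.shape
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(D)            # (H, T, T)
    other = pos_stream[None, None, :] != head_stream[:, None, None]
    weights = softmax(np.where(other, -1e9, scores))          # masked attention
    return weights @ v, weights

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((4, 6, 2)) for _ in range(3))
pos_stream = np.array([0, 0, 0, 1, 1, 1])  # first half stream 0, rest stream 1
head_stream = np.array([0, 0, 1, 1])       # heads 0-1 track stream 0; 2-3 stream 1
out, weights = multi_context_attention(q, k, v, pos_stream, head_stream)
print(out.shape)  # (4, 6, 2)
```

In this toy setup, heads 0 and 1 place zero attention mass on positions 3-5, which is the sense in which the two context streams stay independent.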


Additional Properties

  • GQA – 40 attention heads, 8 KV heads for inference efficiency
  • SiLU activations, RMS norm, no attention or MLP bias
  • RoPE theta 500,000 for long-context stability
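
The GQA arithmetic for this configuration can be shown with shapes alone (illustrative only, not the model's code): 40 query heads share 8 KV heads, so each group of 5 query heads reads the same cached K/V, shrinking the KV cache 5x versus full multi-head attention.

```python
import numpy as np

num_heads, num_kv_heads, head_dim, seq_len = 40, 8, 128, 16
k_cache = np.random.randn(num_kv_heads, seq_len, head_dim)  # only 8 KV heads cached

# At attention time the 8 KV heads are broadcast to all 40 query heads:
k_expanded = np.repeat(k_cache, num_heads // num_kv_heads, axis=0)
print(k_expanded.shape)          # (40, 16, 128)
print(num_heads / num_kv_heads)  # 5.0 -- KV-cache reduction factor
```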

Usage

This model requires trust_remote_code=True due to the custom LoopCoder architecture class.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "sagea-ai/sage-oss-40b"

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

prompt = "Explain the concept of recursion in programming."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        eos_token_id=[2, 75864, 75869]
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note: The transformers pipeline() helper will not work due to the custom SAGELoopCoderForCausalLM class. Use the snippet above directly.


Limitations

  • Research release โ€” not instruction-tuned or RLHF aligned
  • IR is implemented at the attention level; the deeper pre-training integration from Celer 2.6 is not present here
  • No public benchmark results; evaluated informally during development
  • LoopCoder adds inference overhead proportional to loop_num
  • Requires trust_remote_code=True
  • Not recommended for production use

Relation to SAGEA Model Families

SAGE-OSS-40B is not part of any active SAGEA product family. It is an earlier research artifact that informed how IR and MCH were developed and subsequently integrated into SAGE Actus 2.4 and SAGE Celer 2.6.

Current SAGEA families:

  • SAGE Celer โ€” general-purpose (low/mid/high)
  • SAGE Actus โ€” agentic and domain-specialized

Citation

If you use IR-related work from this release, please cite the foundational paper:

@misc{sagea2025thinking,
  title={Thinking About Thinking: SAGE-nano's Inverse Reasoning 
         for Self-Aware Language Models},
  author={Basab Jha and Firoj Paudel and Ujjwal Puri and Zhang Yuting 
          and Choi Donghyuk and Wang Junhao},
  year={2025},
  url={https://arxiv.org/abs/2507.00092}
}

@misc{sagea2025sageoss,
  title={SAGE-OSS-40B: Open-Source LoopCoder Reasoning Research Model},
  author={SAGEA},
  year={2025},
  url={https://huggingface.co/sagea-ai/sage-oss-40b}
}

About SAGEA

SAGEA is an AI research company based in Nepal, building foundation models and AI infrastructure for South Asia and beyond.
