Astria

Astria is a next-generation, fully local multimodal foundation model built on top of a Ministral-based language backbone and a custom vision encoder. This architecture significantly improves visual grounding, multilingual reasoning, and agentic reliability while remaining efficient enough for edge deployment.


🚀 Astria Update Highlights

Me7war's latest Astria update pushes the limits of small-scale multimodal AI, combining efficiency, reasoning, and vision capabilities:

Key Features

  • Vision Mastery: Custom encoder enables deep image understanding and precise visual–text alignment.
  • Multilingual Support: Handles dozens of languages, including English, French, Spanish, German, Italian, Portuguese, Dutch, Arabic, Chinese, Japanese, and Korean, while maintaining strong reasoning and generation.
  • Agent-Ready: Native function calls, reliable JSON outputs, and strict prompt adherence make Astria suitable for agentic workflows.
  • Edge Efficiency: Optimized for minimal hardware without sacrificing performance.
  • Large Context Window: Up to 256k tokens for long-form reasoning, document-level comprehension, and complex multi-step tasks.
  • Enhanced Reasoning: Ministral backbone ensures stronger factual grounding, smoother multimodal alignment, and improved long-horizon reasoning.
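Agentic pipelines usually validate a model's JSON output before acting on it. The sketch below shows one way to do that with the Python standard library only. The tool registry, the `name`/`arguments` field names, and the `get_weather` tool are hypothetical; Astria's actual function-call format is not documented here, so adapt the schema to whatever the model emits.

```python
import json

# Hypothetical tool registry: tool name -> required argument names and types.
TOOLS = {
    "get_weather": {"city": str, "unit": str},
}

def parse_tool_call(raw: str) -> dict:
    """Parse a JSON tool call and validate it; raise ValueError on any mismatch."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")

    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")

    args = call.get("arguments")
    if not isinstance(args, dict):
        raise ValueError("'arguments' must be a JSON object")

    schema = TOOLS[name]
    missing = schema.keys() - args.keys()
    extra = args.keys() - schema.keys()
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")

    # Check each argument against the type declared in the registry.
    for key, expected in schema.items():
        if not isinstance(args[key], expected):
            raise ValueError(f"argument {key!r} should be {expected.__name__}")
    return call

# Example reply text, written by hand for illustration (not real model output).
reply = '{"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}'
call = parse_tool_call(reply)
print(call["name"], call["arguments"])
```

Rejecting malformed calls up front keeps a bad generation from reaching the tool itself; the caller can catch `ValueError` and re-prompt the model.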

Astria Benchmark

A fully local, compact model redefining what edge-deployable multimodal AI can achieve.


📊 Visual Reasoning Performance

Astria is evaluated with a custom benchmark that uses GPT-5 PRO as the judge.

92.53% (new SOTA)

LLaVA baseline: 90.92%

A custom evaluation on 30 unseen images with 3 instruction types per image (conversation, description, complex reasoning) shows Astria outperforms GPT-5 in all categories.

Evaluation: Astria vs GPT-5

A custom evaluation set of 30 unseen images was constructed. Each image includes three instruction types:

  1. Conversational understanding
  2. Detailed visual description
  3. Complex multimodal reasoning

This yields 90 unique image–language tasks, evaluated on:

  • Astria
  • GPT-5

Scoring was performed by GPT-5 PRO, using a 1–10 scale per task.
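To make the protocol concrete, here is a minimal sketch of how per-task judge scores could be aggregated into a mean score per model and instruction type. The record layout, the category keys, and the function name are assumptions, and the scores in the example are placeholders chosen for illustration, not results from this evaluation.

```python
from collections import defaultdict
from statistics import mean

CATEGORIES = ("conversation", "description", "complex_reasoning")
NUM_IMAGES = 30
NUM_TASKS = NUM_IMAGES * len(CATEGORIES)  # 30 images x 3 instruction types = 90

def mean_scores(records):
    """Average judge scores per (model, category).

    records: iterable of (model, category, score) tuples, score on a 1-10 scale.
    """
    buckets = defaultdict(list)
    for model, category, score in records:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        if not 1 <= score <= 10:
            raise ValueError(f"score out of range: {score!r}")
        buckets[(model, category)].append(score)
    return {key: mean(values) for key, values in buckets.items()}

# Placeholder scores for two hypothetical models (not real evaluation data).
records = [
    ("model_a", "conversation", 8), ("model_a", "conversation", 6),
    ("model_b", "conversation", 7), ("model_b", "conversation", 7),
    ("model_a", "description", 9),
    ("model_b", "description", 5),
]
summary = mean_scores(records)
for (model, category), avg in sorted(summary.items()):
    print(f"{model:8s} {category:18s} {avg:.2f}")
```

With the full set, each model would contribute 30 scores per category, and comparing the per-category means is what supports a claim of one model outperforming the other in every category.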

Results

Astria outperforms GPT-5 across all instruction categories, validating the effectiveness of the custom vision encoder combined with the Ministral knowledge-enhanced language model.


Model Summary

  • Vision Encoder: Custom-built, with precise visual-text alignment
  • Language Backbone: Ministral-based, optimized for reasoning and factual accuracy
  • Training: End-to-end multimodal alignment with knowledge supervision
  • Output: Grounded, structured, and context-aware responses
  • Deployment: Fully local and edge-optimized, supporting up to 256k token context
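As a rough sizing aid for local deployment, the sketch below estimates the memory needed just to hold the weights, assuming the 9B parameter count and BF16 tensor type listed for this model. It ignores the KV cache, activations, and runtime overhead, all of which grow with context length, so treat the result as a lower bound. The function name and dtype table are illustrative.

```python
# Bytes used by one parameter for a few common tensor types.
BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "fp32": 4}

def weight_bytes(num_params: int, dtype: str = "bf16") -> int:
    """Return the bytes needed to store the weights alone."""
    return num_params * BYTES_PER_PARAM[dtype]

total = weight_bytes(9_000_000_000)  # 9B params in BF16
print(f"{total / 1e9:.1f} GB ({total / 2**30:.2f} GiB)")
# prints "18.0 GB (16.76 GiB)"
```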

License

Astria is released under the Astria License for personal and non-commercial use. Commercial use requires explicit permission from the creator.

Weights: Safetensors format, 9B params, BF16 tensor type.