Abstract
A cybersecurity reasoning model trained in two stages, supervised fine-tuning followed by reinforcement learning from verifiable rewards, achieves competitive performance on specialized security tasks while maintaining general capabilities.
We present Foundation-Sec-8B-Reasoning, the first open-source native reasoning model for cybersecurity. Built upon our previously released Foundation-Sec-8B base model (itself derived from Llama-3.1-8B-Base), the model is trained through a two-stage process combining supervised fine-tuning (SFT) and reinforcement learning from verifiable rewards (RLVR). Training leverages proprietary reasoning data spanning cybersecurity analysis, instruction following, and mathematical reasoning. Evaluation across 10 cybersecurity benchmarks and 10 general-purpose benchmarks shows performance on cybersecurity tasks competitive with significantly larger models, alongside strong general capabilities. The model generalizes effectively to multi-hop reasoning tasks and exhibits strong safety performance when deployed with appropriate system prompts and guardrails. This work demonstrates that domain-specialized reasoning models can achieve strong performance on specialized tasks while retaining broad general capabilities. We release the model publicly at https://huggingface.co/fdtn-ai/Foundation-Sec-8B-Reasoning.
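For readers who want to try the released checkpoint, the sketch below shows one way to load and query it. It is our illustration rather than part of the report, and it assumes the repository linked above follows the standard Hugging Face `transformers` causal-LM and chat-template conventions; the dtype, generation settings, and example prompt are placeholders to adapt to your own setup.

```python
# Minimal inference sketch (assumption: the released checkpoint exposes the
# standard transformers causal-LM and chat-template interfaces).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fdtn-ai/Foundation-Sec-8B-Reasoning"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # an 8B model fits on a single modern GPU in bf16
    device_map="auto",
)

# The abstract recommends deploying the model with an appropriate system prompt
# and guardrails; this system prompt is only a placeholder example.
messages = [
    {"role": "system", "content": "You are a cybersecurity analysis assistant."},
    {"role": "user", "content": "Summarize the likely impact of CVE-2021-44228 (Log4Shell) on a Java web service."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=0.6)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```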