CMMC Expert 7B

Notice: These models are provided for proof-of-concept and testing purposes only. Production-grade models are not publicly shared. For inquiries regarding production models or commercial licensing, please contact the maintainer: Nathan Maine.

A locally-hosted, fine-tuned language model specialized in CMMC 2.0, NIST 800-171, NIST 800-53, HIPAA, DFARS, and cybersecurity compliance frameworks.

This is the 7B variant — optimized for fast responses on consumer hardware. Part of a four-model suite (7B, 14B, 32B, 72B) sharing the same compliance knowledge base.

Quick Start (Ollama)

# Download and run
ollama pull Nathan-Maine/cmmc-expert-7b

# Ask a compliance question
ollama run Nathan-Maine/cmmc-expert-7b "What access controls are required for CMMC Level 2?"

# Or call Ollama's native REST API
curl http://localhost:11434/api/generate -d '{
  "model": "Nathan-Maine/cmmc-expert-7b",
  "prompt": "What are the key differences between CMMC Level 1 and Level 2?",
  "stream": false
}'
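For scripting, the same non-streaming `/api/generate` call can be made from Python using only the standard library. This is a minimal sketch, not part of the project: `build_generate_request` and `ask` are hypothetical helper names, and the code assumes Ollama is listening on its default port `11434`. Pass the model name exactly as `ollama list` reports it for the model you pulled.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local port, as in the curl example.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text from the "response" field."""
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return payload["response"]
```

With the model pulled and Ollama running, `ask("Nathan-Maine/cmmc-expert-7b", "What are the key differences between CMMC Level 1 and Level 2?")` returns the answer as a single string. Setting `"stream": False` trades incremental output for a simpler one-shot JSON response.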

Model Details

| Property | Value |
|---|---|
| Base Model | Qwen2.5-7B-Instruct |
| Parameters | 7.6 billion |
| Fine-Tuning Method | QLoRA (4-bit base, LoRA rank 64, alpha 128) |
| Quantization | q5_k_m (GGUF) |
| File Size | 5.1 GB |
| Context Length | 32,768 tokens |
| Inference Speed | ~1-2 seconds per response |
| Training Hardware | NVIDIA RTX 5000 Ada (16 GB VRAM) |
| Training Time | ~3.2 hours |
| Training Framework | Unsloth + Hugging Face TRL + PEFT |

Security Domain Coverage

The models are fine-tuned to cover the full security domain, including vulnerability analysis, incident response scenarios, and access control failure modes, all of which are needed to generate professional SSPs and POA&Ms (Plans of Action and Milestones). Behavioral guardrails and policy enforcement are handled separately, at the governed-llm-gateway layer.

Base model migration to Meta Llama 3.1/3.3 (US-origin, open weights) is in progress.

Compliance Framework Coverage

Trained across eight overlapping frameworks to support cross-framework mapping:

| Framework | Coverage |
|---|---|
| CMMC 2.0 (32 CFR Part 170) | All three levels: 17 L1 practices, 110 L2, and 134 L3 (the 110 L2 requirements plus 24 from NIST SP 800-172), plus assessment methodology |
| NIST SP 800-171 Rev. 2 | 110 security requirements across 14 families |
| NIST SP 800-172 | Enhanced security requirements for critical CUI programs |
| NIST SP 800-53 Rev. 5 | Full catalog of 1,189 controls and control enhancements across 20 families |
| NIST SP 800-37 | Risk Management Framework (RMF) steps and authorization |
| NIST CSF | Identify, Protect, Detect, Respond, Recover functions |
| HIPAA Security Rule | Administrative, physical, and technical safeguards |
| DFARS | Clauses 252.204-7012, 7019, 7020, and 7021 (contract-level compliance) |

Training Data

13,434 training + 3,472 validation examples (~3.3M tokens) assembled from 5 curated sources:

| Source | Examples | Share |
|---|---:|---:|
| NIST Cybersecurity (filtered from 424K) | 6,372 | 47.4% |
| CMMC Full | 4,787 | 35.6% |
| CMMC Balanced | 994 | 7.4% |
| HIPAA Compliance | 961 | 7.2% |
| CMMC Core | 320 | 2.4% |
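The Share column is each source's count divided by the 13,434-example training total, and step 5 of the pipeline below keeps those proportions in both partitions by splitting each source separately. A minimal sketch of both calculations, assuming each record carries its source name; `shares` and `stratified_split` are hypothetical helpers, not the project's code, and per-source rounding means partition sizes only approximate an exact 80/20 ratio.

```python
import random
from collections import defaultdict

# Training-set counts per source, copied from the table above.
TRAIN_COUNTS = {
    "NIST Cybersecurity": 6372,
    "CMMC Full": 4787,
    "CMMC Balanced": 994,
    "HIPAA Compliance": 961,
    "CMMC Core": 320,
}


def shares(counts: dict) -> dict:
    """Percentage of the total contributed by each source, rounded to 0.1%."""
    total = sum(counts.values())
    return {src: round(100 * n / total, 1) for src, n in counts.items()}


def stratified_split(examples, key, val_fraction=0.2, seed=0):
    """Split each source separately so train and validation keep the same source mix.

    `key(record)` returns the record's source name (assumed to be available).
    """
    by_source = defaultdict(list)
    for ex in examples:
        by_source[key(ex)].append(ex)
    rng = random.Random(seed)
    train, val = [], []
    for source in sorted(by_source):
        group = by_source[source][:]
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val
```

Splitting per source, rather than shuffling the pooled data once, guarantees that a small source such as CMMC Core is represented in validation at roughly its training share instead of by chance.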

Data processing pipeline:

  1. Format conversion — Raw text → chat-style instruction/response pairs
  2. Quality filtering — Removed entries <100 chars, table-heavy fragments, OCR artifacts
  3. Relevance filtering — NIST data reduced from 424,729 → 72,000 relevant → 7,000 sampled
  4. Deduplication — Exact dedup (xxhash) + near-dedup (MinHash LSH, Jaccard 0.8)
  5. Validation split — 80/20 stratified split maintaining source distribution
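Steps 2 and 4 can be sketched in standard-library Python. This is an illustration under stated assumptions, not the project's pipeline: SHA-256 and seeded BLAKE2b stand in for xxhash, text is normalized before the exact hash, shingles are word 3-grams, and the 128-permutation signature split into 32 bands of 4 rows is an arbitrary choice. Only the 100-character minimum and the 0.8 Jaccard threshold come from the list above; `deduplicate` and the other helpers are hypothetical names.

```python
import hashlib
import re

MIN_CHARS = 100      # step 2: entries shorter than 100 characters are dropped
NUM_PERM = 128       # MinHash signature length (assumption)
BANDS, ROWS = 32, 4  # LSH banding, BANDS * ROWS == NUM_PERM (assumption)
THRESHOLD = 0.8      # Jaccard threshold from step 4


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants hash identically."""
    return re.sub(r"\s+", " ", text.strip().lower())


def shingles(text: str, n: int = 3) -> set:
    """Word n-grams; Jaccard similarity is computed over these sets."""
    words = normalize(text).split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}


def _hash(value: str, seed: int) -> int:
    """Seeded 64-bit hash; blake2b stands in for xxhash to stay stdlib-only."""
    salt = seed.to_bytes(16, "little")
    digest = hashlib.blake2b(value.encode(), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def minhash(items: set) -> list:
    """One minimum hash per seed; matching positions estimate Jaccard similarity."""
    return [min(_hash(s, seed) for s in items) for seed in range(NUM_PERM)]


def estimated_jaccard(a: list, b: list) -> float:
    return sum(x == y for x, y in zip(a, b)) / len(a)


def deduplicate(texts: list) -> list:
    kept, signatures, exact_seen = [], [], set()
    buckets = {}  # (band index, band values) -> indices into `kept`
    for text in texts:
        if len(text) < MIN_CHARS:
            continue  # quality filter
        digest = hashlib.sha256(normalize(text).encode()).hexdigest()
        if digest in exact_seen:
            continue  # exact duplicate
        sig = minhash(shingles(text))
        keys = [(b, tuple(sig[b * ROWS:(b + 1) * ROWS])) for b in range(BANDS)]
        # LSH: only entries sharing at least one band bucket are compared.
        candidates = {i for key in keys for i in buckets.get(key, [])}
        if any(estimated_jaccard(sig, signatures[i]) >= THRESHOLD for i in candidates):
            continue  # near duplicate of an entry already kept
        exact_seen.add(digest)
        for key in keys:
            buckets.setdefault(key, []).append(len(kept))
        kept.append(text)
        signatures.append(sig)
    return kept
```

Banding is what keeps near-deduplication tractable at the scale of the 424,729-entry NIST source: each new entry is checked only against entries that collide with it in some band, not against every entry kept so far.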

Training Configuration

| Parameter | Value |
|---|---|
| Epochs | 3 |
| Learning Rate | 2e-4 (cosine decay) |
| Optimizer | 8-bit AdamW |
| Batch Size | 4 (effective 16 with gradient accumulation) |
| LoRA Rank | 64 |
| LoRA Alpha | 128 |
| LoRA Target | q_proj, k_proj, v_proj, o_proj |
| Max Sequence Length | 2048 |
| Quantization (Base) | 4-bit NF4 |
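The LoRA settings determine how many parameters are actually trained. The sketch below counts them; the layer dimensions are assumptions taken from the published Qwen2.5-7B `config.json` (hidden size 3584, 28 layers, 28 query heads and 4 key/value heads, MLP width 18944) and should be checked against the actual base checkpoint. Each adapted projection gets an `r x in` matrix A and an `out x r` matrix B, and the low-rank update is scaled by `alpha / r`. `lora_params` is a hypothetical helper.

```python
# Assumed Qwen2.5-7B dimensions (verify against the base model's config.json).
HIDDEN = 3584
LAYERS = 28
NUM_HEADS = 28
NUM_KV_HEADS = 4
HEAD_DIM = HIDDEN // NUM_HEADS       # 128
KV_DIM = NUM_KV_HEADS * HEAD_DIM     # 512, because of grouped-query attention
INTERMEDIATE = 18944

RANK, ALPHA = 64, 128                # LoRA Rank and LoRA Alpha from the table

# (in_features, out_features) of each linear projection in one decoder layer.
MODULES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

ATTN_ONLY = ["q_proj", "k_proj", "v_proj", "o_proj"]  # this model's targets
scaling = ALPHA / RANK                                 # multiplier on B @ A


def lora_params(targets, rank=RANK):
    """Trainable LoRA parameters across all layers: rank * (in + out) per module."""
    per_layer = sum(rank * (MODULES[m][0] + MODULES[m][1]) for m in targets)
    return per_layer * LAYERS
```

Under these assumptions the four attention targets train about 40.4M parameters (roughly 0.5% of 7.6B), with a scaling factor of 2.0. Adding the three MLP projections mentioned under Known Issues would raise the count to about 161.5M.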

Evaluation Results

Training Curve

| Checkpoint | Progress | Eval Loss |
|---|---:|---:|
| Step 200 | 8% | 1.462 |
| Step 600 | 24% | 1.334 |
| Step 1000 | 40% | 1.286 |
| Step 1600 | 63% | 1.253 |
| Step 2400 | 95% | 1.242 |
| Final | 100% | 1.241 |

Eval loss decreased at every logged checkpoint, from 1.462 at step 200 to 1.241 at the end of training, with no sign of overfitting.
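The checkpoint percentages are consistent with the training configuration. Assuming a gradient-accumulation factor of 4 on a single GPU (the table gives only the batch of 4 and the effective batch of 16) and rounding a final partial batch up to a full step, three epochs over 13,434 examples come to 2,520 optimizer steps. The cosine schedule below assumes no warmup, since none is listed, so treat it as an approximation; `progress` and `cosine_lr` are hypothetical helpers.

```python
import math

TRAIN_EXAMPLES = 13_434
PER_STEP_BATCH = 4
GRAD_ACCUM = 4          # assumption: 4 x 4 = effective batch of 16
EPOCHS = 3
PEAK_LR = 2e-4

effective_batch = PER_STEP_BATCH * GRAD_ACCUM
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / effective_batch)
total_steps = steps_per_epoch * EPOCHS


def progress(step: int) -> int:
    """Percent of training completed at a given optimizer step."""
    return round(100 * step / total_steps)


def cosine_lr(step: int, peak: float = PEAK_LR) -> float:
    """Cosine decay from the peak rate to 0 over all steps (no warmup assumed)."""
    return peak * 0.5 * (1 + math.cos(math.pi * step / total_steps))
```

With these values, steps 200, 600, 1000, 1600, and 2400 round to 8%, 24%, 40%, 63%, and 95%, matching the table, and the learning rate is halved to 1e-4 at the midpoint (step 1260).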

Example Output

Query: "What access controls are required for CMMC Level 2?"

Response: The model cites specific CMMC practices (AC.L2-3.1.1 through AC.L2-3.1.22), maps them to NIST SP 800-171 requirements, and provides implementation guidance including account management, access enforcement, least privilege, and remote access controls.

Intended Uses

  • SSP Generation — Draft System Security Plan control descriptions with NIST/CMMC citations
  • Gap Analysis — Identify controls required for specific CMMC levels and contract requirements
  • Assessment Prep — Generate evidence checklists and assessment objective narratives
  • Cross-Framework Mapping — Map controls between CMMC, NIST 800-53, HIPAA, and DFARS
  • Policy Drafting — Create policies aligned to specific CMMC practices
  • DFARS Clause Analysis — Identify requirements from contract language
  • Training & Education — Always-available compliance reference for teams

Limitations

  • Not a substitute for qualified compliance professionals. This model is a tool to accelerate compliance work, not replace human judgment.
  • Knowledge cutoff. The model's knowledge is based on training data available at the time of fine-tuning. Always verify against current published frameworks.
  • 7B reasoning depth. For complex multi-framework analysis or detailed gap assessments, consider the 14B, 32B, or 72B variants, which provide deeper reasoning.
  • No retrieval augmentation. The model generates responses from trained knowledge only — it does not search or retrieve external documents at inference time.
  • Citation accuracy. While the model generally cites correct control numbers and framework sections, always verify specific citations against authoritative sources.
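Because citation errors are a key risk, cited Level 2 practice IDs can be checked mechanically before a draft is used. The sketch below assumes the `DOMAIN.L2-3.F.R` naming shown in the example output (e.g. `AC.L2-3.1.1`), where `3.F.R` is the NIST SP 800-171 Rev. 2 requirement number; the family table lists the 14 Rev. 2 families with their CMMC domain codes and requirement counts (110 in total). `check_citation` and `find_citations` are hypothetical helpers. They catch malformed or nonexistent IDs only, not a real practice cited for the wrong purpose.

```python
import re
from typing import Optional

# NIST SP 800-171 Rev. 2: family number -> (CMMC domain code, number of requirements).
FAMILIES = {
    1: ("AC", 22), 2: ("AT", 3), 3: ("AU", 9), 4: ("CM", 9), 5: ("IA", 11),
    6: ("IR", 3), 7: ("MA", 6), 8: ("MP", 9), 9: ("PS", 2), 10: ("PE", 6),
    11: ("RA", 3), 12: ("CA", 4), 13: ("SC", 16), 14: ("SI", 7),
}

L2_ID = re.compile(r"([A-Z]{2})\.L2-3\.(\d{1,2})\.(\d{1,2})")
L2_ID_IN_TEXT = re.compile(r"\b[A-Z]{2}\.L2-3\.\d{1,2}\.\d{1,2}\b")


def check_citation(practice_id: str) -> Optional[str]:
    """Return None if the ID names a real Level 2 requirement, else a reason."""
    m = L2_ID.fullmatch(practice_id.strip())
    if m is None:
        return "not in DOMAIN.L2-3.F.R form"
    domain, family, req = m.group(1), int(m.group(2)), int(m.group(3))
    if family not in FAMILIES:
        return f"unknown 800-171 family 3.{family}"
    expected_domain, count = FAMILIES[family]
    if domain != expected_domain:
        return f"domain {domain} does not match family 3.{family} ({expected_domain})"
    if not 1 <= req <= count:
        return f"3.{family}.{req} is outside 3.{family}.1 to 3.{family}.{count}"
    return None


def find_citations(text: str) -> list:
    """Extract Level 2 practice IDs from free text such as a model response."""
    return L2_ID_IN_TEXT.findall(text)
```

Running `check_citation` over `find_citations(response)` flags, for example, `AC.L2-3.1.23` (the AC family ends at 3.1.22) or `AU.L2-3.1.5` (family 3.1 belongs to AC), which leaves reviewers free to focus on whether valid citations are actually relevant.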

Out-of-Scope Uses

  • Legal advice. This model does not provide legal opinions on compliance status.
  • Automated compliance certification. CMMC certification requires human assessors (C3PAOs).
  • Processing actual CUI/ITAR data. The model does not store sensitive data, but users should follow their organization's data handling policies before entering any CUI or ITAR-controlled information.

Hardware Requirements

| Mode | GPU (VRAM) | CPU-Only (RAM) | Storage |
|---|---|---|---|
| Inference | 8 GB | 16 GB | 10 GB |
| Training | 16 GB | N/A | 30 GB |

Supported OS: Linux, macOS, Windows (WSL2)
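The 8 GB inference figure can be sanity-checked with a rough memory estimate: the quantized weights plus the key/value cache for the active context. This sketch assumes Qwen2.5-7B's attention shape (28 layers, 4 key/value heads of dimension 128, from its public `config.json`) and an fp16 cache, and it ignores compute buffers and runtime overhead, so real usage will be somewhat higher. `kv_cache_bytes` and `estimated_vram_gb` are hypothetical helpers.

```python
# Assumed Qwen2.5-7B attention shape (verify against the model's config.json).
LAYERS = 28
NUM_KV_HEADS = 4
HEAD_DIM = 128
BYTES_PER_VALUE = 2   # assumption: fp16 keys and values

WEIGHTS_GB = 5.1      # q5_k_m GGUF file size from the Model Details table


def kv_cache_bytes(context_tokens: int) -> int:
    """Keys and values (factor 2) for every layer and KV head, per token of context."""
    per_token = 2 * LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * context_tokens


def estimated_vram_gb(context_tokens: int) -> float:
    """Weights plus KV cache in decimal GB, ignoring buffers and runtime overhead."""
    return WEIGHTS_GB + kv_cache_bytes(context_tokens) / 1e9
```

Under these assumptions each token of context costs 56 KiB of cache, so the full 32,768-token window adds about 1.9 GB for roughly 7.0 GB in total. That fits in 8 GB of VRAM with little headroom; shorter contexts need proportionally less.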

The Model Suite

This is the 7B model — the fastest option for day-to-day compliance queries. The full suite includes:

| Model | Parameters | GGUF Size | Best For |
|---|---|---|---|
| cmmc-expert-7b | 7.6B | 5.1 GB | Quick lookups, day-to-day queries |
| cmmc-expert-14b | 14.7B | ~10 GB | Detailed analysis, multi-control reasoning |
| cmmc-expert-32b | 32.5B | ~19 GB | Deep gap assessments, SSP drafting |
| cmmc-expert-72b | 72.7B | ~42 GB | Complex multi-framework analysis |

Source Code

Full pipeline code, training configuration, and evaluation methodology: github.com/NathanMaine/cmmc-compliance-ai-model

Known Issues

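  • Superseded by v2.0: this version applies LoRA to only 4 of the 7 linear projection modules in each transformer layer (the attention q_proj, k_proj, v_proj, and o_proj, but not the MLP gate_proj, up_proj, and down_proj) and was trained on a smaller dataset (13,434 examples). v2.0 improves on both fronts with expanded LoRA coverage and 40% more training data. Use v2.0 unless you have a specific reason to use v1.0.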
  • Limited cross-framework mapping — May struggle with nuanced mappings between overlapping frameworks (e.g., NIST 800-171 to CMMC practice IDs) compared to later versions.

Citation

@misc{maine2025cmmcexpert,
  title={CMMC Expert: Fine-Tuned Language Models for Cybersecurity Compliance},
  author={Nathan Maine},
  year={2025},
  url={https://github.com/NathanMaine/cmmc-compliance-ai-model}
}

Contact
