Same Architecture, Different Capacity: Optimizer-Induced Spectral Scaling Laws
Paper • 2605.21803 • Published • 4
Spectral Scaling Laws in Language Models: How Effectively Do Feed-Forward Networks Use Their Latent Space?