FFN/MLP
A collection of papers on feed-forward networks (FFN/MLP), curated by stereoplegic.
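For orientation: the object of study throughout this collection is the position-wise feed-forward (FFN/MLP) sublayer of the Transformer. A minimal PyTorch sketch with illustrative names and dimensions, not taken from any single paper below:

```python
import torch
import torch.nn as nn

class TransformerFFN(nn.Module):
    """Position-wise FFN sublayer: expand, apply nonlinearity, contract."""
    def __init__(self, d_model: int = 512, d_hidden: int = 2048):
        super().__init__()
        self.fc1 = nn.Linear(d_model, d_hidden)  # expansion
        self.fc2 = nn.Linear(d_hidden, d_model)  # contraction
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); the same MLP is applied at every position
        return self.fc2(self.act(self.fc1(x)))
```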
• Scaling MLPs: A Tale of Inductive Bias (arXiv:2306.13575)
• Trap of Feature Diversity in the Learning of MLPs (arXiv:2112.00980)
• Understanding the Spectral Bias of Coordinate Based MLPs Via Training Dynamics (arXiv:2301.05816)
• RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality? (arXiv:2108.04384)
• MetaFormer Is Actually What You Need for Vision (arXiv:2111.11418)
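A rough sketch of the MetaFormer claim above: the block structure (token mixer plus channel MLP, each with norm and residual) does the heavy lifting, so even average pooling can stand in for attention, as in the paper's PoolFormer variant. Sizes and norm placement here are illustrative, not the authors' code:

```python
import torch
import torch.nn as nn

class PoolFormerBlock(nn.Module):
    """MetaFormer block with pooling as the (swappable) token mixer."""
    def __init__(self, dim: int = 384, pool_size: int = 3, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.pool = nn.AvgPool1d(pool_size, stride=1, padding=pool_size // 2,
                                 count_include_pad=False)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, mlp_ratio * dim), nn.GELU(),
                                 nn.Linear(mlp_ratio * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        y = self.norm1(x)
        pooled = self.pool(y.transpose(1, 2)).transpose(1, 2)
        x = x + (pooled - y)              # token mixing: pooling minus identity
        x = x + self.mlp(self.norm2(x))   # channel MLP, as in a standard block
        return x
```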
• One Wide Feedforward is All You Need (arXiv:2309.01826)
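The paper above argues that most per-layer FFNs are redundant. One hedged way to express the idea in PyTorch is to tie a single wider FFN object across blocks; module names and sizes here are illustrative, not the authors' code:

```python
import torch
import torch.nn as nn

d_model, n_layers = 512, 6

# One wider FFN instance shared by every block (parameters are tied because
# each block holds a reference to the same module object).
shared_ffn = nn.Sequential(nn.Linear(d_model, 8 * d_model), nn.GELU(),
                           nn.Linear(8 * d_model, d_model))

class Block(nn.Module):
    def __init__(self, ffn: nn.Module):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.ffn = ffn

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.norm1(x)
        x = x + self.attn(y, y, y, need_weights=False)[0]
        return x + self.ffn(self.norm2(x))

blocks = nn.ModuleList(Block(shared_ffn) for _ in range(n_layers))
```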
• Approximating Two-Layer Feedforward Networks for Efficient Transformers (arXiv:2310.10837)
• Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth (arXiv:2103.03404)
• A technical note on bilinear layers for interpretability (arXiv:2305.03452)
• Cross-token Modeling with Conditional Computation (arXiv:2109.02008)
• Efficient Language Modeling with Sparse all-MLP (arXiv:2203.06850)
• MLP-Mixer as a Wide and Sparse MLP (arXiv:2306.01470)
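Several entries above (RaftMLP, the sparse all-MLP, and this one) build on the MLP-Mixer block, which alternates an MLP over the token axis with an MLP over the channel axis. A minimal sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

def mlp(dim: int, hidden: int) -> nn.Sequential:
    return nn.Sequential(nn.Linear(dim, hidden), nn.GELU(),
                         nn.Linear(hidden, dim))

class MixerBlock(nn.Module):
    """MLP-Mixer block: one MLP mixes across tokens, another across channels."""
    def __init__(self, num_tokens: int = 196, dim: int = 512,
                 token_hidden: int = 256, channel_hidden: int = 2048):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.token_mlp = mlp(num_tokens, token_hidden)  # acts on the token axis
        self.norm2 = nn.LayerNorm(dim)
        self.channel_mlp = mlp(dim, channel_hidden)     # acts on the channel axis

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim)
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        x = x + self.channel_mlp(self.norm2(x))
        return x
```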
• Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers (arXiv:2311.10642)
• Exponentially Faster Language Modelling (arXiv:2311.10770)
• Linear Self-Attention Approximation via Trainable Feedforward Kernel (arXiv:2211.04076)
• On the Universality of Linear Recurrences Followed by Nonlinear Projections (arXiv:2307.11888)
• HyperMixer: An MLP-based Low Cost Alternative to Transformers (arXiv:2203.03691)
• NTK-approximating MLP Fusion for Efficient Language Model Fine-tuning (arXiv:2307.08941)
• Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models (arXiv:2112.00029)
• Fast Feedforward Networks (arXiv:2308.14711)
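A simplified sketch of the fast feedforward idea above (also behind "Exponentially Faster Language Modelling"): a binary tree of routers sends each input to a single small leaf MLP, so inference cost grows with tree depth rather than total width. Hard routing is shown here for the inference view; the paper trains with soft decisions. Names and sizes are illustrative:

```python
import torch
import torch.nn as nn

class FastFeedforward(nn.Module):
    """Binary tree of routers; each input is handled by exactly one leaf MLP."""
    def __init__(self, d_model: int = 512, depth: int = 3, leaf_hidden: int = 128):
        super().__init__()
        self.depth = depth
        self.routers = nn.ModuleList(                 # one per internal node
            nn.Linear(d_model, 1) for _ in range(2 ** depth - 1))
        self.leaves = nn.ModuleList(                  # 2**depth small FFNs
            nn.Sequential(nn.Linear(d_model, leaf_hidden), nn.GELU(),
                          nn.Linear(leaf_hidden, d_model))
            for _ in range(2 ** depth))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, d_model); walk each sample down the tree
        # (heap indexing: children of node n are 2n+1 and 2n+2)
        node = [0] * x.shape[0]
        for _ in range(self.depth):
            node = [2 * n + 1 + int(self.routers[n](xi) > 0)
                    for n, xi in zip(node, x)]
        first_leaf = 2 ** self.depth - 1
        return torch.stack([self.leaves[n - first_leaf](xi)
                            for n, xi in zip(node, x)])
```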
• KAN: Kolmogorov-Arnold Networks (arXiv:2404.19756)
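A heavily simplified sketch of the KAN idea above: replace each scalar weight with a learnable univariate function on the edge, summing the results at each output node. The paper parameterizes these edge functions with B-splines; a small sine basis stands in here purely for brevity:

```python
import torch
import torch.nn as nn

class KANLayer(nn.Module):
    """Each input-output edge gets its own learnable 1-D function (here a
    sine series; the paper uses B-splines), summed at the output node."""
    def __init__(self, in_dim: int, out_dim: int, num_basis: int = 8):
        super().__init__()
        # One coefficient vector per edge: (out_dim, in_dim, num_basis)
        self.coef = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, num_basis))
        self.register_buffer('freqs', torch.arange(1, num_basis + 1).float())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim) -> per-edge basis features (batch, in_dim, num_basis)
        phi = torch.sin(x.unsqueeze(-1) * self.freqs)
        # Evaluate every edge function and sum into each output node
        return torch.einsum('bik,oik->bo', phi, self.coef)

# Stacking layers gives a "KAN" in place of an MLP:
net = nn.Sequential(KANLayer(2, 16), KANLayer(16, 1))
```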
• JetMoE: Reaching Llama2 Performance with 0.1M Dollars (arXiv:2404.07413)
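JetMoE's efficiency comes from sparse activation. A generic top-k mixture-of-experts FFN sketch conveys the FFN side of that design; hyperparameters are illustrative, and JetMoE also sparsifies attention, which is not shown:

```python
import torch
import torch.nn as nn

class MoEFFN(nn.Module):
    """Top-k mixture-of-experts FFN: a router activates k of n expert MLPs
    per token, so compute grows with k rather than with n."""
    def __init__(self, d_model: int = 512, n_experts: int = 8, k: int = 2,
                 d_hidden: int = 1024):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        scores, idx = self.router(x).topk(self.k, dim=-1)   # (tokens, k)
        weights = scores.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e                    # tokens routed to e
                out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out
```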
• Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling (arXiv:2406.07522)
• Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node (arXiv:2405.16836)