stereoplegic's Collections
Trellis Networks for Sequence Modeling (arXiv:1810.06682)
ProSG: Using Prompt Synthetic Gradients to Alleviate Prompt Forgetting of RNN-like Language Models (arXiv:2311.01981)
Gated recurrent neural networks discover attention (arXiv:2309.01775)
Inverse Approximation Theory for Nonlinear Recurrent Neural Networks (arXiv:2305.19190)
Mamba: Linear-Time Sequence Modeling with Selective State Spaces (arXiv:2312.00752)
On the Universality of Linear Recurrences Followed by Nonlinear Projections (arXiv:2307.11888)
Laughing Hyena Distillery: Extracting Compact Recurrences From Convolutions (arXiv:2310.18780)
Cached Transformers: Improving Transformers with Differentiable Memory Cache (arXiv:2312.12742)
RNNs of RNNs: Recursive Construction of Stable Assemblies of Recurrent Neural Networks (arXiv:2106.08928)
StableSSM: Alleviating the Curse of Memory in State-space Models through Stable Reparameterization (arXiv:2311.14495)
Hierarchically Gated Recurrent Neural Network for Sequence Modeling (arXiv:2311.04823)
Enhancing Transformer RNNs with Multiple Temporal Perspectives (arXiv:2402.02625)
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models (arXiv:2402.19427)
Improving Token-Based World Models with Parallel Observation Prediction (arXiv:2402.05643)
Diffusion-RWKV: Scaling RWKV-Like Architectures for Diffusion Models (arXiv:2404.04478)
HGRN2: Gated Linear RNNs with State Expansion (arXiv:2404.07904)
GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression (arXiv:2407.12077)
RWKV-7 "Goose" with Expressive Dynamic State Evolution (arXiv:2503.14456)