Efficiently Modeling Long Sequences with Structured State Spaces

Paper-reading notes: S4
November 11, 2025 · 930 words

Retentive Network: A Successor to Transformer for Large Language Models

Paper-reading notes: RetNet
November 11, 2025 · 472 words

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

Paper-reading notes: Mixture-of-Recursions
November 9, 2025 · 1312 words

xLSTM: Extended Long Short-Term Memory

Paper-reading notes: xLSTM
October 28, 2025 · 1394 words

RWKV: Reinventing RNNs for the Transformer Era

Paper-reading notes: RWKV
October 27, 2025 · 1499 words