Efficiently Modeling Long Sequences with Structured State Spaces

Paper-reading notes: S4
November 11, 2025 | 930 words | Author: Tan Ke

Retentive Network: A Successor to Transformer for Large Language Models

Paper-reading notes: RetNet
November 11, 2025 | 472 words | Author: Tan Ke

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

Paper-reading notes: Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
November 9, 2025 | 1312 words | Author: Tan Ke

xLSTM: Extended Long Short-Term Memory

Paper-reading notes: xLSTM: Extended Long Short-Term Memory
October 28, 2025 | 1394 words | Author: Tan Ke

RWKV: Reinventing RNNs for the Transformer Era

Paper-reading notes: RWKV: Reinventing RNNs for the Transformer Era
October 27, 2025 | 1499 words | Author: Tan Ke