Synthesizer: Rethinking Self-Attention for Transformer Models

Paper-reading notes: Synthesizer
December 16, 2025 | 244 words | Author: Tan Ke

Reformer: The Efficient Transformer

Paper-reading notes: Reformer
December 14, 2025 | 287 words | Author: Tan Ke

FNet: Mixing Tokens with Fourier Transforms

Paper-reading notes: FNet
December 5, 2025 | 470 words | Author: Tan Ke

Linformer: Self-Attention with Linear Complexity

Paper-reading notes: Linformer
December 4, 2025 | 236 words | Author: Tan Ke

Rethinking Attention with Performers

Paper-reading notes: Performers
December 3, 2025 | 499 words | Author: Tan Ke

ATLAS: Learning to Optimally Memorize the Context at Test Time

Paper-reading notes: ATLAS
November 29, 2025 | 628 words | Author: Tan Ke

Titans: Learning to Memorize at Test Time

Paper-reading notes: Titans
November 26, 2025 | 916 words | Author: Tan Ke

RoFormer: Enhanced Transformer with Rotary Position Embedding

Paper-reading notes: RoFormer
November 25, 2025 | 348 words | Author: Tan Ke

Hyena Hierarchy: Towards Larger Convolutional Language Models

Paper-reading notes: Hyena Hierarchy
November 18, 2025 | 516 words | Author: Tan Ke

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Paper-reading notes: Mamba
November 17, 2025 | 397 words | Author: Tan Ke