Synthesizer: Rethinking Self-Attention for Transformer Models

Paper-reading notes: Synthesizer
December 16, 2025 · 244 words

Learning Transformer Programs

Paper-reading notes: Learning Transformer Programs
December 15, 2025 · 339 words

Reformer: The Efficient Transformer

Paper-reading notes: Reformer
December 14, 2025 · 287 words

ALTA: Compiler-Based Analysis of Transformers

Paper-reading notes: ALTA
December 9, 2025 · 720 words

Tracr: Compiled Transformers as a Laboratory for Interpretability

Paper-reading notes: Tracr
December 8, 2025 · 59 words

Thinking Like Transformers

Paper-reading notes: RASP
December 7, 2025 · 273 words

FNet: Mixing Tokens with Fourier Transforms

Paper-reading notes: FNet
December 5, 2025 · 470 words

Linformer: Self-Attention with Linear Complexity

Paper-reading notes: Linformer
December 4, 2025 · 236 words

Rethinking Attention with Performers

Paper-reading notes: Performers
December 3, 2025 · 499 words

What Formal Languages Can Transformers Express? A Survey

Paper-reading notes: What Formal Languages Can Transformers Express? A Survey
November 30, 2025 · 327 words