Rethinking Attention with Performers

Paper-reading notes: Performers
December 3, 2025 | 499 words | Author: Tan Ke

What Formal Languages Can Transformers Express? A Survey

Paper-reading notes: What Formal Languages Can Transformers Express? A Survey
November 30, 2025 | 327 words | Author: Tan Ke

ATLAS: Learning to Optimally Memorize the Context at Test Time

Paper-reading notes: ATLAS
November 29, 2025 | 628 words | Author: Tan Ke

RoFormer: Enhanced Transformer with Rotary Position Embedding

Paper-reading notes: RoFormer
November 25, 2025 | 348 words | Author: Tan Ke

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Paper-reading notes: ViT
November 3, 2025 | 1851 words | Author: Tan Ke

A Bridging Model for Parallel Computation

Paper-reading notes: A Bridging Model for Parallel Computation
October 10, 2025 | 201 words | Author: Tan Ke

Attention Is All You Need

Paper-reading notes: Attention is All You Need
October 1, 2025 | 1268 words | Author: Tan Ke