Synthesizer: Rethinking Self-Attention for Transformer Models

Paper-reading notes: Synthesizer
December 16, 2025 | 244 words | Author: Tan Ke

Learning Transformer Programs

Paper-reading notes: Learning Transformer Programs
December 15, 2025 | 339 words | Author: Tan Ke

Reformer: The Efficient Transformer

Paper-reading notes: Reformer
December 14, 2025 | 287 words | Author: Tan Ke

OpenVLA: An Open-Source Vision-Language-Action Model

Paper-reading notes: OpenVLA
December 12, 2025 | 312 words | Author: Tan Ke

Multiobjective Tree-Structured Parzen Estimator

Paper-reading notes: MOTPE
December 11, 2025 | 511 words | Author: Tan Ke

Bayesian Optimization is Superior to Random Search for Machine Learning Hyperparameter Tuning

Paper-reading notes: Bayesian Optimization
December 10, 2025 | 864 words | Author: Tan Ke

Random Search for Hyper-Parameter Optimization

Paper-reading notes: Random Search for Hyper-Parameter Optimization
December 10, 2025 | 774 words | Author: Tan Ke

ALTA: Compiler-Based Analysis of Transformers

Paper-reading notes: ALTA
December 9, 2025 | 720 words | Author: Tan Ke

Tracr: Compiled Transformers as a Laboratory for Interpretability

Paper-reading notes: Tracr
December 8, 2025 | 59 words | Author: Tan Ke

Thinking Like Transformers

Paper-reading notes: RASP
December 7, 2025 | 273 words | Author: Tan Ke