Bayesian Optimization is Superior to Random Search for Machine Learning Hyperparameter Tuning

Paper-reading notes: Bayesian Optimization
December 10, 2025 | 864 words | Author: Tan Ke

Random Search for Hyper-Parameter Optimization

Paper-reading notes: Random Search for Hyper-Parameter Optimization
December 10, 2025 | 774 words | Author: Tan Ke

ALTA: Compiler-Based Analysis of Transformers

Paper-reading notes: ALTA
December 9, 2025 | 720 words | Author: Tan Ke

Tracr: Compiled Transformers as a Laboratory for Interpretability

Paper-reading notes: Tracr
December 8, 2025 | 59 words | Author: Tan Ke

Thinking Like Transformers

Paper-reading notes: RASP
December 7, 2025 | 273 words | Author: Tan Ke

It’s All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization

Paper-reading notes: MIRAS
December 6, 2025 | 923 words | Author: Tan Ke

FNet: Mixing Tokens with Fourier Transforms

Paper-reading notes: FNet
December 5, 2025 | 470 words | Author: Tan Ke

Linformer: Self-Attention with Linear Complexity

Paper-reading notes: Linformer
December 4, 2025 | 236 words | Author: Tan Ke

Rethinking Attention with Performers

Paper-reading notes: Performers
December 3, 2025 | 499 words | Author: Tan Ke

On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning

Paper-reading notes: On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning
December 1, 2025 | 462 words | Author: Tan Ke