Archive

2026 ¹⁴

March ¹

π Series (π₀, π₀.₅)

March 1, 2026 | 2621 words | Author: Tan Ke

February ⁴

Repo Reading Notes for OpenPI

February 28, 2026 | 1127 words | Author: Tan Ke

GPU and CUDA

February 22, 2026 | 2607 words | Author: Tan Ke

Optimization in Machine Learning

February 10, 2026 | 2443 words | Author: Tan Ke

BANANAS: Bayesian Optimization with Neural Architectures for Neural Architecture Search

February 5, 2026 | 329 words | Author: Tan Ke

January ⁹

SINDy Implementation Notes

January 22, 2026 | 892 words | Author: Tan Ke

Distg Series on UrbanLF

January 20, 2026 | 1406 words | Author: Tan Ke

UrbanLF: A Comprehensive Light Field Dataset for Semantic Segmentation of Urban Scenes

January 17, 2026 | 432 words | Author: Tan Ke

Large Concept Models: Language Modeling in a Sentence Representation Space

January 15, 2026 | 3217 words | Author: Tan Ke

From Tokens To Thoughts: How LLMs And Humans Trade Compression For Meaning

January 12, 2026 | 913 words | Author: Tan Ke

Reproducing Robotics Transformer 1

January 10, 2026 | 2398 words | Author: Tan Ke

RT Series (RT-1, RT-2)

January 9, 2026 | 1740 words | Author: Tan Ke

Reproducing Diffusion Policy

January 2, 2026 | 2446 words | Author: Tan Ke

Learning Transferable Visual Models From Natural Language Supervision

January 1, 2026 | 888 words | Author: Tan Ke

2025 ⁵⁰

December ¹⁶

Diffusion Policy: Visuomotor Policy Learning via Action Diffusion

December 28, 2025 | 672 words | Author: Tan Ke

Synthesizer: Rethinking Self-Attention for Transformer Models

December 16, 2025 | 244 words | Author: Tan Ke

Learning Transformer Programs

December 15, 2025 | 339 words | Author: Tan Ke

Reformer: The Efficient Transformer

December 14, 2025 | 287 words | Author: Tan Ke

OpenVLA: An Open-Source Vision-Language-Action Model

December 12, 2025 | 312 words | Author: Tan Ke

Multiobjective Tree-Structured Parzen Estimator

December 11, 2025 | 511 words | Author: Tan Ke

Bayesian Optimization is Superior to Random Search for Machine Learning Hyperparameter Tuning

December 10, 2025 | 864 words | Author: Tan Ke

Random Search for Hyper-Parameter Optimization

December 10, 2025 | 774 words | Author: Tan Ke

ALTA: Compiler-Based Analysis of Transformers

December 9, 2025 | 720 words | Author: Tan Ke

Tracr: Compiled Transformers as a Laboratory for Interpretability

December 8, 2025 | 59 words | Author: Tan Ke

Thinking Like Transformers

December 7, 2025 | 273 words | Author: Tan Ke

It’s All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization

December 6, 2025 | 923 words | Author: Tan Ke

FNet: Mixing Tokens with Fourier Transforms

December 5, 2025 | 470 words | Author: Tan Ke

Linformer: Self-Attention with Linear Complexity

December 4, 2025 | 236 words | Author: Tan Ke

Rethinking Attention with Performers

December 3, 2025 | 499 words | Author: Tan Ke

On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning

December 1, 2025 | 462 words | Author: Tan Ke

November ²²

What Formal Languages Can Transformers Express? A Survey

November 30, 2025 | 327 words | Author: Tan Ke

ATLAS: Learning to Optimally Memorize the Context at Test Time

November 29, 2025 | 628 words | Author: Tan Ke

Solving olympiad geometry without human demonstrations

November 28, 2025 | 522 words | Author: Tan Ke

Formal Mathematical Reasoning: A New Frontier in AI

November 27, 2025 | 347 words | Author: Tan Ke

Titans: Learning to Memorize at Test Time

November 26, 2025 | 916 words | Author: Tan Ke

RoFormer: Enhanced Transformer with Rotary Position Embedding

November 25, 2025 | 348 words | Author: Tan Ke

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

November 24, 2025 | 360 words | Author: Tan Ke

Mastering the game of Go without human knowledge

November 24, 2025 | 342 words | Author: Tan Ke

Disentangling Light Fields for Super-Resolution and Disparity Estimation

November 19, 2025 | 1379 words | Author: Tan Ke

Hyena Hierarchy: Towards Larger Convolutional Language Models

November 18, 2025 | 516 words | Author: Tan Ke

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

November 17, 2025 | 397 words | Author: Tan Ke

A survey for light field super-resolution

November 14, 2025 | 341 words | Author: Tan Ke

Efficiently Modeling Long Sequences with Structured State Spaces

November 11, 2025 | 930 words | Author: Tan Ke

Retentive Network: A Successor to Transformer for Large Language Models

November 11, 2025 | 472 words | Author: Tan Ke

Exploiting Spatial and Angular Correlations With Deep Efficient Transformers for Light Field Image Super-Resolution

November 10, 2025 | 1071 words | Author: Tan Ke

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

November 9, 2025 | 1312 words | Author: Tan Ke

Reference-Based Face Super-Resolution Using the Spatial Transformer

November 7, 2025 | 428 words | Author: Tan Ke

LMR: A Large-Scale Multi-Reference Dataset for Reference-based Super-Resolution

November 7, 2025 | 1157 words | Author: Tan Ke

Latent Diffusion Models

November 6, 2025 | 964 words | Author: Tan Ke

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

November 4, 2025 | 2299 words | Author: Tan Ke

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

November 3, 2025 | 1851 words | Author: Tan Ke

A Tutorial on Bayesian Optimization

November 1, 2025 | 3591 words | Author: Tan Ke

October ¹²

CrossNet++: Cross-Scale Large-Parallax Warping for Reference-Based Super-Resolution

October 29, 2025 | 1433 words | Author: Tan Ke

xLSTM: Extended Long Short-Term Memory

October 28, 2025 | 1394 words | Author: Tan Ke

RWKV: Reinventing RNNs for the Transformer Era

October 27, 2025 | 1499 words | Author: Tan Ke

Mastering the game of Go with MCTS and Deep Neural Networks

October 24, 2025 | 2246 words | Author: Tan Ke

CrossNet: An End-to-end Reference-based Super Resolution Network using Cross-scale Warping

October 21, 2025 | 1976 words | Author: Tan Ke

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

October 20, 2025 | 314 words | Author: Tan Ke

Learning-based light field imaging

October 20, 2025 | 6550 words | Author: Tan Ke

A Survey of RAG

October 19, 2025 | 8671 words | Author: Tan Ke

From Local to Global: A GraphRAG Approach to Query-Focused Summarization

October 16, 2025 | 588 words | Author: Tan Ke

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

October 15, 2025 | 2177 words | Author: Tan Ke

A Bridging Model for Parallel Computation

October 10, 2025 | 201 words | Author: Tan Ke

Attention is All You Need

October 1, 2025 | 1268 words | Author: Tan Ke
