2026  3

January  3

Reproducing Robotics Transformer 1

January 10, 2026 · 2280 words

RT Series

January 9, 2026 · 1731 words

Reproducing Diffusion Policy

January 2, 2026 · 2442 words

2025  48

December  15

Diffusion Policy: Visuomotor Policy Learning via Action Diffusion

December 28, 2025 · 672 words

Synthesizer: Rethinking Self-Attention for Transformer Models

December 16, 2025 · 244 words

Learning Transformer Programs

December 15, 2025 · 339 words

Reformer: The Efficient Transformer

December 14, 2025 · 287 words

OpenVLA: An Open-Source Vision-Language-Action Model

December 12, 2025 · 312 words

Bayesian Optimization is Superior to Random Search for Machine Learning Hyperparameter Tuning

December 10, 2025 · 864 words

Random Search for Hyper-Parameter Optimization

December 10, 2025 · 774 words

ALTA: Compiler-Based Analysis of Transformers

December 9, 2025 · 720 words

Tracr: Compiled Transformers as a Laboratory for Interpretability

December 8, 2025 · 59 words

Thinking Like Transformers

December 7, 2025 · 273 words

It’s All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization

December 6, 2025 · 923 words

FNet: Mixing Tokens with Fourier Transforms

December 5, 2025 · 470 words

Linformer: Self-Attention with Linear Complexity

December 4, 2025 · 236 words

Rethinking Attention with Performers

December 3, 2025 · 499 words

On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning

December 1, 2025 · 462 words

November  22

What Formal Languages Can Transformers Express? A Survey

November 30, 2025 · 327 words

ATLAS: Learning to Optimally Memorize the Context at Test Time

November 29, 2025 · 628 words

Solving olympiad geometry without human demonstrations

November 28, 2025 · 522 words

Formal Mathematical Reasoning: A New Frontier in AI

November 27, 2025 · 347 words

Titans: Learning to Memorize at Test Time

November 26, 2025 · 916 words

RoFormer: Enhanced Transformer With Rotary Position Embedding

November 25, 2025 · 348 words

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

November 24, 2025 · 360 words

Mastering the game of Go without human knowledge

November 24, 2025 · 342 words

Disentangling Light Fields for Super-Resolution and Disparity Estimation

November 19, 2025 · 1379 words

Hyena Hierarchy: Towards Larger Convolutional Language Models

November 18, 2025 · 516 words

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

November 17, 2025 · 397 words

A survey for light field super-resolution

November 14, 2025 · 341 words

Efficiently Modeling Long Sequences with Structured State Spaces

November 11, 2025 · 930 words

Retentive Network: A Successor to Transformer for Large Language Models

November 11, 2025 · 472 words

Exploiting Spatial and Angular Correlations With Deep Efficient Transformers for Light Field Image Super-Resolution

November 10, 2025 · 1071 words

Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation

November 9, 2025 · 1312 words

Reference-Based Face Super-Resolution Using the Spatial Transformer

November 7, 2025 · 428 words

LMR: A Large-Scale Multi-Reference Dataset for Reference-based Super-Resolution

November 7, 2025 · 1157 words

Latent Diffusion Models

November 6, 2025 · 964 words

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

November 4, 2025 · 2299 words

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

November 3, 2025 · 1851 words

A Tutorial on Bayesian Optimization

November 1, 2025 · 3591 words

October  11

CrossNet++: Cross-Scale Large-Parallax Warping for Reference-Based Super-Resolution

October 29, 2025 · 1433 words

xLSTM: Extended Long Short-Term Memory

October 28, 2025 · 1394 words

RWKV: Reinventing RNNs for the Transformer Era

October 27, 2025 · 1499 words

Mastering the game of Go with MCTS and Deep Neural Networks

October 24, 2025 · 2246 words

CrossNet: An End-to-end Reference-based Super Resolution Network using Cross-scale Warping

October 21, 2025 · 1976 words

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

October 20, 2025 · 314 words

Learning-based light field imaging

October 20, 2025 · 6550 words

From Local to Global: A GraphRAG Approach to Query-Focused Summarization

October 16, 2025 · 588 words

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

October 15, 2025 · 2177 words

A Bridging Model for Parallel Computation

October 10, 2025 · 201 words

Attention is All You Need

October 1, 2025 · 1268 words