Repo Reading Notes for OpenPI

After reading the paper π0: A Vision-Language-Action Flow Model for General Robot Control, I decided to spend a few days walking through the official implementation, openpi, to understand how everything works in practice. There were several questions I wanted to answer. On the big side: how does this repo turn VLM features into robot actions, and how are training and inference actually wired together? On the smaller side: how is the two-expert MoE implemented, and how do observations influence the final action output? ...

February 28, 2026 | 1127 words | Author: Tan Ke

SINDy Implementation Notes

GitHub repo: https://github.com/mrtanke/SINDy. This post collects my hands-on notes from implementing SINDy (Sparse Identification of Nonlinear Dynamics) as a small, understandable pipeline: generate data → build a candidate library → solve a sparse regression problem → sanity-check the discovered equation → then push it into harder settings like autoencoders and video-like data. The whole notebook is organized into three parts: (1) SINDy on ground-truth coordinates, (2) SINDy-Autoencoder, and (3) a bonus on high-dimensional “video” inputs. ...
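The pipeline above can be sketched in a few lines of NumPy. This is my own minimal illustration (not code from the repo), using a known 2-D linear system and exact derivatives so the sparse regression step is easy to verify; the sparse solver is sequentially thresholded least squares (STLSQ), the standard SINDy choice.

```python
# Minimal SINDy sketch: recover x' = -0.1x + 2y, y' = -2x - 0.1y
# from trajectory data via sequentially thresholded least squares.
import numpy as np

# 1) Generate data: integrate the known linear system with Euler steps.
dt, steps = 0.001, 20000
A_true = np.array([[-0.1, 2.0], [-2.0, -0.1]])
X = np.empty((steps, 2))
X[0] = [2.0, 0.0]
for t in range(steps - 1):
    X[t + 1] = X[t] + dt * X[t] @ A_true.T
dX = X @ A_true.T  # exact derivatives, to keep the demo clean

# 2) Candidate library: polynomials up to degree 2 in (x, y).
x, y = X[:, 0], X[:, 1]
Theta = np.column_stack([np.ones_like(x), x, y, x * x, x * y, y * y])

# 3) Sparse regression: least squares, then repeatedly zero out
#    small coefficients and refit on the surviving library terms.
def stlsq(Theta, dX, threshold=0.05, iters=10):
    Xi = np.linalg.lstsq(Theta, dX, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for k in range(dX.shape[1]):
            big = ~small[:, k]
            if big.any():
                Xi[big, k] = np.linalg.lstsq(Theta[:, big], dX[:, k], rcond=None)[0]
    return Xi

Xi = stlsq(Theta, dX)

# 4) Sanity check: only the four linear coefficients should survive.
print(Xi.round(3))
```

With clean derivatives the thresholding prunes every quadratic and constant term, and the surviving coefficients match `A_true`; real data would need noise-robust derivative estimates first.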

January 22, 2026 | 892 words | Author: Tan Ke

Distg Series on UrbanLF

I have finished reproducing the Distg series (DistgSSR / DistgASR / DistgDisp). The next task was to apply these models to a new light-field dataset, UrbanLF, and evaluate how well they perform on it. In short: apply the Distg series (DistgSSR / DistgASR / DistgDisp) to UrbanLF, obtain results, and compare them with the results from a colleague’s method. ...

January 20, 2026 | 1406 words | Author: Tan Ke

Reproducing Robotics Transformer 1

Toward the end of the Christmas holidays, I continued my VLA (Vision–Language–Action) learning track. I carefully read two papers, RT-1: Robotics Transformer for Real-World Control at Scale (Brohan et al., 2022) and RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control (Brohan et al., 2023), writing my reading notes here: https://mrtanke.github.io/posts/2026-01-09-rt-series/. After finishing the notes, I decided to reproduce Robotics Transformer 1 (RT-1) in PyTorch, not to build a production system, but to truly understand the design decisions and implement the core ideas from the paper end-to-end. The goal is a learning-oriented, minimal implementation that stays close to the RT-1 architecture while keeping the codebase clean and readable. Since training RT-1 at scale requires a heavy TFDS/RLDS pipeline and large real-robot datasets, I intentionally kept the data side minimal: I use a synthetic dataset that mirrors RT-1’s input and output shapes to validate the model forward pass, action tokenization, and the behavioral cloning training loop. ...
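The action tokenization mentioned above can be sketched briefly. This is my reading of the RT-1 paper, not the author's repo code: each continuous action dimension is discretized into 256 uniform bins, so a whole action becomes a short sequence of integer tokens that a transformer can predict with a cross-entropy (behavioral cloning) loss. The bin count and round-trip helpers below are illustrative.

```python
# Hedged sketch of RT-1-style action tokenization: per-dimension
# uniform binning of continuous actions into 256 discrete tokens.
import numpy as np

NUM_BINS = 256

def tokenize(actions, low, high):
    """Map continuous actions in [low, high] (per dim) to bins 0..255."""
    scaled = (actions - low) / (high - low)            # -> [0, 1]
    bins = np.floor(scaled * NUM_BINS).astype(np.int64)
    return np.clip(bins, 0, NUM_BINS - 1)

def detokenize(tokens, low, high):
    """Map token bins back to the continuous bin-center values."""
    centers = (tokens.astype(np.float64) + 0.5) / NUM_BINS
    return low + centers * (high - low)

low, high = np.array([-1.0, -1.0]), np.array([1.0, 1.0])
a = np.array([[0.3, -0.7]])
tok = tokenize(a, low, high)       # integer tokens, one per action dim
rec = detokenize(tok, low, high)   # round-trip reconstruction
# Round-trip error is bounded by half a bin width: (high - low) / (2 * 256)
```

The appeal of this scheme is that it turns continuous control into ordinary next-token classification, at the cost of a bounded quantization error per dimension.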

January 10, 2026 | 2398 words | Author: Tan Ke

Reproducing Diffusion Policy

At the end of 2025, I spent a few days reproducing Diffusion Policy from Diffusion Policy: Visuomotor Policy Learning via Action Diffusion. I first spent about a day going through the paper; if you are interested, feel free to check my paper reading notes. The work is impressive, so I decided to reproduce it over the Christmas break. The repo lives at https://github.com/mrtanke/diffusion-policy.

Repo skeleton

diffusion-policy/
├── diffusion_policy/              # Library code (importable package)
│   ├── __init__.py                # Package marker
│   ├── checkpoint.py              # Save/load checkpoints
│   ├── normalizer.py              # Min-max normalization to/from [-1, 1]
│   ├── data/
│   │   ├── pusht_zarr_dataset.py  # Load PushT replay data and return training samples: observation history + future action trajectory
│   │   └── sequence_utils.py      # Build the start/end indices for each fixed-length training sample/window within an episode
│   └── models/
│       ├── diffusion.py           # DiffusionPolicy training and sampling wrapper
│       ├── denoisers.py           # Temporal UNet denoiser / noise predictor
│       └── encoders.py            # Observation encoder
├── train.py                       # Main training entrypoint
├── eval_pusht.py                  # Eval script for PushT
└── data/pusht/                    # Local dataset folder (pusht_cchi_v7_replay.zarr/)

Core algorithm

We want to generate an expert action trajectory by denoising a noisy action trajectory, just as in image diffusion. To do this, we train a model to predict the noise contained in each action of a noisy action trajectory, then use the predicted noise to gradually denoise that trajectory. ...
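The training objective described above can be sketched as a standard DDPM noise-prediction loss over action trajectories. This is an illustrative stand-in, not the repo's code: the real model is a temporal UNet conditioned on an observation encoding, whereas here a dummy denoiser and random "trajectories" just show the shapes and the loss.

```python
# Minimal sketch of the DDPM-style objective behind Diffusion Policy:
# noise a clean action trajectory, then regress a model onto that noise.
import numpy as np

rng = np.random.default_rng(0)

# Linear beta schedule, as in vanilla DDPM.
T = 100
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

def q_sample(a0, t, eps):
    """Forward process: noise a clean trajectory a0 to diffusion step t."""
    return np.sqrt(alphas_bar[t]) * a0 + np.sqrt(1.0 - alphas_bar[t]) * eps

# One training step: the denoiser sees (noisy trajectory, step, observation)
# and is trained with MSE to recover the injected noise eps.
a0 = rng.normal(size=(16, 8, 2))   # batch of 8-step, 2-D action trajectories
t = int(rng.integers(0, T))
eps = rng.normal(size=a0.shape)
a_t = q_sample(a0, t, eps)

def denoiser(a_t, t, obs=None):
    # Stand-in for the temporal UNet: a real model predicts eps from a_t.
    return np.zeros_like(a_t)

loss = np.mean((denoiser(a_t, t) - eps) ** 2)  # minimized over model params
```

At inference, the trained denoiser is applied iteratively, starting from pure Gaussian noise, to produce an executable action trajectory conditioned on the current observations.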

January 2, 2026 | 2446 words | Author: Tan Ke