Laboratory for Interpretability
Tracr translates RASP programs to transformer weights in six steps:
- Split the RASP program into small steps (a).
- Determine the set of values each step can output (a).
- Label each step as an MLP or attention block (b).
- Arrange the blocks into transformer layers (c).
- Insert no-op blocks to fill empty spots (c).
- Generate real transformer weights that implement each step.
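
The labelling, layer arrangement, and no-op filling steps above can be illustrated with a small toy scheduler. This is a minimal sketch and not Tracr's actual implementation: `Step`, `schedule`, and `to_layers` are hypothetical names, the greedy placement rule is an assumption made for illustration, and the example program (an elementwise map feeding an attention aggregate) is invented for the demo. It assumes each transformer layer holds one attention slot followed by one MLP slot.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One small step of a program (hypothetical toy type, not a Tracr class)."""
    name: str
    kind: str                      # "attn" or "mlp"
    deps: list = field(default_factory=list)


def schedule(steps):
    """Place each step in the earliest slot of its kind after all its dependencies.

    Slots alternate attn, mlp, attn, mlp, ...: slot 2*i is the attention
    block of layer i and slot 2*i + 1 is that layer's MLP block.
    Assumes `steps` is already listed in dependency order.
    """
    slot_of = {}
    for step in steps:
        # A step must come strictly after every step it depends on.
        earliest = max((slot_of[d] + 1 for d in step.deps), default=0)
        want_parity = 0 if step.kind == "attn" else 1
        if earliest % 2 != want_parity:
            earliest += 1          # skip ahead to the next slot of the right kind
        slot_of[step.name] = earliest
    return slot_of


def to_layers(steps, slot_of):
    """Group scheduled steps into layers and fill empty slots with no-ops."""
    n_layers = max(slot_of.values()) // 2 + 1
    layers = [{"attn": [], "mlp": []} for _ in range(n_layers)]
    for step in steps:
        layers[slot_of[step.name] // 2][step.kind].append(step.name)
    for layer in layers:
        for kind in ("attn", "mlp"):
            if not layer[kind]:
                layer[kind] = ["no-op"]
    return layers


# Toy program: an elementwise map (MLP) whose result is aggregated by attention.
steps = [
    Step("is_x", "mlp"),
    Step("frac_prevs", "attn", deps=["is_x"]),
]
slot_of = schedule(steps)
layers = to_layers(steps, slot_of)
for i, layer in enumerate(layers):
    print(f"layer {i}: attn={layer['attn']} mlp={layer['mlp']}")
```

In this toy run the MLP step cannot use the first attention slot and the attention step must wait for the second layer, so the first attention slot and the last MLP slot are the ones filled with no-ops.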
