Paper-reading notes: RetNet
Paper-reading notes: Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation
Paper-reading notes: xLSTM: Extended Long Short-Term Memory
Paper-reading notes: RWKV: Reinventing RNNs for the Transformer Era