Alpha grad and cross-country elimination
Linear attention and its problems.
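A minimal sketch of linear attention and its core trade-off: replacing softmax with a feature map phi makes attention associative, so the whole context is compressed into a fixed-size state (the usual source of its problems, e.g. limited memory capacity). The ReLU-based feature map and all names here are illustrative assumptions, not a specific paper's formulation.

```python
import numpy as np

def linear_attention(Q, K, V):
    """Kernelized attention: softmax(QK^T)V is approximated by
    phi(Q) @ (phi(K)^T V), computable in O(n) because the state
    S = phi(K)^T V has constant size regardless of context length."""
    phi = lambda x: np.maximum(x, 0.0) + 1e-6  # positive feature map (assumption)
    Qf, Kf = phi(Q), phi(K)
    S = Kf.T @ V               # (d, d_v) fixed-size summary of the whole context
    z = Kf.sum(axis=0)         # normalizer state
    return (Qf @ S) / (Qf @ z)[:, None]

rng = np.random.default_rng(0)
n, d = 8, 4
Q, K, V = rng.standard_normal((3, n, d))
out = linear_attention(Q, K, V)
print(out.shape)  # (8, 4)
```

Because every token is folded into the same (d, d_v) matrix S, recall degrades as the context grows: distinct keys interfere in the compressed state.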
Low context length.
Local loss functions are optimized during inference.
Metaplasticity (Ganguli and Zenke).
Metaplasticity with synaptic uncertainty.
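The metaplasticity notes above can be sketched as an uncertainty-modulated update: each synapse keeps a mean and a variance, uncertain synapses stay plastic, and consolidated (low-variance) synapses resist change. This Kalman-style rule is my own illustrative assumption, not the update from any specific paper.

```python
import numpy as np

def uncertain_update(mu, var, grad, obs_noise=1.0):
    """Per-synapse gain: uncertain weights (large var) learn fast,
    consolidated weights (small var) barely move -- a simple
    metaplasticity-with-uncertainty sketch."""
    gain = var / (var + obs_noise)   # per-synapse learning rate
    mu_new = mu - gain * grad
    var_new = (1.0 - gain) * var     # each update consolidates the synapse
    return mu_new, var_new

mu, var = np.zeros(2), np.ones(2)
mu, var = uncertain_update(mu, var, np.array([1.0, -1.0]))
print(mu, var)  # [-0.5  0.5] [0.5 0.5]
```

The same machinery gives a local loss view: each synapse is doing its own inference step, with no global schedule needed.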
Posterior weights with truncation. Palimpsest? Not better than SOTA.
Mesa-optimization - in-context learning via loss minimization: https://arxiv.org/html/2309.05858v2