https://www.goodfire.ai/research/stochastic-param-decomp#

  1. Decomposition: Breaking down the model into simpler parts
  2. Description of components (interpretation): Formulating hypotheses about the functional role of component parts and how they interact
  3. Validation of descriptions: Testing whether our hypotheses are correct

(Adapted from Open Problems in Mechanistic Interpretability.)

“You’re getting a lot about the structure of the dataset, and not so much the computations” (said of sparse dictionary learning methods such as SAEs and CLTs)
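
To make the quoted concern concrete, here is a minimal sketch of a sparse autoencoder objective, one common form of sparse dictionary learning. It is an illustration under assumptions, not the method from the linked article: the names `sae_forward`, `sae_loss`, and `l1_coeff`, the ReLU encoder, and the L1 penalty are all choices made for this example. The point to notice is that the loss is computed only from activations `x` collected on a dataset; the weights of the model being studied never appear in it.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, x):
    # W is a list of rows; returns W @ x
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Encode an activation vector into non-negative features, then reconstruct it."""
    pre = [p + b for p, b in zip(matvec(W_enc, x), b_enc)]
    features = relu(pre)
    recon = [r + b for r, b in zip(matvec(W_dec, features), b_dec)]
    return features, recon

def sae_loss(x, features, recon, l1_coeff=1e-3):
    """Mean squared reconstruction error plus an L1 sparsity penalty (assumed form)."""
    mse = sum((xi - ri) ** 2 for xi, ri in zip(x, recon)) / len(x)
    sparsity = sum(abs(f) for f in features)
    return mse + l1_coeff * sparsity

def init_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes: a 4-dimensional activation and an 8-entry dictionary.
rng = random.Random(0)
d_model, d_dict = 4, 8
W_enc = init_matrix(d_dict, d_model, rng)
W_dec = init_matrix(d_model, d_dict, rng)
b_enc = [0.0] * d_dict
b_dec = [0.0] * d_model

# Stand-in for one activation vector sampled from the dataset.
x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
features, recon = sae_forward(x, W_enc, b_enc, W_dec, b_dec)
loss = sae_loss(x, features, recon)
```

Because the dictionary is fit to reconstruct sampled activations, what it learns is shaped by the distribution of those activations, which is one way to read the remark above.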