https://www.goodfire.ai/research/stochastic-param-decomp#
- Decomposition: Breaking down the model into simpler parts
- Description of components (interpretation): Formulating hypotheses about the functional role of component parts and how they interact
- Validation of descriptions: Testing if our hypotheses are correct (adapted from Open Problems in Mechanistic Interpretability).
“You’re getting a lot about the structure of the dataset, and not so much the computations” (said about sparse dictionary learning methods such as SAEs and CLTs)