Stochastic Parameter Decomposition

Decomposition: Breaking down the model into simpler parts
Description of components (interpretation): Formulating hypotheses about the functional role of component parts and how they interact
Validation of descriptions: Testing whether our hypotheses are correct (steps adapted from Open Problems in Mechanistic Interpretability).
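The Decomposition step can be read in the parameter-space sense the title points to. Below is a minimal sketch, assuming (as parameter decomposition methods such as SPD do) that a weight matrix W is written as a sum of rank-one subcomponents u_c v_c^T. The per-component mask values m_c and the helper names (`outer`, `masked_weight`, `matvec`) are hypothetical illustrations. SPD's actual causal-importance functions, stochastic masking, and training losses are not modeled here. The sketch only shows what "breaking the model into simpler parts" looks like when the parts live in weight space: each component can be switched on or off, and you can inspect how the layer's output changes.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def outer(u: Sequence[float], v: Sequence[float]) -> Matrix:
    """Rank-one matrix u v^T (shape len(u) x len(v))."""
    return [[ui * vj for vj in v] for ui in u]


def masked_weight(us: Sequence[Sequence[float]],
                  vs: Sequence[Sequence[float]],
                  mask: Sequence[float]) -> Matrix:
    """Rebuild W as sum_c m_c * u_c v_c^T (hypothetical masking, illustration only)."""
    rows, cols = len(us[0]), len(vs[0])
    W = [[0.0] * cols for _ in range(rows)]
    for u, v, m in zip(us, vs, mask):
        for i, row in enumerate(outer(u, v)):
            for j, x in enumerate(row):
                W[i][j] += m * x
    return W


def matvec(W: Matrix, x: Sequence[float]) -> List[float]:
    """Apply the (possibly partial) weight matrix to an input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


# Toy layer with two rank-one components; together they form W = diag(2, 3).
us = [[1.0, 0.0], [0.0, 1.0]]
vs = [[2.0, 0.0], [0.0, 3.0]]

W_full = masked_weight(us, vs, [1.0, 1.0])     # all components active
W_first = masked_weight(us, vs, [1.0, 0.0])    # second component ablated
print(matvec(W_full, [1.0, 1.0]))   # [2.0, 3.0]
print(matvec(W_first, [1.0, 1.0]))  # [2.0, 0.0]
```

With the full mask, the components sum back to the original layer. Ablating one component removes exactly its contribution to the output, which is the kind of handle on the computation that the Description and Validation steps would then try to interpret and test.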

On sparse dictionary learning (SAEs, CLTs): “You’re getting a lot about the structure of the dataset, and not so much the computations.”

Rakaar's Notes