anthropic

goodfire

david bau

  • Locating and Editing Factual Associations in GPT / ROME
  • MEMIT: Mass Editing Memory in a Transformer
  • Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models

geiger

  • Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability

practical review

https://arxiv.org/html/2407.02646v2#S8