Anthropic
- Circuit Tracing: Revealing Computational Graphs in Language Models
- On the Biology of a Large Language Model
Goodfire
David Bau
- Locating and Editing Factual Associations in GPT / ROME
- MEMIT: Mass-Editing Memory in a Transformer
- Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Geiger
- Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability