Causality, prediction in neuro, AI
Notes
-
Gershman’s article: causality is invariant prediction. Neuroscience's shift toward prediction may not yield understanding. Interpretability has similar examples.
-
Microprocessor
-
Shortcut learning
-
Differences and similarities in RNNs across architectures
-
Hypothesis testing in interpretability
-
Physicalism, computational complexity
-
Interpretability illusions
-
Everything, Everywhere, All at Once: Is Mechanistic Interpretability Identifiable? (Méloux et al., arXiv 2502.20914)
-
Transcoders find interpretable LLM feature circuits (Dunefsky et al., NeurIPS 2024)
-
The Cognitive Revolution in Interpretability: From Explaining Behavior to Interpreting Representations and Algorithms (Davies et al., arXiv 2408.05859)
-
Towards Automated Circuit Discovery for Mechanistic Interpretability (Conmy et al., NeurIPS 2023)
-
Mechanism for keeping the eye still
-
Lindsay and Bau's article on understanding neural systems