Welcome to the first workshop on Interpreting Cognition in Deep Learning Models, which will take place at NeurIPS 2025 in San Diego, USA on December 6 or 7, 2025.

About the workshop

Recent innovations in deep learning have produced models with impressive capabilities, matching or even exceeding human performance across a wide range of domains. A timely and critical challenge in AI is understanding what behaviors these models are actually capable of, and what internal processes support those behaviors. As interest in models’ internal processes grows, cognitive science is becoming increasingly useful for describing and understanding cognition in deep learning models: the field, which seeks to describe the cognitive processes of human and animal minds, offers a rich body of theories, experiments, and frameworks that can be adopted to understand how deep learning models achieve complex behaviors in domains such as language, vision, and reasoning.

The workshop will focus on Cognitive Interpretability (“CogInterp”), the systematic interpretation of high-level cognition in deep learning models. Just as cognitive science describes the intermediate representations and algorithms (the cognition) between behavior and neurons in biological systems, the goal of Cognitive Interpretability is to describe the cognitive processes that lie between the levels of behavioral evaluation and mechanistic interpretability in deep learning models. Practically speaking, this means that Cognitive Interpretability does not just ask whether a model can perform task X or has a certain ability Y, but additionally (or instead) asks how the model performs X or learns and implements Y. These kinds of inferences—from observable behavior to latent “mental” processes—are the bread and butter of cognitive science, but many of the theoretical and empirical tools developed to tackle them have not yet been widely adopted in AI research, in part because of the separation between the two fields and their communities.

To address this gap, our goal is to bring together researchers in cognitive science and AI interpretability to discuss new empirical results and theories about the inner workings of deep learning models. We hope to gather perspectives from a range of disciplines, including machine learning, psychology, linguistics, vision science, neuroscience, philosophy of mind, and law.

Important Dates

Specific dates are estimates and subject to change.


Contact us at coginterp@gmail.com.