Accepted Papers
Spotlight Talks
"Interpretable Hybrid Neural-Cognitive Models Discover Cognitive Strategies Underlying Flexible Reversal Learning". Chonghao Cai, Liyuan Li, Yifei Cao, Maria K Eckstein.
"Mechanisms of Symbol Processing in Transformers". Paul Smolensky, Roland Fernandez, Zhenghao Zhou, Mattia Opper, Adam Davies, Jianfeng Gao.
"Culturally transmitted color categories in LLMs reflect a learning bias toward efficient compression". Nathaniel Imel, Noga Zaslavsky.
"Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs". Sonia Krishna Murthy, Rosie Zhao, Jennifer Hu, Sham M. Kakade, Markus Wulfmeier, Peng Qian, Tomer Ullman.
Posters
"Cognitive Behavior Modeling via Activation Steering". Anthony Kuang, Ahmed Ismail, Ayo Akinkugbe, Kevin Zhu, Sean O'Brien.
"Cognitive Load Traces as Symbolic and Visual Accounts of Deep Model Cognition". Dong Liu, Yanxuan Yu.
"Don’t Think of the White Bear: Ironic Negation in Transformer Models under Cognitive Load". Logan Mann, Nayan Saxena, Sarah Tandon, Chenhao Sun, Savar Toteja, Kevin Zhu.
"A Control-Theoretic Account of Cognitive Effort in Language Models". Pranjal Garg.
"On the Role of Pretraining in Domain Adaptation in an Infant-Inspired Distribution Shift Task". Deepayan Sanyal, Joel Phillips Michelson, Maithilee Kunda.
"When Researchers Say Mental Model/Theory of Mind of AI, What Are They Really Talking About?". Xiaoyun Yin, Elmira Zahmat Doost, Shiwen Zhou, Garima Arya Yadav, Jamie Gorman.
"The Mechanistic Emergence of Symbol Grounding in Language Models". Ziqiao Ma, Shuyu Wu, Xiaoxi Luo, Yidong Huang, Josue Torres-Fonseca, Freda Shi, Joyce Chai.
"Sparse Feature Coactivation Reveals Composable Semantic Modules in Large Language Models". Ruixuan Deng, Xiaoyang Hu, Miles Gilberti, Shane Storks, Aman Taxali, Mike Angstadt, Chandra Sripada, Joyce Chai.
"Tracing the Development of Syntax and Semantics in a Model trained on Child-Directed Speech and Visual Input". Nina Schoener, Mahesh Srinivasan, Colin Conwell.
"Conflict Adaptation in Vision-Language Models". Xiaoyang Hu.
"Sarc7: Evaluating Sarcasm Detection and Generation with Seven Types and Emotion-Informed Techniques". Lang Xiong, Raina Gao, Alyssa Jeong, Yicheng Fu, Kevin Zhu, Sean O'Brien, Vasu Sharma.
"Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld's Episode Theory". Ming Li, Nan Zhang, Chenrui Fan, Hong Jiao, Tianyi Zhou.
"Mechanistic Interpretability of GPT-2: Lexical and Contextual Layers in Sentiment Analysis". Amartya Hatua.
"Strategy and structure in Codenames: Comparing human and GPT-4 gameplay". Noah Prescott, Tracey Mills, Jonathan Phillips.
"Language models can associate objects with their features without forming integrated representations". Simon Jerome Han, James Lloyd McClelland.
"Unifying Gestalt Principles Through Inference-Time Prior Integration". Tahereh Toosi, Kenneth D. Miller.
"Assessing Behavioral Effects of Reasoning (or the lack of) in LLMs". ARTHUR BUZELIN, Samira Malaquias, Victoria Estanislau, Yan Aquino, Pedro Augusto Torres Bento, Lucas Dayrell, Arthur Chagas, Gisele L. Pappa, Wagner Meira Jr..
"Actual or counterfactual? Asymmetric responsibility attributions in language models". Eric Bigelow, Yang Xiang, Tobias Gerstenberg, Tomer Ullman, Samuel J. Gershman.
"Theoretical Linguistics Constrains Hypothesis-Driven Causal Abstraction in Mechanistic Interpretability". Suchir Salhan, Konstantinos Voudouris.
"Language Models use Lookbacks to Track Beliefs". Nikhil Prakash, Natalie Shapira, Arnab Sen Sharma, Christoph Riedl, Yonatan Belinkov, Tamar Rott Shaham, David Bau, Atticus Geiger.
"Scratchpad Thinking: Alternation Between Storage and Computation in Latent Reasoning Models". Sayam Goyal, Brad Peters, María Emilia Granda, Akshath Vijayakumar Narmadha, Dharunish Yugeswardeenoo, Callum Stuart McDougall, Sean O'Brien, Ashwinee Panda, Kevin Zhu, Cole Blondin.
"STAT: Skill-Targeted Adaptive Training". Yinghui He, Abhishek Panigrahi, Yong Lin, Sanjeev Arora.
"Priors in Time: A Generative View of Sparse Autoencoders for Sequential Representations". Ekdeep Singh Lubana, Sai Sumedh R. Hindupur, Can Rager, Valérie Costa, Oam Patel, Sonia Krishna Murthy, Thomas Fel, Greta Tuckute, Daniel Wurgaft, Demba E. Ba, Melanie Weber, Aaron Mueller.
"Gradual Forgetting: Logarithmic Compression for Extending Transformer Context Windows". Billy Dickson, Zoran Tiganj.
"Are Humans Evolved Instruction Followers? An Underlying Inductive Bias Enables Rapid Instructed Task Learning". Anjishnu Kumar.
"Visual symbolic mechanisms: Emergent symbol processing in vision language models". Rim Assouel, Declan Iain Campbell, Taylor Whittington Webb.
"Let's Think 一步一步: A Cognitive Framework for Characterizing Code-Switching in LLM Reasoning". Eleanor Lin, David Jurgens.
"Disaggregation Reveals Hidden Training Dynamics: The Case of Agreement Attraction". James A. Michaelov, Catherine Arnett.
"Video Finetuning Improves Reasoning Between Frames". Ruiqi Yang, Tian Yun, Zihan Wang, Ellie Pavlick.
"Cognitive Maps in Language Models: A Mechanistic Analysis of Spatial Planning". Caroline Baumgartner, Eleanor Spens, Neil Burgess, Petru Manescu.
"A Few Bad Neurons: Isolating and Surgically Correcting Sycophancy". Claire O'Brien, Jessica Seto, Dristi Roy, Aditya Dwivedi, Ryan Lagasse, Sunishchal Dev, Kevin Zhu, Sean O'Brien.
"Modulation of temporal decision-making in a deep reinforcement learning agent under the dual-task paradigm". Amrapali Pednekar, Álvaro Garrido Pérez, Yara Khaluf, Pieter Simoens.
"Neural Correlates of Language Models Are Specific to Human Language". Iñigo Parra.
"Pedagogical Alignment of LLMs requires Diverse Cognitively-Inspired Student Proxies". Suchir Salhan, Andrew Caines, Paula Buttery.
"I Am Large, I Contain Multitudes: Persona Transmission via Contextual Inference in LLMs". Puria Radmard, Shi Feng.
"Personality Manipulation as a Cognitive Probe in Large Language Models". Gunmay Handa, Zekun Wu, Adriano Koshiyama, Philip Colin Treleaven.
"Kindness or Sycophancy? Understanding and Shaping Model Personality via Synthetic Games". Maya Okawa, Ekdeep Singh Lubana, Mai Uchida, Hidenori Tanaka.
"Metacognitive Sensitivity for Test-Time Dynamic Model Selection". Le Tuan Minh Trinh, Le Minh Vu Pham, Thi Minh Anh Pham, An Duc Nguyen.
"Causality $\neq$ Decodability, and Vice Versa: Lessons from Interpreting Counting ViTs". Lianghuan Huang, Yingshan Chang.
"Context informs pragmatic interpretation in vision–language models". Alvin Wei Ming Tan, Ben Prystawski, Veronica Boyce, Michael Frank.
"Unraveling the cognitive patterns of Large Language Models through module communities". Kushal Raj Bhandari, Pin-Yu Chen, Jianxi Gao.
"DecepBench: Benchmarking Multimodal Deception Detection". Vittesh Maganti, Nysa Lalye, Ethan Braverman, Kevin Zhu, Vasu Sharma, Sean O'Brien.
"Misalignment Between Vision-Language Representations in Vision-Language Models". Yonatan Gideoni, Yoav Gelberg, Tim G. J. Rudner, Yarin Gal.
"Detecting Motivated Reasoning in the Internal Representations of Language Models". Parsa Mirtaheri, Mikhail Belkin.
"PluriHarms: Benchmarking the Full Spectrum of Human Judgments on AI Harm". Jing-Jing Li, Joel Mire, Eve Fleisig, Valentina Pyatkin, Maarten Sap, Sydney Levine.
"LLM Agents Beyond Utility: An Open-Ended Perspective". Asen Nachkov, Xi Wang, Luc Van Gool.
"Perceived vs. True Emergence: A Cognitive Account of Generalization in Clinical Time Series Models". Shashank Yadav.
"A Neuroscience-Inspired Dual-Process Model of Compositional Generalization". Alexander Noviello, Claas Beger, Jacob Groner, Kevin Ellis, Weinan Sun.
"From Black Box to Bedside: Distilling Reinforcement Learning for Interpretable Sepsis Treatment". Ella Lan, Andrea Yu, Sergio Charles.
"Measuring LLM Generation Spaces with EigenScore". Sunny Yu, Myra Cheng, Ahmad Jabbar, Robert D. Hawkins, Dan Jurafsky.
"Fuzzy, Symbolic, and Contextual: Enhancing LLM Instruction via Cognitive Scaffolding". Vanessa Figueiredo.
"Causal Interventions on Continuous Features in LLMs: A Case Study in Verb Bias". Zhenghao Zhou, R. Thomas McCoy, Robert Frank.
"Shared Parameter Subspaces and Cross-Task Linearity in Emergently Misaligned Behaviour". Eric Zhang, Daniel Aarao Reis Arturi, Andrew Adrian Ansah, Kevin Zhu, Ashwinee Panda, Aishwarya Balwani.
"Reverse-Engineering Memory in DreamerV3: From Sparse Representations to Functional Circuits". Jan Sobotka, Auke Ijspeert, Guillaume Bellegarda.
"Does FLUX Know What It’s Writing?". Adrian Chang, Sheridan Feucht, Byron C Wallace, David Bau.
"Discovering Functionally Sufficient Projections with Functional Component Analysis". Satchel Grant.
"Towards Visual Simulation in Multimodal Language Models". Catherine Finegan-Dollak.
"RNNs reveal a new optimal stopping rule in sequential sampling for decision-making". Jialin Li, Kenway Louie, Paul W. Glimcher, Bo Shen.
"Minimization of Boolean Complexity in In-Context Concept Learning". Leroy Z. Wang, R. Thomas McCoy, Shane Steinert-Threlkeld.
"Demystifying Emergent Exploration in Goal-conditioned RL". Mahsa Bastankhah, Grace Liu, Dilip Arumugam, Thomas L. Griffiths, Benjamin Eysenbach.
"Do Large Language Models Show Biases in Causal Learning? Insights from Contingency Judgment". María Victoria Carro, Denise Alejandra Mester, Francisca Gauna Selasco, Giovanni Franco Gabriel Marraffini, Mario Leiva, Gerardo Simari, Maria Vanina Martinez.
"How Do LLMs Ask Questions? A Pragmatic Comparison with Human Question-Asking". Chani Jung, Jimin Mun, Xuhui Zhou, Alice Oh, Maarten Sap, Hyunwoo Kim.
"Predicting the Formation of Induction Heads". Tatsuya Aoyama, Ethan Wilcox, Nathan Schneider.
"How Intrinsic Motivation Shapes Learned Representations in Decision Transformers: A Cognitive Interpretability Analysis". Leonardo Guiducci, Antonio Rizzo, Giovanna Maria Dimitri.
"Decoding and Reconstructing Visual Experience from Brain Activity with Generative Latent Representations". Motokazu Umehara, Yoshihiro Nagano, Misato Tanaka, Yukiyasu Kamitani.
"From Comparison to Composition: Towards Understanding Machine Cognition of Unseen Categories". Minghao Fu, Sheng Zhang, Guangyi Chen, Zijian Li, Fan Feng, Yifan Shen, Shaoan Xie, Kun Zhang.
"Disentangling Interpretable Cognitive Variables That Support Human Generalization". Xinyue Zhu, Daniel L. Kimmel.
"Interpreting style–content parsing in vision–language models". Fan L. Cheng, Xin Jing.
"Acoustic Degradation Reweights Cortical and ASR Processing: A Brain-Model Alignment Study". Francis Pingfan Chien, Chia-Chun Dan Hsu, Po-Jang Hsieh, Yu Tsao.
"Towards Cognitively Plausible Concept Learning: Spatially Grounding Concepts with Anatomical Priors". Yuyu Zhou.
"(How) Do LLMs Plan in One Forward Pass?". Michael Hanna, Emmanuel Ameisen.
"Value Entanglement: Conflation Between Moral and Grammatical Good In (Some) Large Language Models". Seong Hah Cho, Junyi Li, Anna Leshinskaya.
"Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!". Subbarao Kambhampati, Kaya Stechly, Karthik Valmeekam, Lucas Paul Saldyt, Siddhant Bhambri, Vardhan Palod, Atharva Gundawar, Soumya Rani Samineni, Durgesh Kalwar, Upasana Biswas.
"Do Cognitively Interpretable Reasoning Traces Improve LLM Performance?". Siddhant Bhambri, Upasana Biswas, Subbarao Kambhampati.
"What is a Number, That a Large Language Model May Know It?". Raja Marjieh, Veniamin Veselovsky, Thomas L. Griffiths, Ilia Sucholutsky.
"NiceWebRL: a Python library for human subject experiments with reinforcement learning environments". Wilka Carvalho, Vikram Srinivas Goddla, Ishaan Sinha, Hoon Shin, Kunal Jha.
"The One Where They Brain-Tune for Social Cognition: Multi-Modal Brain-Tuning on Friends". Nico Policzer, Cameron Braunstein, Mariya Toneva.
"Signatures of human-like processing in Transformer forward passes". Jennifer Hu, Michael A. Lepori, Michael Franke.
"CurLL: Curriculum Learning of Language Models". Pavan Kalyan Tankala, Shubhra Mishra, Satya Lokam, Navin Goyal.
"Learning to Look: Cognitive Attention Alignment with Vision-Language Models". Ryan L. Yang, Dipkamal Bhusal, Nidhi Rastogi.
"Deconstructing the Reasoning Process of a Neuro-Fuzzy Agent: From Learned Concepts to Natural Language Narratives". Yumin Zhou, Whye Loon Tung, Hiok Quek.
"A Cognitive Architecture for Probing Hierarchical Processing and Predictive Coding in Deep Vision Models". Brennen Hill, Zhang Xinyu, Timothy Putra Prasetio.
"Do Sparse Subnetworks Exhibit Cognitively Aligned Attention? Effects of Pruning on Saliency Map Fidelity, Sparsity, and Concept Coherence". Sanish Suwal, Dipkamal Bhusal, Michael Clifford, Nidhi Rastogi.
"Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation". Siddhant Bhambri, Upasana Biswas, Subbarao Kambhampati.
"MetaCD: A Meta Learning Framework for Cognitive Diagnosis based on Continual Learning". Jin Wu, Chanjin Zheng.
"Interpretable Hybrid Neural-Cognitive Models Discover Cognitive Strategies Underlying Flexible Reversal Learning". Chonghao Cai, Liyuan Li, Yifei Cao, Maria K Eckstein.
"Mechanisms of Symbol Processing in Transformers". Paul Smolensky, Roland Fernandez, Zhenghao Zhou, Mattia Opper, Adam Davies, Jianfeng Gao.
"Culturally transmitted color categories in LLMs reflect a learning bias toward efficient compression". Nathaniel Imel, Noga Zaslavsky.
"Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs". Sonia Krishna Murthy, Rosie Zhao, Jennifer Hu, Sham M. Kakade, Markus Wulfmeier, Peng Qian, Tomer Ullman.
Posters
"Cognitive Behavior Modeling via Activation Steering". Anthony Kuang, Ahmed Ismail, Ayo Akinkugbe, Kevin Zhu, Sean O'Brien.
"Cognitive Load Traces as Symbolic and Visual Accounts of Deep Model Cognition". Dong Liu, Yanxuan Yu.
"Don’t Think of the White Bear: Ironic Negation in Transformer Models under Cognitive Load". Logan Mann, Nayan Saxena, Sarah Tandon, Chenhao Sun, Savar Toteja, Kevin Zhu.
"A Control-Theoretic Account of Cognitive Effort in Language Models". Pranjal Garg.
"On the Role of Pretraining in Domain Adaptation in an Infant-Inspired Distribution Shift Task". Deepayan Sanyal, Joel Phillips Michelson, Maithilee Kunda.
"When Researchers Say Mental Model/Theory of Mind of AI, What Are They Really Talking About?". Xiaoyun Yin, Elmira Zahmat Doost, Shiwen Zhou, Garima Arya Yadav, Jamie Gorman.
"The Mechanistic Emergence of Symbol Grounding in Language Models". Ziqiao Ma, Shuyu Wu, Xiaoxi Luo, Yidong Huang, Josue Torres-Fonseca, Freda Shi, Joyce Chai.
"Sparse Feature Coactivation Reveals Composable Semantic Modules in Large Language Models". Ruixuan Deng, Xiaoyang Hu, Miles Gilberti, Shane Storks, Aman Taxali, Mike Angstadt, Chandra Sripada, Joyce Chai.
"Tracing the Development of Syntax and Semantics in a Model trained on Child-Directed Speech and Visual Input". Nina Schoener, Mahesh Srinivasan, Colin Conwell.
"Conflict Adaptation in Vision-Language Models". Xiaoyang Hu.
"Sarc7: Evaluating Sarcasm Detection and Generation with Seven Types and Emotion-Informed Techniques". Lang Xiong, Raina Gao, Alyssa Jeong, Yicheng Fu, Kevin Zhu, Sean O'Brien, Vasu Sharma.
"Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld's Episode Theory". Ming Li, Nan Zhang, Chenrui Fan, Hong Jiao, Tianyi Zhou.
"Mechanistic Interpretability of GPT-2: Lexical and Contextual Layers in Sentiment Analysis". Amartya Hatua.
"Strategy and structure in Codenames: Comparing human and GPT-4 gameplay". Noah Prescott, Tracey Mills, Jonathan Phillips.
"Language models can associate objects with their features without forming integrated representations". Simon Jerome Han, James Lloyd McClelland.
"Unifying Gestalt Principles Through Inference-Time Prior Integration". Tahereh Toosi, Kenneth D. Miller.
"Assessing Behavioral Effects of Reasoning (or the lack of) in LLMs". ARTHUR BUZELIN, Samira Malaquias, Victoria Estanislau, Yan Aquino, Pedro Augusto Torres Bento, Lucas Dayrell, Arthur Chagas, Gisele L. Pappa, Wagner Meira Jr..
"Actual or counterfactual? Asymmetric responsibility attributions in language models". Eric Bigelow, Yang Xiang, Tobias Gerstenberg, Tomer Ullman, Samuel J. Gershman.
"Theoretical Linguistics Constrains Hypothesis-Driven Causal Abstraction in Mechanistic Interpretability". Suchir Salhan, Konstantinos Voudouris.
"Language Models use Lookbacks to Track Beliefs". Nikhil Prakash, Natalie Shapira, Arnab Sen Sharma, Christoph Riedl, Yonatan Belinkov, Tamar Rott Shaham, David Bau, Atticus Geiger.
"Scratchpad Thinking: Alternation Between Storage and Computation in Latent Reasoning Models". Sayam Goyal, Brad Peters, María Emilia Granda, Akshath Vijayakumar Narmadha, Dharunish Yugeswardeenoo, Callum Stuart McDougall, Sean O'Brien, Ashwinee Panda, Kevin Zhu, Cole Blondin.
"STAT: Skill-Targeted Adaptive Training". Yinghui He, Abhishek Panigrahi, Yong Lin, Sanjeev Arora.
"Priors in Time: A Generative View of Sparse Autoencoders for Sequential Representations". Ekdeep Singh Lubana, Sai Sumedh R. Hindupur, Can Rager, Valérie Costa, Oam Patel, Sonia Krishna Murthy, Thomas Fel, Greta Tuckute, Daniel Wurgaft, Demba E. Ba, Melanie Weber, Aaron Mueller.
"Gradual Forgetting: Logarithmic Compression for Extending Transformer Context Windows". Billy Dickson, Zoran Tiganj.
"Are Humans Evolved Instruction Followers? An Underlying Inductive Bias Enables Rapid Instructed Task Learning". Anjishnu Kumar.
"Visual symbolic mechanisms: Emergent symbol processing in vision language models". Rim Assouel, Declan Iain Campbell, Taylor Whittington Webb.
"Let's Think 一步一步: A Cognitive Framework for Characterizing Code-Switching in LLM Reasoning". Eleanor Lin, David Jurgens.
"Disaggregation Reveals Hidden Training Dynamics: The Case of Agreement Attraction". James A. Michaelov, Catherine Arnett.
"Video Finetuning Improves Reasoning Between Frames". Ruiqi Yang, Tian Yun, Zihan Wang, Ellie Pavlick.
"Cognitive Maps in Language Models: A Mechanistic Analysis of Spatial Planning". Caroline Baumgartner, Eleanor Spens, Neil Burgess, Petru Manescu.
"A Few Bad Neurons: Isolating and Surgically Correcting Sycophancy". Claire O'Brien, Jessica Seto, Dristi Roy, Aditya Dwivedi, Ryan Lagasse, Sunishchal Dev, Kevin Zhu, Sean O'Brien.
"Modulation of temporal decision-making in a deep reinforcement learning agent under the dual-task paradigm". Amrapali Pednekar, Álvaro Garrido Pérez, Yara Khaluf, Pieter Simoens.
"Neural Correlates of Language Models Are Specific to Human Language". Iñigo Parra.
"Pedagogical Alignment of LLMs requires Diverse Cognitively-Inspired Student Proxies". Suchir Salhan, Andrew Caines, Paula Buttery.
"I Am Large, I Contain Multitudes: Persona Transmission via Contextual Inference in LLMs". Puria Radmard, Shi Feng.
"Personality Manipulation as a Cognitive Probe in Large Language Models". Gunmay Handa, Zekun Wu, Adriano Koshiyama, Philip Colin Treleaven.
"Kindness or Sycophancy? Understanding and Shaping Model Personality via Synthetic Games". Maya Okawa, Ekdeep Singh Lubana, Mai Uchida, Hidenori Tanaka.
"Metacognitive Sensitivity for Test-Time Dynamic Model Selection". Le Tuan Minh Trinh, Le Minh Vu Pham, Thi Minh Anh Pham, An Duc Nguyen.
"Causality $\neq$ Decodability, and Vice Versa: Lessons from Interpreting Counting ViTs". Lianghuan Huang, Yingshan Chang.
"Context informs pragmatic interpretation in vision–language models". Alvin Wei Ming Tan, Ben Prystawski, Veronica Boyce, Michael Frank.
"Unraveling the cognitive patterns of Large Language Models through module communities". Kushal Raj Bhandari, Pin-Yu Chen, Jianxi Gao.
"DecepBench: Benchmarking Multimodal Deception Detection". Vittesh Maganti, Nysa Lalye, Ethan Braverman, Kevin Zhu, Vasu Sharma, Sean O'Brien.
"Misalignment Between Vision-Language Representations in Vision-Language Models". Yonatan Gideoni, Yoav Gelberg, Tim G. J. Rudner, Yarin Gal.
"Detecting Motivated Reasoning in the Internal Representations of Language Models". Parsa Mirtaheri, Mikhail Belkin.
"PluriHarms: Benchmarking the Full Spectrum of Human Judgments on AI Harm". Jing-Jing Li, Joel Mire, Eve Fleisig, Valentina Pyatkin, Maarten Sap, Sydney Levine.
"LLM Agents Beyond Utility: An Open-Ended Perspective". Asen Nachkov, Xi Wang, Luc Van Gool.
"Perceived vs. True Emergence: A Cognitive Account of Generalization in Clinical Time Series Models". Shashank Yadav.
"A Neuroscience-Inspired Dual-Process Model of Compositional Generalization". Alexander Noviello, Claas Beger, Jacob Groner, Kevin Ellis, Weinan Sun.
"From Black Box to Bedside: Distilling Reinforcement Learning for Interpretable Sepsis Treatment". Ella Lan, Andrea Yu, Sergio Charles.
"Measuring LLM Generation Spaces with EigenScore". Sunny Yu, Myra Cheng, Ahmad Jabbar, Robert D. Hawkins, Dan Jurafsky.
"Fuzzy, Symbolic, and Contextual: Enhancing LLM Instruction via Cognitive Scaffolding". Vanessa Figueiredo.
"Causal Interventions on Continuous Features in LLMs: A Case Study in Verb Bias". Zhenghao Zhou, R. Thomas McCoy, Robert Frank.
"Shared Parameter Subspaces and Cross-Task Linearity in Emergently Misaligned Behaviour". Eric Zhang, Daniel Aarao Reis Arturi, Andrew Adrian Ansah, Kevin Zhu, Ashwinee Panda, Aishwarya Balwani.
"Reverse-Engineering Memory in DreamerV3: From Sparse Representations to Functional Circuits". Jan Sobotka, Auke Ijspeert, Guillaume Bellegarda.
"Does FLUX Know What It’s Writing?". Adrian Chang, Sheridan Feucht, Byron C Wallace, David Bau.
"Discovering Functionally Sufficient Projections with Functional Component Analysis". Satchel Grant.
"Towards Visual Simulation in Multimodal Language Models". Catherine Finegan-Dollak.
"RNNs reveal a new optimal stopping rule in sequential sampling for decision-making". Jialin Li, Kenway Louie, Paul W. Glimcher, Bo Shen.
"Minimization of Boolean Complexity in In-Context Concept Learning". Leroy Z. Wang, R. Thomas McCoy, Shane Steinert-Threlkeld.
"Demystifying Emergent Exploration in Goal-conditioned RL". Mahsa Bastankhah, Grace Liu, Dilip Arumugam, Thomas L. Griffiths, Benjamin Eysenbach.
"Do Large Language Models Show Biases in Causal Learning? Insights from Contingency Judgment". María Victoria Carro, Denise Alejandra Mester, Francisca Gauna Selasco, Giovanni Franco Gabriel Marraffini, Mario Leiva, Gerardo Simari, Maria Vanina Martinez.
"How Do LLMs Ask Questions? A Pragmatic Comparison with Human Question-Asking". Chani Jung, Jimin Mun, Xuhui Zhou, Alice Oh, Maarten Sap, Hyunwoo Kim.
"Predicting the Formation of Induction Heads". Tatsuya Aoyama, Ethan Wilcox, Nathan Schneider.
"How Intrinsic Motivation Shapes Learned Representations in Decision Transformers: A Cognitive Interpretability Analysis". Leonardo Guiducci, Antonio Rizzo, Giovanna Maria Dimitri.
"Decoding and Reconstructing Visual Experience from Brain Activity with Generative Latent Representations". Motokazu Umehara, Yoshihiro Nagano, Misato Tanaka, Yukiyasu Kamitani.
"From Comparison to Composition: Towards Understanding Machine Cognition of Unseen Categories". Minghao Fu, Sheng Zhang, Guangyi Chen, Zijian Li, Fan Feng, Yifan Shen, Shaoan Xie, Kun Zhang.
"Disentangling Interpretable Cognitive Variables That Support Human Generalization". Xinyue Zhu, Daniel L. Kimmel.
"Interpreting style–content parsing in vision–language models". Fan L. Cheng, Xin Jing.
"Acoustic Degradation Reweights Cortical and ASR Processing: A Brain-Model Alignment Study". Francis Pingfan Chien, Chia-Chun Dan Hsu, Po-Jang Hsieh, Yu Tsao.
"Towards Cognitively Plausible Concept Learning: Spatially Grounding Concepts with Anatomical Priors". Yuyu Zhou.
"(How) Do LLMs Plan in One Forward Pass?". Michael Hanna, Emmanuel Ameisen.
"Value Entanglement: Conflation Between Moral and Grammatical Good In (Some) Large Language Models". Seong Hah Cho, Junyi Li, Anna Leshinskaya.
"Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!". Subbarao Kambhampati, Kaya Stechly, Karthik Valmeekam, Lucas Paul Saldyt, Siddhant Bhambri, Vardhan Palod, Atharva Gundawar, Soumya Rani Samineni, Durgesh Kalwar, Upasana Biswas.
"Do Cognitively Interpretable Reasoning Traces Improve LLM Performance?". Siddhant Bhambri, Upasana Biswas, Subbarao Kambhampati.
"What is a Number, That a Large Language Model May Know It?". Raja Marjieh, Veniamin Veselovsky, Thomas L. Griffiths, Ilia Sucholutsky.
"NiceWebRL: a Python library for human subject experiments with reinforcement learning environments". Wilka Carvalho, Vikram Srinivas Goddla, Ishaan Sinha, Hoon Shin, Kunal Jha.
"The One Where They Brain-Tune for Social Cognition: Multi-Modal Brain-Tuning on Friends". Nico Policzer, Cameron Braunstein, Mariya Toneva.
"Signatures of human-like processing in Transformer forward passes". Jennifer Hu, Michael A. Lepori, Michael Franke.
"CurLL: Curriculum Learning of Language Models". Pavan Kalyan Tankala, Shubhra Mishra, Satya Lokam, Navin Goyal.
"Learning to Look: Cognitive Attention Alignment with Vision-Language Models". Ryan L. Yang, Dipkamal Bhusal, Nidhi Rastogi.
"Deconstructing the Reasoning Process of a Neuro-Fuzzy Agent: From Learned Concepts to Natural Language Narratives". Yumin Zhou, Whye Loon Tung, Hiok Quek.
"A Cognitive Architecture for Probing Hierarchical Processing and Predictive Coding in Deep Vision Models". Brennen Hill, Zhang Xinyu, Timothy Putra Prasetio.
"Do Sparse Subnetworks Exhibit Cognitively Aligned Attention? Effects of Pruning on Saliency Map Fidelity, Sparsity, and Concept Coherence". Sanish Suwal, Dipkamal Bhusal, Michael Clifford, Nidhi Rastogi.
"Interpretable Traces, Unexpected Outcomes: Investigating the Disconnect in Trace-Based Knowledge Distillation". Siddhant Bhambri, Upasana Biswas, Subbarao Kambhampati.
"MetaCD: A Meta Learning Framework for Cognitive Diagnosis based on Continual Learning". Jin Wu, Chanjin Zheng.
