[2026-03-27] Prof. Shao-Yuan Lo / NTU CSIE / From Shortcuts to Reasoning: Robust Post-Training of Theory of Mind

  • 2026-03-09
  • 黃雅群 (acting deputy)
Title
Moving Beyond Learned Shortcuts: Building Robust Post-Training Methods for Theory-of-Mind Reasoning in Language Models
From Shortcuts to Reasoning: Robust Post-Training of Theory of Mind
Date: 2026/3/27 15:40-17:00
Location: R102, CSIE
Speaker: Prof. Shao-Yuan Lo
Host: Prof. Lung-Pan Cheng (鄭龍磻)

Abstract:
Theory of Mind (ToM) is crucial for foundation models to operate safely in the real world. While recent work explores improving ToM via post-training, we show that progress is often inflated by pervasive “shortcuts,” where models achieve up to 99% accuracy by exploiting spurious cues rather than genuine reasoning. We introduce a framework to systematically diagnose shortcut issues in ToM datasets and find that tasks reducible to state tracking (e.g., beliefs) are especially prone to shortcuts, whereas mind-level questions (e.g., intentions) better require true reasoning. Using four shortcut-free datasets across three ToM contexts, we evaluate whether reinforcement-learning fine-tuning with explicit reasoning (Thinking-RFT) surpasses supervised fine-tuning (SFT). We find that Thinking-RFT improves ToM across all settings, generalizes better to unseen domains and higher-order queries, and is more robust to counterfactuals. Additional analyses show that ToM gains arise specifically from the combination of reasoning and RL, which helps models ground their reasoning on causal anchor cues.

Biography:
Shao-Yuan Lo is a newly appointed Assistant Professor at National Taiwan University (NTU). Prior to joining NTU, he was a Research Scientist at Honda Research Institute USA. He received his Ph.D. from Johns Hopkins University in 2023 and his M.S. and B.S. degrees from National Chiao Tung University in 2019 and 2017, respectively. His recent research focuses on Multimodal LLMs and Trustworthy AI. He has been first or corresponding author on nearly 20 publications in venues such as IEEE T-PAMI, IEEE T-IP, IJCV, ICML (Spotlight), CVPR (Highlight), and ECCV. He received the Outstanding Reviewer award at CVPR 2025 and the Best Paper Award at ACM Multimedia Asia 2019.