【2025-12-19】Prof. Lily Weng / Towards Trustworthy AI: Automated Interpretability, Adversarial Robustness, and AI safety

  • 2025-11-11
  • 黃雅群 (acting)
Title: Towards Trustworthy AI: Automated Interpretability, Adversarial Robustness, and AI safety
Date: 2025/12/19 10:20
Location: R107, CSIE
Speaker: Prof. Lily Weng
Host: Prof. 林軒田 (Hsuan-Tien Lin)


Abstract:
Deep learning models have become remarkably powerful, but they often operate as black boxes. In this talk, I will share how my lab is making these systems more transparent, reliable, and trustworthy. I’ll highlight three research directions that bring interpretability into deep learning: (1) automated tools [1-4] that reveal what neural networks learn internally, at scale; (2) inherently interpretable neural model architectures [5-8] that make the model’s decision process more understandable and controllable; and (3) evaluation frameworks [9-12] that quantify interpretability and enable trust. I’ll also touch on our recent work [13-16] on jailbreak attacks on LLMs, robustness verification, and robust learning for safer AI deployment. Together, these efforts aim to move modern AI beyond accuracy, toward systems we can truly understand, align, and trust. For more details, please visit https://lilywenglab.github.io/.

Biography:
Lily Weng is an Assistant Professor in the Halıcıoğlu Data Science Institute at UC San Diego, with an affiliation in the CSE department. She received her PhD in Electrical Engineering and Computer Science (EECS) from MIT in August 2020, and her Bachelor's and Master's degrees, both in Electrical Engineering, from National Taiwan University. Prior to UCSD, she spent one year at the MIT-IBM Watson AI Lab and completed several research internships at Google DeepMind, IBM Research, and Mitsubishi Electric Research Lab. Her research interests are in machine learning and deep learning, with a primary focus on Trustworthy AI. Her vision is to make next-generation AI systems and deep learning algorithms more robust, reliable, explainable, trustworthy, and safe. Her work has been recognized and supported by several NSF awards, an ARL award, the Intel Rising Star Faculty Award, a Hellman Fellowship, and an Nvidia Academic Award. For more details, please see https://lilywenglab.github.io/.