Security and Privacy of Machine Learning, Fall 2025

Wednesdays 9:10am - 12:10pm, CSIE Building, R105

You can sign up for this course on NTU COOL (instructions here). We will send out the permission codes after the first class.

Instructor: Shang-Tse Chen
  • Email: stchen at csie.ntu.edu.tw
  • Office hour: after classes, or by appointment
TA: Tung-Jun Lin
  • Email: r13922033 at ntu.edu.tw
  • Office hour: TBD
Modern machine learning models have reached and even surpassed human performance in many areas. However, many of these successes hold only in clean, controlled settings that can be far from real-world scenarios. This course will introduce you to potential vulnerabilities of ML models. We will design and implement various attacks during the model training and testing phases, as well as methods to make ML models more robust. We will also cover other important aspects of ML, including privacy and fairness.
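
To give a concrete sense of the attacks we will implement, here is a minimal sketch of a one-step gradient-sign (FGSM-style) test-time attack in PyTorch; the model, inputs, labels, and perturbation budget are illustrative placeholders rather than course-provided code.

    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, x, y, epsilon=8 / 255):
        """One-step L-infinity attack: nudge x to increase the loss on label y."""
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        # Step each pixel by epsilon in the direction that increases the loss,
        # then clip back to the valid image range [0, 1].
        x_adv = x_adv + epsilon * x_adv.grad.sign()
        return x_adv.clamp(0, 1).detach()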

Course Schedule (evolving)

We will use NTU COOL for slides, homework assignments, announcements, and discussion.

9/3 Course Introduction
9/10 Adversarial Attacks & Defenses
* Towards Deep Learning Models Resistant to Adversarial Attacks
* Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples
* Annealing Self-Distillation Rectification Improves Adversarial Training
* Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies
9/17 Theoretical Analysis and Certified Defenses to Adversarial Attacks
* Adversarial Examples Are Not Bugs, They Are Features
* Certified Adversarial Robustness via Randomized Smoothing
* Enhancing Certified Robustness via Block Reflector Orthogonal Layers and Logit Annealing Loss
9/24 Student Presentations
G1: Poisoning Attacks
* Shadowcast: Stealthy Data Poisoning Attacks against Vision-Language Models
* MP-Nav: Enhancing Data Poisoning Attacks against Multimodal Learning
* PoisonBench: Assessing Language Model Vulnerability to Poisoned Preference Data
G2: Backdoor Attacks
* Data Free Backdoor Attacks
* Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor
* A Closer Look at Backdoor Attacks on CLIP
10/1 Jailbreaking LLMs
* Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment
* Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
* Deliberative Alignment: Reasoning Enables Safer Language Models
10/8 Student Presentations
G3: Adversarial Attack on LLMs
* DA^3: A Distribution-Aware Adversarial Attack against Language Models
* An LLM can Fool Itself: A Prompt-Based Adversarial Attack
* Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment
G4: Jailbreaking VLMs
* Jailbreak Large Vision-Language Models Through Multi-Modal Linkage
* IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves
* HiddenDetect: Detecting Jailbreak Attacks against Large Vision-Language Models via Monitoring Hidden States
10/15 Student Presentations
G5: Prompt Injection
* EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage
* Defense Against Prompt Injection Attack by Leveraging Attack Techniques
* MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents
G6: Security and Privacy Risks in RAG
* On the Vulnerability of Applying Retrieval-Augmented Generation within Knowledge-Intensive Application Domains
* Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems
* SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model
10/22 Final project proposal presentation
Hallucination
* Why Language Models Hallucinate
* Learning to Reason for Hallucination Span Detection
Note: Final project proposal due
10/29 Model & Data Privacy
* Stealing Part of a Production Language Model
* Trap-MID: Trapdoor-based Defense against Model Inversion Attacks
* Generative Model Inversion Through the Lens of the Manifold Hypothesis
11/5 Student Presentations
G7: Machine Unlearning
* Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models
* The Illusion of Unlearning: The Unstable Nature of Machine Unlearning in Text-to-Image Diffusion Models
* SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders
G8: Model Immunization
* IMMA: Immunizing text-to-image Models against Malicious Adaptation
* Multi-concept Model Immunization through Differentiable Model Merging
* Model Immunization from a Condition Number Perspective
11/12 Fairness
* FairNet: Dynamic Fairness Correction without Performance Loss via Contrastive Conditional LoRA
* Guiding LLM Decision-Making with Fairness Reward Models
* On Fairness of Unified Multimodal Large Language Model for Image Generation
11/19 Student Presentations
G9: Membership Inference Attack
* Membership Inference Attacks against Large Vision-Language Models
* Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models
* Variance-Based Membership Inference Attacks Against Large-Scale Image Captioning Models
G10: Security and Privacy in Federated Learning
* Model Poisoning Attacks to Federated Learning via Multi-Round Consistency
* Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models
* From Risk to Resilience: Towards Assessing and Mitigating the Risk of Data Reconstruction Attacks in Federated Learning
11/26 Student Presentations
G11: LLM Memorization
* Rethinking LLM Memorization through the Lens of Adversarial Compression
* Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data
* Memorization Sinks: Isolating Memorization during LLM Training
G12: Security and Privacy of Multi-Agent Systems
* Secret Collusion among AI Agents: Multi-Agent Deception via Steganography
* Single-agent Poisoning Attacks Suffice to Ruin Multi-Agent Learning
* Cowpox: Towards the Immunity of VLM-based Multi-Agent Systems
12/3 Guest Lecture
12/10 Final project presentation
12/17 Final project presentation
12/24 Winter vacation starts!
Note: Final project report due

Reading Critique

  • Choose a paper from the suggested reading list and write a paper critique of at most two pages.
  • The critique should address:
    • Summary of the paper
    • Strengths of the paper
    • Weaknesses of the paper
    • Potential improvements or extensions
    • Questions
  • Each critique is worth 2 points.
  • For weeks with two student presentation groups, you can submit one critique from each topic.
  • You can submit more than 10 critiques, and we will use the highest 10 scores.

Paper Presentation

  • A group of students (size TBD based on class size) will present and lead the discussion on an extended topic related to this course.
  • The presentation, including Q&A, should be within 75 minutes.
  • The presenters should answer questions asked live in class and respond to comments left under the recorded video on NTU COOL.

Class Participation

  • You get 1 point for each question you ask a student presenter in class.

Final Project

  • You will work in groups on a topic related to this course.
  • Example project formats:
    • Tackle an open problem (it does not need to be successful)
    • Improve the algorithms in a paper with techniques of your own
    • Apply the techniques you learned to novel applications
    • Benchmark algorithms from multiple papers and derive insights
    • Survey the literature of related areas that we do not cover in class
  • You need to give a short presentation of your proposal in class on 10/22.
  • The presentation should be similar to a conference talk (20-minute presentation + 5-minute Q&A).
  • The final report should be typeset in LaTeX (ICML format) and be no more than 6 pages.

Grading Policy

  • Reading Critique: 20%
  • Paper presentation: 20%
  • Class Participation: 10%
  • Project: 50%
    • Proposal (10%)
    • Presentation (20%)
    • Final report (20%)
  • All due times are 11:59 pm on the day before class.
    • No late submissions are accepted.
    • Exception: you email Shang-Tse and get approval before the deadline.