【2026-06-05】Prof. Kate Ching-Ju Lin, National Taiwan University "Worker Placement and Partial Aggregation for Distributed Training"

  • 2026-05-28
  • 白師瑜
Title: Worker Placement and Partial Aggregation for Distributed Training
Date: 2026/06/05 14:20-15:30
Location: R103, CSIE
Speaker: Prof. Kate Ching-Ju Lin, National Taiwan University
Host: Mike Y. Chen


Abstract:
Distributed Deep Learning (DDL) has become a core technique for training increasingly large and complex models. Although parallelizing local training across workers improves scalability, DDL efficiency is often limited by stragglers. Prior work assumes full worker participation or a fixed deployment; we argue that stragglers are more effectively mitigated by strategically placing workers. In addition, recent studies have proposed partial-reduce schemes that fully or partially exclude slow workers. Excluding them, however, wastes their capacity and misses opportunities to improve model accuracy. In this talk, I will introduce a method for jointly optimizing the number of workers, their physical placement, and the communication overlay to minimize overall training time. To fully utilize worker capability, we further present a proportional-reduce communication framework that allows each worker to contribute gradient updates in proportion to its computing and communication resources. We show that optimizing worker placement effectively reduces the overall training time. By further enabling proportional reduce, heterogeneous resources can be better utilized, reducing gradient reconstruction error by up to 66.5% without harming model convergence.

Bio:
Kate Ching-Ju Lin is a Professor in the Department of Computer Science and Information Engineering at National Taiwan University, Taipei, Taiwan. Her current research interests include networking for AI, distributed computing in datacenter networks, and wireless satellite networks. Dr. Lin is an associate editor for IEEE/ACM Transactions on Networking. She has served as a TPC co-chair for IEEE ICNP 2023 and ACM MobiCom 2025. She has also served as a PC member for many international conferences, including ACM SIGCOMM, ACM MobiCom, ACM MobiSys, USENIX NSDI, IEEE INFOCOM, IEEE ICNP, IEEE GLOBECOM, and IEEE ICC. She received the Academic Research Award from MOST in 2022 and the Columbus Program grant from MOST from 2020 to 2025.