【2025-10-29】 Prof. Jun-Cheng Chen 陳駿丞 / CITI / From Noise to Desired Visuals: Guided Diffusion for Image and Video Applications

  • 2025-10-13
  • 黃雅群 (acting deputy)
Title: From Noise to Desired Visuals: Guided Diffusion for Image and Video Applications
Date: 2025/10/29 15:30-17:20
Location: R103, CSIE
Speaker: Prof. Jun-Cheng Chen 陳駿丞
Host: Prof. 羅紹元


Abstract:
In this talk, I will present our research on guided diffusion for image and video applications, including video-to-video translation, QR code generation, and text-to-image generation with photographic effects. First, I will introduce MeDM, an innovative framework that leverages pre-trained image diffusion models for video rendering with smooth temporal flow. By integrating explicit optical flow into the diffusion process, MeDM imposes physical constraints that enhance temporal consistency, all without requiring model fine-tuning or test-time optimization. Extensive experiments confirm its superior performance across qualitative, quantitative, and user studies. Next, I will present DiffQRCoder, a novel diffusion-based approach for creating scannable yet aesthetically appealing QR codes. DiffQRCoder incorporates scanning-robust perceptual guidance (SRPG) to preserve decoding fidelity while improving visual quality, and employs a post-processing technique, SR-MPGD, to further enhance scanning robustness. Our results demonstrate significant improvements in scanning success rates and user-perceived attractiveness, highlighting its promise for practical applications. Finally, I will introduce our camera token as settings framework, which enables intuitive control of diffusion-based image generation using numerical camera parameters such as aperture, shutter speed, and focal length. This framework bridges generative models and real-world photographic effects, providing users with a familiar and precise interface for guiding generation.

Biography:
Jun-Cheng Chen is an associate research fellow at the Research Center for Information Technology Innovation (CITI), Academia Sinica. He joined CITI as an assistant research fellow in 2019. He received the B.S. and M.S. degrees in Computer Science and Information Engineering from National Taiwan University, Taiwan (R.O.C.), in 2004 and 2006, respectively, advised by Prof. Ja-Ling Wu, and the Ph.D. degree in Computer Science from the University of Maryland, College Park, USA, in 2016, advised by Prof. Rama Chellappa. From 2017 to 2019, he was a postdoctoral research fellow at the University of Maryland Institute for Advanced Computer Studies. His research interests include computer vision, machine learning, deep learning, and their applications to biometrics, such as face recognition and facial analytics, activity recognition and detection in visual surveillance, deepfake detection in forensics, and text-to-image generation in visual generative AI. He also serves as an associate editor of IEEE Transactions on Multimedia. He received the ACM Multimedia Best Technical Full Paper Award in 2006, the APSIPA ASC Best Paper Award in 2023, and the IEEE CE Magazine Best Paper Award in 2025.