Chong Wang

Princeton University

Differential Private Stochastic Optimization with Heavy-tailed Data: Towards Optimal Rates

Aug 19, 2024

Puning Zhao, Jiafei Wu, Zhe Liu, Chong Wang, Rongfei Fan, Qingming Li

Abstract: We study convex optimization problems under differential privacy (DP). With heavy-tailed gradients, existing works achieve suboptimal rates. The main obstacle is that existing gradient estimators have suboptimal tail properties, resulting in a superfluous factor of $d$ in the union bound. In this paper, we explore algorithms achieving optimal rates of DP optimization with heavy-tailed gradients. Our first method is a simple clipping approach. Under bounded $p$-th order moments of gradients, with $n$ samples, it achieves $\tilde{O}(\sqrt{d/n}+\sqrt{d}(\sqrt{d}/n\epsilon)^{1-1/p})$ population risk with $\epsilon\leq 1/\sqrt{d}$. We then propose an iterative updating method, which is more complex but achieves this rate for all $\epsilon\leq 1$. The results significantly improve over existing methods. Such improvement relies on a careful treatment of the tail behavior of gradient estimators. Our results match the minimax lower bound in Kamath et al. (2022), indicating that the theoretical limit of stochastic convex optimization under DP is achievable.
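
The clipping idea mentioned in the abstract can be illustrated with a generic clipped-and-noised gradient mean. This is a minimal sketch and not the authors' estimator: the function name `clipped_private_mean`, the clip radius, the privacy parameters, and the Student-t toy data are all assumptions chosen for illustration, and the noise scale uses the classic Gaussian-mechanism calibration.

```python
import numpy as np

def clipped_private_mean(grads, clip_radius, epsilon, delta, rng):
    """Clip each per-sample gradient to an L2 ball, average, add Gaussian noise."""
    grads = np.asarray(grads, dtype=float)
    n, d = grads.shape
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    # Scale down only the rows whose norm exceeds clip_radius.
    scale = np.minimum(1.0, clip_radius / np.maximum(norms, 1e-12))
    clipped = grads * scale
    # Replacing one sample moves the clipped mean by at most 2 * clip_radius / n.
    sensitivity = 2.0 * clip_radius / n
    # Classic Gaussian-mechanism calibration (assumes epsilon <= 1).
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return clipped.mean(axis=0) + rng.normal(0.0, sigma, size=d)

rng = np.random.default_rng(0)
# Heavy-tailed toy gradients (hypothetical data): Student-t, 3 degrees of freedom.
grads = rng.standard_t(df=3, size=(10_000, 5))
estimate = clipped_private_mean(grads, clip_radius=5.0, epsilon=1.0,
                                delta=1e-5, rng=rng)
```

A larger clip radius reduces the bias introduced by clipping but increases the noise needed for privacy; the abstract's rates come from balancing that trade-off more carefully than this sketch does.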

Apple Intelligence Foundation Language Models

Jul 29, 2024

Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, Bowen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu (+144 more)

Abstract: We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used to train the model, the training process, how the models are optimized for inference, and the evaluation results. We highlight our focus on Responsible AI and how the principles are applied throughout the model development.

Single-Image Shadow Removal Using Deep Learning: A Comprehensive Survey

Jul 11, 2024

Lanqing Guo, Chong Wang, Yufei Wang, Siyu Huang, Wenhan Yang, Alex C. Kot, Bihan Wen

Abstract: Shadow removal aims at restoring the image content within shadow regions, pursuing a uniform distribution of illumination that is consistent between shadow and non-shadow regions. Compared with other image restoration tasks, shadow removal poses two unique challenges: 1) The patterns of shadows are arbitrary, varied, and often have highly complex trace structures, making "trace-less" image recovery difficult. 2) The degradation caused by shadows is spatially non-uniform, resulting in inconsistencies in illumination and color between shadow and non-shadow areas. Recent developments in this field are primarily driven by deep learning-based solutions, employing a variety of learning strategies, network architectures, loss functions, and training data. Nevertheless, a thorough and insightful review of deep learning-based shadow removal techniques is still lacking. In this paper, we are the first to provide a comprehensive survey to cover various aspects ranging from technical details to applications. We highlight the major advancements in deep learning-based single-image shadow removal methods, thoroughly review previous research across various categories, and provide insights into the historical progression of these developments. Additionally, we summarize performance comparisons both quantitatively and qualitatively. Beyond the technical aspects of shadow removal methods, we also explore potential future directions for this field.

* url: https://github.com/GuoLanqing/Awesome-Shadow-Removal

CPM: Class-conditional Prompting Machine for Audio-visual Segmentation

Jul 07, 2024

Yuanhong Chen, Chong Wang, Yuyuan Liu, Hu Wang, Gustavo Carneiro

Abstract: Audio-visual segmentation (AVS) is an emerging task that aims to accurately segment sounding objects based on audio-visual cues. The success of AVS learning systems depends on the effectiveness of cross-modal interaction. Such a requirement can be naturally fulfilled by leveraging transformer-based segmentation architecture due to its inherent ability to capture long-range dependencies and flexibility in handling different modalities. However, the inherent training issues of transformer-based methods, such as the low efficacy of cross-attention and unstable bipartite matching, can be amplified in AVS, particularly when the learned audio query does not provide a clear semantic clue. In this paper, we address these two issues with the new Class-conditional Prompting Machine (CPM). CPM improves the bipartite matching with a learning strategy combining class-agnostic queries with class-conditional queries. The efficacy of cross-modal attention is upgraded with new learning objectives for the audio, visual and joint modalities. We conduct experiments on AVS benchmarks, demonstrating that our method achieves state-of-the-art (SOTA) segmentation accuracy.

Cephalometric Landmark Detection across Ages with Prototypical Network

Jun 18, 2024

Han Wu, Chong Wang, Lanzhuju Mei, Tong Yang, Min Zhu, Dinggang Shen, Zhiming Cui

Abstract: Automated cephalometric landmark detection is crucial in real-world orthodontic diagnosis. Current studies mainly focus on only adult subjects, neglecting the clinically crucial scenario presented by adolescents whose landmarks often exhibit significantly different appearances compared to adults. Hence, an open question arises about how to develop a unified and effective detection algorithm across various age groups, including adolescents and adults. In this paper, we propose CeLDA, the first work for Cephalometric Landmark Detection across Ages. Our method leverages a prototypical network for landmark detection by comparing image features with landmark prototypes. To tackle the appearance discrepancy of landmarks between age groups, we design new strategies for CeLDA to improve prototype alignment and obtain a holistic estimation of landmark prototypes from a large set of training images. Moreover, a novel prototype relation mining paradigm is introduced to exploit the anatomical relations between the landmark prototypes. Extensive experiments validate the superiority of CeLDA in detecting cephalometric landmarks on both adult and adolescent subjects. To our knowledge, this is the first effort toward developing a unified solution and dataset for cephalometric landmark detection across age groups. Our code and dataset will be made public on https://github.com/ShanghaiTech-IMPACT/Cephalometric-Landmark-Detection-across-Ages-with-Prototypical-Network

* MICCAI 2024
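
The step of "comparing image features with landmark prototypes" can be pictured as a nearest-prototype lookup over a feature map. The following is only a sketch of that generic idea, not CeLDA itself: the function `locate_landmarks`, the use of cosine similarity, and the random toy features are assumptions, and the prototype alignment and relation mining described in the abstract are not modeled.

```python
import numpy as np

def locate_landmarks(features, prototypes):
    """features: (C, H, W) feature map; prototypes: (K, C), one per landmark.
    Returns a (K, 2) array of (row, col) positions with the highest
    cosine similarity to each prototype."""
    C, H, W = features.shape
    f = features.reshape(C, H * W)
    f = f / (np.linalg.norm(f, axis=0, keepdims=True) + 1e-12)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-12)
    sim = p @ f                          # (K, H*W) cosine similarities
    best = sim.argmax(axis=1)            # best-matching pixel per landmark
    return np.stack(np.unravel_index(best, (H, W)), axis=1)

rng = np.random.default_rng(0)
features = rng.standard_normal((16, 12, 10))       # hypothetical feature map
true_locs = np.array([[2, 3], [7, 9], [11, 0]])    # hypothetical landmark positions
# Build prototypes from the features at the known positions.
prototypes = features[:, true_locs[:, 0], true_locs[:, 1]].T
pred = locate_landmarks(features, prototypes)
```

In this toy setup each prototype is copied from the feature map, so the lookup recovers the chosen positions; in practice prototypes would be learned from training images.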

Encryption in ghost imaging with Kronecker products of random matrices

May 30, 2024

Yi-Ning Zhao, Lin-Shan Chen, Lingxin Kong, Chong Wang, Cheng Ren, De-Zhong Cao

Abstract: By forming measurement matrices with the Kronecker product of two random matrices, image encryption in computational ghost imaging is investigated. The two-dimensional images are conveniently reconstructed with the pseudo-inverse matrices of the two random matrices. To suppress the noise, the method of truncated singular value decomposition can be applied to either or both of the two pseudo-inverse matrices. Further, our proposal facilitates image encryption, since more matrices can be involved in forming the measurement matrix. Two permutation matrices are inserted into the matrix sequence. The image information can only be reconstructed with the correct permutation matrices and the matrix sequence in image decryption. The experimental results demonstrate how our proposal facilitates encryption. The technique paves the way for the practicality and flexibility of computational ghost imaging.

* 5 pages, 4 figures
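
The reconstruction described in the abstract rests on a standard Kronecker-product identity: with row-major flattening, $(A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(A X B^{T})$, so the image follows from the two small pseudo-inverses rather than from inverting the full measurement matrix. The sketch below is a noiseless simulation under assumed toy sizes; the variable names are illustrative, and the permutation matrices and truncated SVD mentioned in the abstract are left out.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w = 8, 6                           # toy image size (assumption)
A = rng.standard_normal((10, h))      # random factor acting on image rows
B = rng.standard_normal((9, w))       # random factor acting on image columns
X = rng.random((h, w))                # stand-in for the object

M = np.kron(A, B)                     # measurement matrix, shape (90, 48)
y = M @ X.reshape(-1)                 # simulated bucket signals

# Row-major identity: (A kron B) vec(X) = vec(A X B^T)
Y = y.reshape(A.shape[0], B.shape[0])
# Undo each factor with its own pseudo-inverse.
X_rec = np.linalg.pinv(A) @ Y @ np.linalg.pinv(B).T
```

Because both random factors have full column rank here, the recovery is exact up to floating-point error; only two small pseudo-inverses are computed instead of one for the 90 x 48 matrix.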

Distinguish Any Fake Videos: Unleashing the Power of Large-scale Data and Motion Features

May 24, 2024

Lichuan Ji, Yingqi Lin, Zhenhua Huang, Yan Han, Xiaogang Xu, Jiafei Wu, Chong Wang, Zhe Liu

Abstract: The development of AI-Generated Content (AIGC) has empowered the creation of remarkably realistic AI-generated videos, such as those involving Sora. However, the widespread adoption of these models raises concerns regarding potential misuse, including face video scams and copyright disputes. Addressing these concerns requires the development of robust tools capable of accurately determining video authenticity. The main challenges lie in the dataset and neural classifier for training. Current datasets lack a varied and comprehensive repository of real and generated content for effective discrimination. In this paper, we first introduce an extensive video dataset designed specifically for AI-Generated Video Detection (GenVidDet). It includes over 2.66M instances of both real and generated videos, varying in categories, frames per second, resolutions, and lengths. The comprehensiveness of GenVidDet enables the training of a generalizable video detector. We also present the Dual-Branch 3D Transformer (DuB3D), an innovative and effective method for distinguishing between real and generated videos, enhanced by incorporating motion information alongside visual appearance. DuB3D utilizes a dual-branch architecture that adaptively leverages and fuses raw spatio-temporal data and optical flow. We systematically explore the critical factors affecting detection performance, achieving the optimal configuration for DuB3D. Trained on GenVidDet, DuB3D can distinguish between real and generated video content with 96.77% accuracy and shows strong generalization capability even for unseen types.

Semantic-guided Prompt Organization for Universal Goal Hijacking against LLMs

May 23, 2024

Yihao Huang, Chong Wang, Xiaojun Jia, Qing Guo, Felix Juefei-Xu, Jian Zhang, Geguang Pu, Yang Liu

Abstract: With the rising popularity of Large Language Models (LLMs), assessing their trustworthiness through security tasks has gained critical importance. Regarding the new task of universal goal hijacking, previous efforts have concentrated solely on optimization algorithms, overlooking the crucial role of the prompt. To fill this gap, we propose a universal goal hijacking method called POUGH that incorporates semantic-guided prompt processing strategies. Specifically, the method starts with a sampling strategy to select representative prompts from a candidate pool, followed by a ranking strategy that prioritizes the prompts. Once the prompts are organized sequentially, the method employs an iterative optimization algorithm to generate the universal fixed suffix for the prompts. Experiments conducted on four popular LLMs and ten types of target responses verified the effectiveness of our method.

* 15 pages

Computational ghost imaging with hybrid transforms by integrating Hadamard, discrete cosine, and Haar matrices

May 06, 2024

Yi-Ning Zhao, Lin-Shan Chen, Liu-Ya Chen, Lingxin Kong, Chong Wang, Cheng Ren, Su-Heng Zhang, De-Zhong Cao

Abstract: A ghost imaging scheme with a hybrid transform approach is proposed by integrating Hadamard, discrete cosine, and Haar matrices. The measurement matrix is formed by the Kronecker product of two different transform matrices. The image information can be conveniently reconstructed by the corresponding inverse matrices. In experiments, six hybridization sets are performed in computational ghost imaging. For an object of staggered stripes, only one bucket signal survives in the Hadamard-cosine, Haar-Hadamard, and Haar-cosine hybridization sets, demonstrating flexible image compression. For a handmade windmill object, the quality factors of the reconstructed images vary with the hybridization sets. Sub-Nyquist sampling can be applied to either or both of the different transform matrices in each hybridization set. The hybridization method can be extended to apply more transforms at once. Ghost imaging with hybrid transforms may find flexible applications in image processing, such as image compression and image encryption.

* 5 pages, 4 figures
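
One hybridization set from the abstract, Hadamard combined with discrete cosine, can be sketched numerically. This is an illustrative simulation under assumptions: the helper names `hadamard` and `dct_matrix`, the 8 x 8 size, and the orthonormal scaling are choices made here, not details taken from the paper. With orthonormal factors, the Kronecker product is itself orthogonal, so the "corresponding inverse" is simply the transpose.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction; n must be a power of two."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)             # orthonormal scaling

def dct_matrix(n):
    """Orthonormal DCT-II matrix."""
    k = np.arange(n)[:, None]         # frequency index (rows)
    i = np.arange(n)[None, :]         # sample index (columns)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] /= np.sqrt(2.0)           # rescale the constant row
    return C

n = 8
H, C = hadamard(n), dct_matrix(n)
M = np.kron(H, C)                     # hybrid measurement matrix (64 x 64)

rng = np.random.default_rng(1)
X = rng.random((n, n))                # stand-in object
y = M @ X.reshape(-1)                 # simulated bucket signals
X_rec = (M.T @ y).reshape(n, n)       # inverse of an orthogonal matrix is its transpose
```

Swapping either factor for a Haar matrix would give the other hybridization sets; the reconstruction step stays the same as long as both factors are orthonormal.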

Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity

May 06, 2024

Yizhuo Lu, Changde Du, Chong Wang, Xuanliu Zhu, Liuyun Jiang, Huiguang He

Abstract: Reconstructing human dynamic vision from brain activity is a challenging task with great scientific significance. The difficulty stems from two primary issues: (1) vision-processing mechanisms in the brain are highly intricate and not fully revealed, making it challenging to directly learn a mapping between fMRI and video; (2) the temporal resolution of fMRI is significantly lower than that of natural videos. To overcome these issues, this paper proposes a two-stage model named Mind-Animator, which achieves state-of-the-art performance on three public datasets. Specifically, during the fMRI-to-feature stage, we decouple semantic, structural, and motion features from fMRI through fMRI-vision-language tri-modal contrastive learning and sparse causal attention. In the feature-to-video stage, these features are merged into videos by an inflated Stable Diffusion. We substantiate that the reconstructed video dynamics are indeed derived from fMRI, rather than hallucinations of the generative model, through permutation tests. Additionally, the visualization of voxel-wise and ROI-wise importance maps confirms the neurobiological interpretability of our model.
