
Cheng Zhang

ITI-GEN: Inclusive Text-to-Image Generation

Sep 11, 2023
Cheng Zhang, Xuanbai Chen, Siqi Chai, Chen Henry Wu, Dmitry Lagun, Thabo Beeler, Fernando De la Torre

Text-to-image generative models often reflect the biases of their training data, leading to unequal representation of underrepresented groups. This study investigates inclusive text-to-image generative models that generate images from human-written prompts while ensuring the results are uniformly distributed across attributes of interest. Unfortunately, directly expressing the desired attributes in the prompt often yields sub-optimal results due to linguistic ambiguity or model misrepresentation. Hence, this paper proposes a drastically different approach that adheres to the maxim that "a picture is worth a thousand words". We show that, for some attributes, images can represent concepts more expressively than text. For instance, categories of skin tone are typically hard to specify by text but can be easily represented by example images. Building on these insights, we propose ITI-GEN, a novel approach that leverages readily available reference images for Inclusive Text-to-Image GENeration. The key idea is to learn a set of prompt embeddings that generate images effectively representing all desired attribute categories. Importantly, ITI-GEN requires no model fine-tuning, making it computationally efficient for augmenting existing text-to-image models. Extensive experiments demonstrate that ITI-GEN substantially improves over state-of-the-art models in generating inclusive images from a prompt. Project page: https://czhang0528.github.io/iti-gen.

* Accepted to ICCV 2023 (Oral Presentation) 
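
The "key idea" above — learning prompt embeddings so that attribute categories are represented as well as their reference images — can be sketched with a toy directional-alignment objective. This is a hypothetical numpy illustration, not ITI-GEN's actual training code: the real method operates on CLIP-space embeddings inside a full training loop, and all names and dimensions here are invented.

```python
import numpy as np

def directional_alignment_loss(img_a, img_b, tok_a, tok_b):
    """Toy directional loss: the direction between the mean reference-image
    embeddings of two attribute categories should align with the direction
    between their learnable prompt-token embeddings (cosine distance)."""
    d_img = img_a.mean(axis=0) - img_b.mean(axis=0)   # image-side direction
    d_tok = tok_a - tok_b                              # prompt-side direction
    cos = d_img @ d_tok / (np.linalg.norm(d_img) * np.linalg.norm(d_tok))
    return 1.0 - cos  # 0 when perfectly aligned, up to 2 when opposed

rng = np.random.default_rng(0)
img_a = rng.normal(loc=+1.0, size=(8, 16))  # reference embeddings, category A
img_b = rng.normal(loc=-1.0, size=(8, 16))  # reference embeddings, category B
tok_a, tok_b = np.ones(16), -np.ones(16)    # prompt embeddings already aligned
loss = directional_alignment_loss(img_a, img_b, tok_a, tok_b)
```

Minimizing such a loss over the prompt embeddings (with the image embeddings fixed) is one way to realize "images as supervision" without touching the generator's weights.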

ProAgent: Building Proactive Cooperative AI with Large Language Models

Aug 28, 2023
Ceyao Zhang, Kaijie Yang, Siyi Hu, Zihao Wang, Guanghe Li, Yihang Sun, Cheng Zhang, Zhaowei Zhang, Anji Liu, Song-Chun Zhu, Xiaojun Chang, Junge Zhang, Feng Yin, Yitao Liang, Yaodong Yang

Building AIs with adaptive behaviors for human-AI cooperation is a pivotal focus of AGI research. Current methods for developing cooperative agents rely predominantly on learning-based approaches, whose policy generalization hinges heavily on past interactions with specific teammates. These approaches limit the agent's ability to recalibrate its strategy when confronted with novel teammates. We propose ProAgent, a novel framework that harnesses large language models (LLMs) to build a proactive agent able to anticipate teammates' forthcoming decisions and formulate enhanced plans for itself. ProAgent excels at cooperative reasoning, dynamically adapting its behavior to improve collaboration with teammates. Moreover, the framework is highly modular and interpretable, facilitating seamless integration across a wide array of coordination scenarios. Experimental evaluations in Overcooked-AI show that ProAgent outperforms five methods based on self-play and population-based training when cooperating with AI agents. Furthermore, when cooperating with human proxy models, its performance improves on the current state of the art, COLE, by more than 10% on average. These advances were consistently observed across diverse scenarios involving both AI agents of varying characteristics and human counterparts. These findings inspire future research on human-robot collaboration. For a hands-on demonstration, please visit https://pku-proagent.github.io.
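
The anticipate-then-plan cycle described above can be sketched as a small loop. Everything here is hypothetical: the LLM is a stub, and the prompts, memory format, and function names are illustrative rather than ProAgent's actual interfaces.

```python
def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call (hypothetical); returns canned text."""
    if "complementary" in prompt:
        return "I will plate the soup"
    return "teammate will fetch onions"

def proactive_step(observation: str, memory: list) -> str:
    """One anticipate-then-plan cycle in the spirit of the framework:
    1) infer the teammate's likely next action from the shared observation,
    2) condition the agent's own plan on that inference,
    3) store both so later cycles can verify or revise the belief."""
    belief = stub_llm(f"Given '{observation}', what will the teammate do next?")
    plan = stub_llm(f"Teammate intent: {belief}. What is the best complementary action?")
    memory.append((belief, plan))
    return plan

memory = []
plan = proactive_step("onion soup order pending", memory)
```

The modularity claimed in the abstract would correspond to swapping out the belief-inference and planning prompts independently.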


Semi-Implicit Variational Inference via Score Matching

Aug 19, 2023
Longlin Yu, Cheng Zhang

Semi-implicit variational inference (SIVI) greatly enriches the expressiveness of variational families by considering implicit variational distributions defined in a hierarchical manner. However, because the densities of these variational distributions are intractable, current SIVI approaches often use surrogate evidence lower bounds (ELBOs) or employ expensive inner-loop MCMC runs to obtain unbiased ELBOs for training. In this paper, we propose SIVI-SM, a new method for SIVI based on an alternative training objective via score matching. Leveraging the hierarchical structure of semi-implicit variational families, the score matching objective admits a minimax formulation in which the intractable variational densities can be naturally handled with denoising score matching. We show that SIVI-SM closely matches the accuracy of MCMC and outperforms ELBO-based SIVI methods on a variety of Bayesian inference tasks.

* 17 pages, 8 figures; ICLR 2023 
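
The score-matching objective alluded to above can be written schematically as follows. This is a reconstruction from the abstract, not the paper's exact formulation; the symbols (variational parameters phi, latent z, mixing variable psi, joint model density p(z, x)) are assumed:

```latex
% Semi-implicit marginal (intractable):
%   q_\phi(z) = \int q_\phi(z \mid \psi)\, q_\phi(\psi)\, d\psi
% Fisher-divergence-style training objective:
\min_{\phi}\; \mathbb{E}_{q_\phi(z)}
  \big\lVert \nabla_z \log q_\phi(z) - \nabla_z \log p(z, x) \big\rVert^2
```

Because the score of the marginal, nabla_z log q_phi(z), is unavailable in closed form, a minimax reformulation introduces an auxiliary score estimate trained by denoising score matching on the tractable conditional q_phi(z | psi) — the "natural handling" of the intractable density mentioned in the abstract.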

Incremental Collaborative Beam Alignment for Millimeter Wave Cell-Free MIMO Systems

Aug 16, 2023
Cheng Zhang, Leming Chen, Lujia Zhang, Yongming Huang, Wei Zhang

Millimeter wave (mmWave) cell-free MIMO achieves extremely high rates, but its beam alignment (BA) suffers from excessive overhead due to the large number of transceivers. Recently, user locations and probing measurements have been utilized for BA with machine learning (ML) models, e.g., deep neural networks (DNNs). However, most of these ML models are centralized, incurring high communication and computational overhead, and give little consideration to practical issues such as limited training data and real-time model updates. In this paper, we study probing-beam-based BA for the mmWave cell-free MIMO downlink with the help of broad learning (BL). For channels without and with uplink-downlink reciprocity, we propose user-side and base station (BS)-side BL-aided incremental collaborative BA approaches, respectively. By transforming centralized BL into distributed learning with data splitting and feature splitting, respectively, the user-side and BS-side schemes realize implicit sharing of multi-user data and multi-BS features. Simulations confirm that the user-side scheme is applicable to fast time-varying and/or non-stationary channels, while the BS-side scheme suits systems with low-bandwidth fronthaul links and a central unit with limited computing power. The proposed schemes are also shown to outperform traditional and DNN-aided BA schemes.

* 15 pages, 15 figures, to appear in the IEEE Transactions on Communications, 2023 
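
For readers unfamiliar with broad learning, the core mechanic — a fixed random "enhancement" feature mapping plus a closed-form ridge solve for the output weights, with no backpropagation — can be sketched on toy data. This illustrates broad learning systems in general; the paper's distributed data/feature splitting and BA-specific inputs are not reproduced, and all names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def broad_features(X, n_nodes=32):
    """Random enhancement mapping of broad learning: fixed random weights
    and a nonlinearity; this part is never trained."""
    W = np.random.default_rng(42).normal(size=(X.shape[1], n_nodes))
    return np.tanh(X @ W)

def fit_output_weights(X, y, lam=1e-3):
    """Closed-form ridge solve for the output layer -- the only trained part."""
    H = np.hstack([X, broad_features(X)])          # feature + enhancement nodes
    return np.linalg.solve(H.T @ H + lam * np.eye(H.shape[1]), H.T @ y)

def predict(X, beta):
    return np.hstack([X, broad_features(X)]) @ beta

X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0])            # toy linear target
beta = fit_output_weights(X, y)
err = np.mean((predict(X, beta) - y) ** 2)
```

The closed-form solve is what makes BL cheap to retrain incrementally, which is presumably why it suits the real-time model updates the abstract emphasizes.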

All-in-one Multi-degradation Image Restoration Network via Hierarchical Degradation Representation

Aug 06, 2023
Cheng Zhang, Yu Zhu, Qingsen Yan, Jinqiu Sun, Yanning Zhang

The aim of image restoration is to recover high-quality images from distorted ones. However, current methods usually focus on a single task (e.g., denoising, deblurring, or super-resolution), which cannot address the needs of real-world multi-task processing, especially on mobile devices. Thus, developing an all-in-one method that can restore images from various unknown distortions is a significant challenge. Previous works have employed contrastive learning to learn the degradation representation from observed images, but this often leads to representation drift caused by deficient positive and negative pairs. To address this issue, we propose a novel All-in-one Multi-degradation Image Restoration Network (AMIRNet) that can effectively capture and utilize accurate degradation representation for image restoration. AMIRNet learns a degradation representation for unknown degraded images by progressively constructing a tree structure through clustering, without any prior knowledge of degradation information. This tree-structured representation explicitly reflects the consistency and discrepancy of various distortions, providing a specific clue for image restoration. To further enhance the performance of the image restoration network and overcome domain gaps caused by unknown distortions, we design a feature transform block (FTB) that aligns domains and refines features with the guidance of the degradation representation. We conduct extensive experiments on multiple distorted datasets, demonstrating the effectiveness of our method and its advantages over state-of-the-art restoration methods both qualitatively and quantitatively.

* ACMMM23 
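
The "progressively constructing a tree structure through clustering" step can be illustrated with a toy analogue: recursively splitting a set of feature vectors with 2-means so that each level refines its parent's grouping. This is a hypothetical sketch — AMIRNet's actual degradation features and clustering procedure are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)

def two_means(X, iters=10):
    """Minimal 2-means used to split one tree node into two children."""
    c = X[rng.choice(len(X), 2, replace=False)]
    for _ in range(iters):
        lab = np.argmin(((X[:, None] - c[None]) ** 2).sum(-1), axis=1)
        if lab.min() == lab.max():      # degenerate split; stop early
            break
        c = np.array([X[lab == k].mean(0) for k in (0, 1)])
    return lab

def build_tree(X, depth):
    """Progressively cluster features into a binary tree; each level refines
    the degradation grouping of its parent."""
    if depth == 0 or len(X) < 4:
        return {"leaf": True, "n": len(X)}
    lab = two_means(X)
    return {"leaf": False,
            "left": build_tree(X[lab == 0], depth - 1),
            "right": build_tree(X[lab == 1], depth - 1)}

# toy "degradation features": well-separated groups with sub-groups
X = np.vstack([rng.normal(m, 0.1, size=(20, 8)) for m in (-2, -1, 1, 2)])
tree = build_tree(X, depth=2)
```

The appeal of the tree form is visible even in the toy: coarse levels capture which distortions are consistent with each other, finer levels capture their discrepancies.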

Interleaved Training for Massive MIMO Downlink via Exploring Spatial Correlation

Jul 31, 2023
Cheng Zhang, Chang Liu, Yindi Jing, Minjie Ding, Yongming Huang

Interleaved training has been studied for single-user and multi-user massive MIMO downlink with either fully-digital or hybrid beamforming. However, the impact of channel correlation on its average training overhead is rarely addressed. In this paper, we exploit channel correlation to improve interleaved training for the single-user massive MIMO downlink. For beam-domain interleaved training, we propose a modified scheme that optimizes the beam training codebook. The basic antenna-domain interleaved training is also improved by dynamically adjusting the training order of the base station (BS) antennas during the training process based on the values of the already-trained channels. Exact and simplified approximate closed-form expressions of the average training length are derived for the basic and modified beam-domain schemes and for the basic antenna-domain scheme in correlated channels. For the modified antenna-domain scheme, a deep neural network (DNN)-based approximation is provided for fast performance evaluation. Analytical results and simulations verify the accuracy of the derived training-length expressions and explicitly reveal the impact of system parameters on the average training length. In addition, the modified beam/antenna-domain schemes are shown to have a shorter average training length than the basic schemes.

* 13 pages (double column), 8 figures. The paper has been submitted to an IEEE journal for possible publication 
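
The dynamic antenna ordering described above can be illustrated with a toy simulation: an exponential-correlation channel, a conditional-Gaussian predictor for the untrained antennas, and a fixed stopping target. All parameters are assumptions for illustration, not the paper's scheme or performance criterion.

```python
import numpy as np

rng = np.random.default_rng(7)
N, rho, target = 16, 0.9, 4.0                # antennas, correlation, stopping target
R = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))  # exp. correlation
h = np.linalg.cholesky(R) @ rng.normal(size=N)                    # correlated channel

trained, length = [], 0
while sum(h[i] ** 2 for i in trained) < target and len(trained) < N:
    rest = [i for i in range(N) if i not in trained]
    if trained:
        # predict untrained entries from trained ones (conditional Gaussian mean)
        Rtt = R[np.ix_(trained, trained)]
        pred = R[np.ix_(rest, trained)] @ np.linalg.solve(Rtt, h[trained])
        nxt = rest[int(np.argmax(np.abs(pred)))]   # train most promising antenna
    else:
        nxt = rest[0]
    trained.append(nxt)
    length += 1
```

Because correlated neighbors of strong trained antennas are predicted to be strong, the dynamic order tends to reach the target with fewer trained antennas than a fixed order — the effect the modified antenna-domain scheme exploits.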

BayesDAG: Gradient-Based Posterior Sampling for Causal Discovery

Jul 26, 2023
Yashas Annadani, Nick Pawlowski, Joel Jennings, Stefan Bauer, Cheng Zhang, Wenbo Gong

Bayesian causal discovery aims to infer the posterior distribution over causal models from observed data, quantifying epistemic uncertainty and benefiting downstream tasks. However, computational challenges arise from joint inference over the combinatorial space of directed acyclic graphs (DAGs) and nonlinear functions. Despite recent progress towards efficient posterior inference over DAGs, existing methods are either limited to variational inference on node permutation matrices for linear causal models, leading to compromised inference accuracy, or rely on continuous relaxations of adjacency matrices constrained by a DAG regularizer, which cannot ensure that the resulting graphs are DAGs. In this work, we introduce a scalable Bayesian causal discovery framework based on stochastic gradient Markov chain Monte Carlo (SG-MCMC) that overcomes these limitations. Our approach directly samples DAGs from the posterior without requiring any DAG regularization, simultaneously draws function parameter samples, and is applicable to both linear and nonlinear causal models. To enable our approach, we derive a novel equivalence to permutation-based DAG learning, which opens up the possibility of using any relaxed gradient estimator defined over permutations. To our knowledge, this is the first framework to apply gradient-based MCMC sampling to causal discovery. Empirical evaluations on synthetic and real-world datasets demonstrate our approach's effectiveness compared to state-of-the-art baselines.
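
Why permutation-based parameterizations guarantee DAGs "by construction" can be seen in a minimal sketch: sort nodes by unconstrained scores to obtain a topological order, then keep only forward edges. This is an illustration of the general idea the abstract alludes to, not the paper's sampler or gradient estimator.

```python
import numpy as np

def dag_from_permutation(scores, W):
    """Map unconstrained parameters to a DAG: sort nodes by their scores to
    get a permutation (topological order), then keep only edges that go
    forward in that order. Any (scores, W) pair yields an acyclic graph."""
    order = np.argsort(scores)               # node potentials -> permutation
    rank = np.empty_like(order)
    rank[order] = np.arange(len(order))
    mask = rank[:, None] < rank[None, :]     # edge i->j allowed iff i precedes j
    return W * mask

rng = np.random.default_rng(3)
n = 5
A = dag_from_permutation(rng.normal(size=n), rng.normal(size=(n, n)))
```

Because every sample from such a parameterization is a DAG, no acyclicity regularizer is needed; the remaining difficulty, which the paper addresses, is getting gradients through the discrete sorting step.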


Joint Port Selection Based Channel Acquisition for FDD Cell-Free Massive MIMO

Jul 20, 2023
Cheng Zhang, Pengguang Du, Minjie Ding, Yindi Jing, Yongming Huang

In frequency division duplexing (FDD) cell-free massive MIMO, the acquisition of channel state information (CSI) is very challenging because of the large overhead required for the training and feedback of the downlink channels of multiple cooperating base stations (BSs). In this paper, for systems with partial uplink-downlink channel reciprocity and a general spatial-domain channel model with variations in average port power and correlation among port coefficients, we propose a joint-port-selection-based CSI acquisition and feedback scheme for downlink transmission with zero-forcing precoding. The scheme uses an eigenvalue-decomposition-based transformation to reduce the feedback overhead by exploiting the port correlation. We derive the sum-rate of the system for any port selection. Based on this result, we propose a low-complexity greedy-search-based joint port selection (GS-JPS) algorithm. Moreover, to adapt to fast time-varying scenarios, a supervised deep learning-enhanced joint port selection (DL-JPS) algorithm is proposed. Simulations verify the effectiveness of the proposed schemes and their advantages over existing port-selection channel acquisition schemes.

* 30 pages, 9 figures. The paper has been submitted to an IEEE journal for possible publication 
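
The greedy-search pattern behind a GS-JPS-style algorithm can be sketched with a stand-in objective. Here a log-det rate proxy, log det(I + H_S H_S^H), replaces the paper's derived sum-rate expression; the dimensions and names are hypothetical.

```python
import numpy as np

def greedy_port_selection(H, k):
    """Greedy selection: repeatedly add the port whose inclusion most
    increases a log-det rate proxy -- a stand-in for the sum-rate criterion
    a GS-JPS-style algorithm would evaluate."""
    selected = []
    for _ in range(k):
        best, best_val = None, -np.inf
        for p in range(H.shape[0]):
            if p in selected:
                continue
            Hs = H[selected + [p], :]
            # slogdet of the Hermitian PD matrix I + Hs Hs^H; log-magnitude is real
            val = np.linalg.slogdet(np.eye(Hs.shape[0]) + Hs @ Hs.conj().T)[1]
            if val > best_val:
                best, best_val = p, val
        selected.append(best)
    return selected

rng = np.random.default_rng(5)
H = rng.normal(size=(12, 4)) + 1j * rng.normal(size=(12, 4))  # 12 ports, 4 antennas
ports = greedy_port_selection(H, k=3)
```

Greedy selection costs O(k * P) objective evaluations for P candidate ports, which is the "low-complexity" property the abstract claims relative to exhaustive search over all port subsets.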

Beyond-Voice: Towards Continuous 3D Hand Pose Tracking on Commercial Home Assistant Devices

Jun 30, 2023
Yin Li, Rohan Reddy, Cheng Zhang, Rajalakshmi Nandakumar

Increasingly popular home assistants are widely utilized as the central controller for smart home devices. However, current designs rely heavily on voice interfaces, which have accessibility and usability issues; some recent devices are equipped with additional cameras and displays, which are costly and raise privacy concerns. These concerns jointly motivate Beyond-Voice, a novel deep-learning-driven acoustic sensing system that allows commodity home assistant devices to continuously track and reconstruct hand poses. It transforms the home assistant into an active sonar system using its existing onboard microphones and speakers. We feed a high-resolution range profile to a deep learning model that analyzes the motions of multiple body parts and predicts the 3D positions of 21 finger joints, bringing the granularity of acoustic hand tracking to a new level. The system operates across different environments and users without requiring personalized training data. A user study with 11 participants in 3 different environments shows that Beyond-Voice can track joints with an average mean absolute error of 16.47 mm without any training data from the testing subject.
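
How a "range profile" arises from an active-sonar setup can be shown with a toy matched-filter example: emit a chirp, receive a delayed echo, and cross-correlate; the peak lag encodes round-trip time of flight. All parameters (48 kHz sample rate, chirp band, 0.6 m distance) are illustrative, not the paper's configuration.

```python
import numpy as np

fs, c = 48_000, 343.0                       # sample rate (Hz), speed of sound (m/s)
t = np.arange(0, 0.01, 1 / fs)              # 10 ms transmit chirp
chirp = np.sin(2 * np.pi * (16_000 + 2e5 * t) * t)   # ~16 -> 20 kHz sweep

true_dist = 0.60                            # hand 0.6 m from the device
delay = int(round(2 * true_dist / c * fs))  # round-trip delay in samples
echo = np.zeros(len(chirp) + delay)
echo[delay:] += 0.5 * chirp                 # attenuated, delayed copy

# matched filter (cross-correlation) yields the range profile; its peak
# location encodes the round-trip time of flight
profile = np.correlate(echo, chirp, mode="full")
lag = int(np.argmax(profile)) - (len(chirp) - 1)
est_dist = lag * c / (2 * fs)
```

In the real system, a stack of such profiles over time (with multiple microphones) is what the deep model consumes to regress the 21 joint positions.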
