Jie Chen

Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University, Shanghai, China

Cross-View Graph Consistency Learning for Invariant Graph Representations

Nov 20, 2023

Jie Chen, Zhiming Li, Hua Mao, Wai Lok Woo, Xi Peng

Abstract: Graph representation learning is fundamental for analyzing graph-structured data. Exploring invariant graph representations remains a challenge for most existing graph representation learning methods. In this paper, we propose a cross-view graph consistency learning (CGCL) method that learns invariant graph representations for link prediction. First, two complementary augmented views are derived from an incomplete graph structure through a bidirectional graph structure augmentation scheme. This augmentation scheme mitigates the potential information loss that is commonly associated with data augmentation techniques applied to raw graph data, such as edge perturbation, node removal, and attribute masking. Second, we propose a CGCL model that can learn invariant graph representations, together with a cross-view training scheme for training it. This scheme attempts to maximize the consistency information between one augmented view and the graph structure reconstructed from the other augmented view. Furthermore, we offer a comprehensive theoretical analysis of CGCL. Experiments demonstrate the effectiveness of the proposed CGCL method, which achieves competitive results on graph datasets in comparison with several state-of-the-art algorithms.

* 8 pages

Explain-then-Translate: An Analysis on Improving Program Translation with Self-generated Explanations

Nov 13, 2023

Zilu Tang, Mayank Agarwal, Alex Shypula, Bailin Wang, Derry Wijaya, Jie Chen, Yoon Kim

Abstract: This work explores the use of self-generated natural language explanations as an intermediate step for code-to-code translation with language models. Across three types of explanations and 19 programming languages constructed from the MultiPL-E dataset, we find the explanations to be particularly effective in the zero-shot case, improving performance by 12% on average. Improvements with natural language explanations are particularly pronounced on difficult programs. We release our dataset, code, and canonical solutions in all 19 languages.

* 9 pages, 4 figures, 5 tables, 48 pages total. To be published in EMNLP Findings 2023
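
The two-step idea in this abstract (explain first, then translate) can be sketched as a small prompt pipeline. This is a minimal illustration, not the authors' released code: `generate` is a hypothetical stand-in for any text-completion function, and the prompt wording is an assumption.

```python
from typing import Callable


def explain_then_translate(
    source_code: str,
    source_lang: str,
    target_lang: str,
    generate: Callable[[str], str],
) -> str:
    """Translate code in two model calls: explain first, then translate.

    `generate` is a hypothetical stand-in for any text-completion function;
    the prompt wording below is illustrative, not taken from the paper.
    """
    # Step 1: ask the model for a natural language explanation of the source.
    explain_prompt = (
        f"Explain what the following {source_lang} program does.\n\n"
        f"{source_code}\n\nExplanation:"
    )
    explanation = generate(explain_prompt)

    # Step 2: condition the translation on both the source and the explanation.
    translate_prompt = (
        f"{source_lang} program:\n{source_code}\n\n"
        f"Explanation:\n{explanation}\n\n"
        f"Rewrite the program in {target_lang}:"
    )
    return generate(translate_prompt)
```

Passing the model as a plain callable keeps the sketch independent of any particular API; a zero-shot baseline would simply skip step 1 and omit the explanation from the second prompt.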

Distributed pressure matching strategy using diffusion adaptation

Nov 13, 2023

Mengfei Zhang, Junqing Zhang, Jie Chen, Cédric Richard

Abstract: Personal sound zone (PSZ) systems, which aim to create listening (bright) and silent (dark) zones in neighboring regions of space, often operate in time-varying acoustic environments. Conventional adaptive methods for PSZ tasks collect and process the acoustic transfer functions (ATFs) between all the matching microphones and all the loudspeakers in a centralized manner, resulting in high computational complexity and costly accuracy requirements. To overcome these issues, this paper presents a distributed pressure-matching (PM) method relying on diffusion adaptation (DPM-D) that spreads the computational load among nodes. The global PM problem is defined as a sum of local costs, and the diffusion adaptation approach is then used to obtain a distributed solution that needs only local information exchanges. Simulations over multiple frequency bins and a computational complexity analysis are conducted to evaluate the properties of the algorithm and to compare it with centralized counterparts.
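
To illustrate what "a sum of local costs" solved with "local information exchanges" can look like, here is a minimal sketch of generic adapt-then-combine diffusion LMS on a toy regression task. It is not the DPM-D algorithm from the paper: the function name, the noise-free data model, the step size, and the network topology are all assumptions chosen for illustration.

```python
import random


def diffusion_lms(neighbors, weights, w_true, mu=0.1, steps=2000, seed=0):
    """Adapt-then-combine diffusion LMS on a toy noise-free regression task.

    neighbors[k] lists the nodes (including k) that node k listens to, and
    weights[k] holds the matching combination coefficients, which sum to 1.
    """
    rng = random.Random(seed)
    dim = len(w_true)
    num_nodes = len(neighbors)
    w = [[0.0] * dim for _ in range(num_nodes)]  # one local estimate per node

    for _ in range(steps):
        # Adapt: each node takes an LMS step using only its own measurement.
        psi = []
        for k in range(num_nodes):
            u = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            d = sum(ui * wi for ui, wi in zip(u, w_true))
            err = d - sum(ui * wi for ui, wi in zip(u, w[k]))
            psi.append([wi + mu * err * ui for wi, ui in zip(w[k], u)])

        # Combine: each node averages the intermediate estimates of its neighbors.
        w = [
            [sum(a * psi[l][i] for l, a in zip(neighbors[k], weights[k]))
             for i in range(dim)]
            for k in range(num_nodes)
        ]
    return w


# Three nodes on a line: node 1 talks to both ends, the ends only to node 1.
line_neighbors = [[0, 1], [0, 1, 2], [1, 2]]
line_weights = [[0.5, 0.5], [1 / 3, 1 / 3, 1 / 3], [0.5, 0.5]]
estimates = diffusion_lms(line_neighbors, line_weights, w_true=[1.0, -0.5])
```

No node ever sees another node's raw data; only the intermediate estimates `psi` cross the network, which is the property that lets the load be spread across nodes.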

Generative Face Video Coding Techniques and Standardization Efforts: A Review

Nov 05, 2023

Bolin Chen, Jie Chen, Shiqi Wang, Yan Ye

Abstract: Generative Face Video Coding (GFVC) techniques can exploit the compact representation of facial priors and the strong inference capability of deep generative models, achieving high-quality face video communication in ultra-low bandwidth scenarios. This paper presents a comprehensive survey of recent advances in GFVC techniques and standardization efforts, which could be applicable to ultra-low bitrate communication, user-specified animation/filtering, and metaverse-related functionalities. In particular, we generalize GFVC systems within one coding framework and summarize different GFVC algorithms with their corresponding visual representations. Moreover, we review the GFVC standardization activities that are specified with supplemental enhancement information messages. Finally, we discuss fundamental challenges and broad applications of GFVC techniques and their standardization potential, and envision future trends. The project page can be found at https://github.com/Berlin0610/Awesome-Generative-Face-Video-Coding.

Changes-Aware Transformer: Learning Generalized Changes Representation

Sep 24, 2023

Dan Wang, Licheng Jiao, Jie Chen, Shuyuan Yang, Fang Liu

Abstract: Difference features obtained by comparing the images of two periods play an indispensable role in the change detection (CD) task. However, a pair of bi-temporal images can exhibit diverse changes, which may produce varied difference features. Assigning changed pixels with different difference features to the same category is thus a challenge for CD. Most current methods acquire distinctive difference features implicitly, for example by enhancing image representation or supervision information. Nevertheless, informative image features only guarantee that object semantics are modeled; they cannot guarantee that changed pixels have similar semantics in the difference feature space and are distinct from unchanged ones. In this work, the generalized representation of various changes is learned directly in the difference feature space, and a novel Changes-Aware Transformer (CAT) for refining difference features is proposed. This generalized representation can perceive which pixels are changed and which are unchanged, and further guides the update of the pixels' difference features. CAT accomplishes this refinement through stacked cosine cross-attention and self-attention layers. After refinement, the changed pixels in the difference feature space are closer to each other, which facilitates change detection. In addition, CAT is compatible with various backbone networks and existing CD methods. Experiments on a remote sensing CD dataset and a street scene CD dataset show that our method achieves state-of-the-art performance and generalizes well.
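
The abstract names a cosine cross-attention layer without giving its form. As a rough illustration of the general technique only, the sketch below scores queries against keys by cosine similarity before the softmax. Treating pixel difference features as queries and a small set of change representations as keys and values is one plausible reading, not a detail confirmed by the abstract; the `temperature` parameter is likewise an assumption.

```python
import math


def cosine_cross_attention(queries, keys, values, temperature=0.1):
    """Attend from `queries` to `keys`/`values` using cosine similarity scores.

    All three arguments are lists of equal-length float vectors. The
    `temperature` scale is an assumed hyperparameter, not a value from the paper.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard zero vectors
        return [x / norm for x in v]

    q_hat = [normalize(q) for q in queries]
    k_hat = [normalize(k) for k in keys]

    outputs = []
    for q in q_hat:
        # Cosine similarity lies in [-1, 1]; dividing by temperature sharpens it.
        scores = [sum(a * b for a, b in zip(q, k)) / temperature for k in k_hat]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]  # numerically stable softmax
        total = sum(exps)
        attn = [e / total for e in exps]
        # Weighted sum of value vectors.
        outputs.append(
            [sum(a * v[i] for a, v in zip(attn, values)) for i in range(len(values[0]))]
        )
    return outputs
```

Because both sides are normalized, the attention weights depend only on the direction of each feature vector, not its magnitude, so rescaling a key leaves the output unchanged.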

Towards Real-World Burst Image Super-Resolution: Benchmark and Method

Sep 09, 2023

Pengxu Wei, Yujing Sun, Xingbei Guo, Chang Liu, Jie Chen, Xiangyang Ji, Liang Lin

Abstract: Despite substantial advances, single-image super-resolution (SISR) struggles to reconstruct high-quality images from the limited information in one input image, especially in realistic scenarios. In this paper, we establish a large-scale real-world burst super-resolution dataset, i.e., RealBSR, to explore the faithful reconstruction of image details from multiple frames. Furthermore, we introduce a Federated Burst Affinity network (FBAnet) to investigate non-trivial pixel-wise displacements among images under real-world image degradation. Specifically, rather than using pixel-wise alignment, our FBAnet employs a simple homography alignment from a structural geometry aspect and a Federated Affinity Fusion (FAF) strategy to aggregate the complementary information among frames. The fused informative representations are fed to a Transformer-based module for burst representation decoding. We have conducted extensive experiments on two versions of our dataset, i.e., RealBSR-RAW and RealBSR-RGB. Experimental results demonstrate that our FBAnet outperforms existing state-of-the-art burst SR methods and also achieves visually pleasant SR image predictions with model details. Our dataset, code, and models are publicly available at https://github.com/yjsunnn/FBANet.

* Accepted by ICCV 2023

Monotone Tree-Based GAMI Models by Adapting XGBoost

Sep 05, 2023

Linwei Hu, Soroush Aramideh, Jie Chen, Vijayan N. Nair

Abstract: Recent papers have used machine learning architectures to fit low-order functional ANOVA models with main effects and second-order interactions. These GAMI (GAM + Interaction) models are directly interpretable, as the functional main effects and interactions can be easily plotted and visualized. Unfortunately, it is not easy to incorporate the monotonicity requirement into the existing GAMI models based on boosted trees, such as EBM (Lou et al. 2013) and GAMI-Lin-T (Hu et al. 2022). This paper considers models of the form $f(x)=\sum_{j,k}f_{j,k}(x_j, x_k)$ and develops monotone tree-based GAMI models, called monotone GAMI-Tree, by adapting the XGBoost algorithm. It is straightforward to fit a monotone model to $f(x)$ using the options in XGBoost. However, the fitted model is still a black box. We take a different approach: i) use a filtering technique to determine the important interactions, ii) fit a monotone XGBoost model with the selected interactions, and finally iii) parse and purify the results to get a monotone GAMI model. Simulated datasets are used to demonstrate the behaviors of mono-GAMI-Tree and EBM, both of which use piecewise constant fits. Note that the monotonicity requirement is for the full model. In certain situations, the main effects will also be monotone. But, as seen in the examples, the interactions will not be monotone.

* 12 pages

Hierarchical Grammar-Induced Geometry for Data-Efficient Molecular Property Prediction

Sep 04, 2023

Minghao Guo, Veronika Thost, Samuel W Song, Adithya Balachandran, Payel Das, Jie Chen, Wojciech Matusik

Abstract: The prediction of molecular properties is a crucial task in the field of material and drug discovery. The potential benefits of using deep learning techniques are reflected in the wealth of recent literature. Still, these techniques are faced with a common challenge in practice: Labeled data are limited by the cost of manual extraction from literature and laborious experimentation. In this work, we propose a data-efficient property predictor by utilizing a learnable hierarchical molecular grammar that can generate molecules from grammar production rules. Such a grammar induces an explicit geometry of the space of molecular graphs, which provides an informative prior on molecular structural similarity. The property prediction is performed using graph neural diffusion over the grammar-induced geometry. On both small and large datasets, our evaluation shows that this approach outperforms a wide spectrum of baselines, including supervised and pre-trained graph neural networks. We include a detailed ablation study and further analysis of our solution, showing its effectiveness in cases with extremely limited data. Code is available at https://github.com/gmh14/Geo-DEG.

* 22 pages, 10 figures; ICML 2023

LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech

Aug 31, 2023

Jie Chen, Xingchen Song, Zhendong Peng, Binbin Zhang, Fuping Pan, Zhiyong Wu

Abstract: Recent advances in neural text-to-speech (TTS) models have brought thousands of TTS applications into daily life, where models are deployed in the cloud to provide services for customers. Among these models are diffusion probabilistic models (DPMs), which can be stably trained and are more parameter-efficient than other generative models. As transmitting data between customers and the cloud introduces high latency and the risk of exposing private data, deploying TTS models on edge devices is preferred. Deploying DPMs on edge devices raises two practical problems. First, current DPMs are not lightweight enough for resource-constrained devices. Second, DPMs require many denoising steps in inference, which increases latency. In this work, we present LightGrad, a lightweight DPM for TTS. LightGrad is equipped with a lightweight U-Net diffusion decoder and a training-free fast sampling technique, reducing both model parameters and inference latency. Streaming inference is also implemented in LightGrad to reduce latency further. Compared with Grad-TTS, LightGrad achieves a 62.2% reduction in parameters and a 65.7% reduction in latency while preserving comparable speech quality on both Mandarin Chinese and English with 4 denoising steps.

* Accepted by ICASSP 2023

Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information

Aug 31, 2023

Jie Chen, Changhe Song, Deyi Tuo, Xixin Wu, Shiyin Kang, Zhiyong Wu, Helen Meng

Abstract: For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) plays an important role in producing natural and intelligible speech. Although inter-utterance linguistic information can influence the speech interpretation of the target utterance, previous works on PSP mainly focus on utilizing intra-utterance linguistic information of the current utterance only. This work proposes to use inter-utterance linguistic information to improve the performance of PSP. Multi-level contextual information, which includes both inter-utterance and intra-utterance linguistic information, is extracted by a hierarchical encoder from the character, utterance, and discourse levels of the input text. Then a multi-task learning (MTL) decoder predicts prosodic boundaries from the multi-level contextual information. Objective evaluation results on two datasets show that our method achieves better F1 scores in predicting prosodic word (PW), prosodic phrase (PPH), and intonational phrase (IPH) boundaries, which demonstrates the effectiveness of using multi-level contextual information for PSP. Subjective preference tests also indicate that the naturalness of the synthesized speech is improved.

* Accepted by Interspeech 2022
