Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jiawei Zhang

Online Stochastic Optimization with Wasserstein Based Non-stationarity

Dec 23, 2020
Jiashuo Jiang, Xiaocheng Li, Jiawei Zhang

Figure 1 for Online Stochastic Optimization with Wasserstein Based Non-stationarity

Figure 2 for Online Stochastic Optimization with Wasserstein Based Non-stationarity

Figure 3 for Online Stochastic Optimization with Wasserstein Based Non-stationarity

Figure 4 for Online Stochastic Optimization with Wasserstein Based Non-stationarity

We consider a general online stochastic optimization problem with multiple budget constraints over a horizon of finite time periods. In each time period, a reward function and multiple cost functions are revealed, and the decision maker needs to specify an action from a convex and compact action set to collect the reward and consume the budget. Each cost function corresponds to the consumption of one budget. In each period, the reward and cost functions are drawn from an unknown distribution, which is non-stationary across time. The objective of the decision maker is to maximize the cumulative reward subject to the budget constraints. This formulation captures a wide range of applications including online linear programming and network revenue management, among others. In this paper, we consider two settings: (i) a data-driven setting where the true distribution is unknown but a prior estimate (possibly inaccurate) is available; (ii) an uninformative setting where the true distribution is completely unknown. We propose a unified Wasserstein-distance based measure to quantify the inaccuracy of the prior estimate in setting (i) and the non-stationarity of the system in setting (ii). We show that the proposed measure leads to a necessary and sufficient condition for the attainability of a sublinear regret in both settings. For setting (i), we propose a new algorithm, which takes a primal-dual perspective and integrates the prior information of the underlying distributions into an online gradient descent procedure in the dual space. The algorithm also naturally extends to the uninformative setting (ii). Under both settings, we show the corresponding algorithm achieves a regret of optimal order. In numerical experiments, we demonstrate how the proposed algorithms can be naturally integrated with the re-solving technique to further boost the empirical performance.

Via

Access Paper or Ask Questions

Distributed Stochastic Consensus Optimization with Momentum for Nonconvex Nonsmooth Problems

Nov 10, 2020
Zhiguo Wang, Jiawei Zhang, Tsung-Hui Chang, Jian Li, Zhi-Quan Luo

Figure 1 for Distributed Stochastic Consensus Optimization with Momentum for Nonconvex Nonsmooth Problems

Figure 2 for Distributed Stochastic Consensus Optimization with Momentum for Nonconvex Nonsmooth Problems

Figure 3 for Distributed Stochastic Consensus Optimization with Momentum for Nonconvex Nonsmooth Problems

Figure 4 for Distributed Stochastic Consensus Optimization with Momentum for Nonconvex Nonsmooth Problems

While many distributed optimization algorithms have been proposed for solving smooth or convex problems over the networks, few of them can handle non-convex and non-smooth problems. Based on a proximal primal-dual approach, this paper presents a new (stochastic) distributed algorithm with Nesterov momentum for accelerated optimization of non-convex and non-smooth problems. Theoretically, we show that the proposed algorithm can achieve an $\epsilon$-stationary solution under a constant step size with $\mathcal{O}(1/\epsilon^2)$ computation complexity and $\mathcal{O}(1/\epsilon)$ communication complexity. When compared to the existing gradient tracking based methods, the proposed algorithm has the same order of computation complexity but lower order of communication complexity. To the best of our knowledge, the presented result is the first stochastic algorithm with the $\mathcal{O}(1/\epsilon)$ communication complexity for non-convex and non-smooth problems. Numerical experiments for a distributed non-convex regression problem and a deep neural network based classification problem are presented to illustrate the effectiveness of the proposed algorithms.

Via

Access Paper or Ask Questions

AIM 2020 Challenge on Learned Image Signal Processing Pipeline

Nov 10, 2020
Andrey Ignatov, Radu Timofte, Zhilu Zhang, Ming Liu, Haolin Wang, Wangmeng Zuo, Jiawei Zhang, Ruimao Zhang, Zhanglin Peng, Sijie Ren, Linhui Dai, Xiaohong Liu, Chengqi Li, Jun Chen, Yuichi Ito, Bhavya Vasudeva, Puneesh Deora, Umapada Pal, Zhenyu Guo, Yu Zhu, Tian Liang, Chenghua Li, Cong Leng, Zhihong Pan, Baopu Li, Byung-Hoon Kim, Joonyoung Song, Jong Chul Ye, JaeHyun Baek, Magauiya Zhussip, Yeskendir Koishekenov, Hwechul Cho Ye, Xin Liu, Xueying Hu, Jun Jiang, Jinwei Gu, Kai Li, Pengliang Tan, Bingxin Hou

Figure 1 for AIM 2020 Challenge on Learned Image Signal Processing Pipeline

Figure 2 for AIM 2020 Challenge on Learned Image Signal Processing Pipeline

Figure 3 for AIM 2020 Challenge on Learned Image Signal Processing Pipeline

Figure 4 for AIM 2020 Challenge on Learned Image Signal Processing Pipeline

This paper reviews the second AIM learned ISP challenge and provides the description of the proposed solutions and results. The participating teams were solving a real-world RAW-to-RGB mapping problem, where to goal was to map the original low-quality RAW images captured by the Huawei P20 device to the same photos obtained with the Canon 5D DSLR camera. The considered task embraced a number of complex computer vision subtasks, such as image demosaicing, denoising, white balancing, color and contrast correction, demoireing, etc. The target metric used in this challenge combined fidelity scores (PSNR and SSIM) with solutions' perceptual results measured in a user study. The proposed solutions significantly improved the baseline results, defining the state-of-the-art for practical image signal processing pipeline modeling.

* Published in ECCV 2020 Workshops (Advances in Image Manipulation), https://data.vision.ee.ethz.ch/cvl/aim20/

Via

Access Paper or Ask Questions

A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems

Oct 29, 2020
Jiawei Zhang, Peijun Xiao, Ruoyu Sun, Zhi-Quan Luo

Figure 1 for A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems

Figure 2 for A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems

Figure 3 for A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems

Figure 4 for A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems

Nonconvex-concave min-max problem arises in many machine learning applications including minimizing a pointwise maximum of a set of nonconvex functions and robust adversarial training of neural networks. A popular approach to solve this problem is the gradient descent-ascent (GDA) algorithm which unfortunately can exhibit oscillation in case of nonconvexity. In this paper, we introduce a "smoothing" scheme which can be combined with GDA to stabilize the oscillation and ensure convergence to a stationary solution. We prove that the stabilized GDA algorithm can achieve an $O(1/\epsilon^2)$ iteration complexity for minimizing the pointwise maximum of a finite collection of nonconvex functions. Moreover, the smoothed GDA algorithm achieves an $O(1/\epsilon^4)$ iteration complexity for general nonconvex-concave problems. Extensions of this stabilized GDA algorithm to multi-block cases are presented. To the best of our knowledge, this is the first algorithm to achieve $O(1/\epsilon^2)$ for a class of nonconvex-concave problem. We illustrate the practical efficiency of the stabilized GDA algorithm on robust training.

Via

Access Paper or Ask Questions

Sub-graph Contrast for Scalable Self-Supervised Graph Representation Learning

Oct 08, 2020
Yizhu Jiao, Yun Xiong, Jiawei Zhang, Yao Zhang, Tianqi Zhang, Yangyong Zhu

Figure 1 for Sub-graph Contrast for Scalable Self-Supervised Graph Representation Learning

Figure 2 for Sub-graph Contrast for Scalable Self-Supervised Graph Representation Learning

Figure 3 for Sub-graph Contrast for Scalable Self-Supervised Graph Representation Learning

Figure 4 for Sub-graph Contrast for Scalable Self-Supervised Graph Representation Learning

Graph representation learning has attracted lots of attention recently. Existing graph neural networks fed with the complete graph data are not scalable due to limited computation and memory costs. Thus, it remains a great challenge to capture rich information in large-scale graph data. Besides, these methods mainly focus on supervised learning and highly depend on node label information, which is expensive to obtain in the real world. As to unsupervised network embedding approaches, they overemphasize node proximity instead, whose learned representations can hardly be used in downstream application tasks directly. In recent years, emerging self-supervised learning provides a potential solution to address the aforementioned problems. However, existing self-supervised works also operate on the complete graph data and are biased to fit either global or very local (1-hop neighborhood) graph structures in defining the mutual information based loss terms. In this paper, a novel self-supervised representation learning method via Subgraph Contrast, namely \textsc{Subg-Con}, is proposed by utilizing the strong correlation between central nodes and their sampled subgraphs to capture regional structure information. Instead of learning on the complete input graph data, with a novel data augmentation strategy, \textsc{Subg-Con} learns node representations through a contrastive loss defined based on subgraphs sampled from the original graph instead. Compared with existing graph representation learning approaches, \textsc{Subg-Con} has prominent performance advantages in weaker supervision requirements, model learning scalability, and parallelization. Extensive experiments verify both the effectiveness and the efficiency of our work compared with both classic and state-of-the-art graph representation learning approaches on multiple real-world large-scale benchmark datasets from different domains.

* To appear at ICDM 2020

Via

Access Paper or Ask Questions

EfficientFCN: Holistically-guided Decoding for Semantic Segmentation

Aug 27, 2020
Jianbo Liu, Junjun He, Jiawei Zhang, Jimmy S. Ren, Hongsheng Li

Figure 1 for EfficientFCN: Holistically-guided Decoding for Semantic Segmentation

Figure 2 for EfficientFCN: Holistically-guided Decoding for Semantic Segmentation

Figure 3 for EfficientFCN: Holistically-guided Decoding for Semantic Segmentation

Figure 4 for EfficientFCN: Holistically-guided Decoding for Semantic Segmentation

Both performance and efficiency are important to semantic segmentation. State-of-the-art semantic segmentation algorithms are mostly based on dilated Fully Convolutional Networks (dilatedFCN), which adopt dilated convolutions in the backbone networks to extract high-resolution feature maps for achieving high-performance segmentation performance. However, due to many convolution operations are conducted on the high-resolution feature maps, such dilatedFCN-based methods result in large computational complexity and memory consumption. To balance the performance and efficiency, there also exist encoder-decoder structures that gradually recover the spatial information by combining multi-level feature maps from the encoder. However, the performances of existing encoder-decoder methods are far from comparable with the dilatedFCN-based methods. In this paper, we propose the EfficientFCN, whose backbone is a common ImageNet pre-trained network without any dilated convolution. A holistically-guided decoder is introduced to obtain the high-resolution semantic-rich feature maps via the multi-scale features from the encoder. The decoding task is converted to novel codebook generation and codeword assembly task, which takes advantages of the high-level and low-level features from the encoder. Such a framework achieves comparable or even better performance than state-of-the-art methods with only 1/3 of the computational cost. Extensive experiments on PASCAL Context, PASCAL VOC, ADE20K validate the effectiveness of the proposed EfficientFCN.

* Accepted in ECCV 2020

Via

Access Paper or Ask Questions

Cross-Scale Internal Graph Neural Network for Image Super-Resolution

Jun 30, 2020
Shangchen Zhou, Jiawei Zhang, Wangmeng Zuo, Chen Change Loy

Figure 1 for Cross-Scale Internal Graph Neural Network for Image Super-Resolution

Figure 2 for Cross-Scale Internal Graph Neural Network for Image Super-Resolution

Figure 3 for Cross-Scale Internal Graph Neural Network for Image Super-Resolution

Figure 4 for Cross-Scale Internal Graph Neural Network for Image Super-Resolution

Non-local self-similarity in natural images has been well studied as an effective prior in image restoration. However, for single image super-resolution (SISR), most existing deep non-local methods (e.g., non-local neural networks) only exploit similar patches within the same scale of the low-resolution (LR) input image. Consequently, the restoration is limited to using the same-scale information while neglecting potential high-resolution (HR) cues from other scales. In this paper, we explore the cross-scale patch recurrence property of a natural image, i.e., similar patches tend to recur many times across different scales. This is achieved using a novel cross-scale internal graph neural network (IGNN). Specifically, we dynamically construct a cross-scale graph by searching k-nearest neighboring patches in the downsampled LR image for each query patch in the LR image. We then obtain the corresponding k HR neighboring patches in the LR image and aggregate them adaptively in accordance to the edge label of the constructed graph. In this way, the HR information can be passed from k HR neighboring patches to the LR query patch to help it recover more detailed textures. Besides, these internal image-specific LR/HR exemplars are also significant complements to the external information learned from the training dataset. Extensive experiments demonstrate the effectiveness of IGNN against the state-of-the-art SISR methods including existing non-local networks on standard benchmarks.

Via

Access Paper or Ask Questions

G5: A Universal GRAPH-BERT for Graph-to-Graph Transfer and Apocalypse Learning

Jun 11, 2020
Jiawei Zhang

Figure 1 for G5: A Universal GRAPH-BERT for Graph-to-Graph Transfer and Apocalypse Learning

Figure 2 for G5: A Universal GRAPH-BERT for Graph-to-Graph Transfer and Apocalypse Learning

Figure 3 for G5: A Universal GRAPH-BERT for Graph-to-Graph Transfer and Apocalypse Learning

Figure 4 for G5: A Universal GRAPH-BERT for Graph-to-Graph Transfer and Apocalypse Learning

The recent GRAPH-BERT model introduces a new approach to learning graph representations merely based on the attention mechanism. GRAPH-BERT provides an opportunity for transferring pre-trained models and learned graph representations across different tasks within the same graph dataset. In this paper, we will further investigate the graph-to-graph transfer of a universal GRAPH-BERT for graph representation learning across different graph datasets, and our proposed model is also referred to as the G5 for simplicity. Many challenges exist in learning G5 to adapt the distinct input and output configurations for each graph data source, as well as the information distributions differences. G5 introduces a pluggable model architecture: (a) each data source will be pre-processed with a unique input representation learning component; (b) each output application task will also have a specific functional component; and (c) all such diverse input and output components will all be conjuncted with a universal GRAPH-BERT core component via an input size unification layer and an output representation fusion layer, respectively. The G5 model removes the last obstacle for cross-graph representation learning and transfer. For the graph sources with very sparse training data, the G5 model pre-trained on other graphs can still be utilized for representation learning with necessary fine-tuning. What's more, the architecture of G5 also allows us to learn a supervised functional classifier for data sources without any training data at all. Such a problem is also named as the Apocalypse Learning task in this paper. Two different label reasoning strategies, i.e., Cross-Source Classification Consistency Maximization (CCCM) and Cross-Source Dynamic Routing (CDR), are introduced in this paper to address the problem.

* Keywords: Graph-Bert; Representation Learning; Apocalypse Learning; Transfer Learning; Graph Mining; Data Mining

Via

Access Paper or Ask Questions