S. -H. Gary Chan

A Multi-Scale Decomposition MLP-Mixer for Time Series Analysis

Oct 18, 2023
Shuhan Zhong, Sizhe Song, Guanyao Li, Weipeng Zhuo, Yang Liu, S. -H. Gary Chan

Time series data, often characterized by unique composition and complex multi-scale temporal variations, requires special consideration of decomposition and multi-scale modeling in its analysis. Existing deep learning methods in this area are best suited to univariate time series only, and have not sufficiently accounted for sub-series-level modeling and decomposition completeness. To address this, we propose MSD-Mixer, a Multi-Scale Decomposition MLP-Mixer which learns to explicitly decompose the input time series into different components, and represents the components in different layers. To handle multi-scale temporal patterns and inter-channel dependencies, we propose a novel temporal patching approach to model the time series as multi-scale sub-series, i.e., patches, and employ MLPs to mix intra- and inter-patch variations and channel-wise correlations. In addition, we propose a loss function to constrain both the magnitude and autocorrelation of the decomposition residual for decomposition completeness. Through extensive experiments on various real-world datasets for five common time series analysis tasks (long- and short-term forecasting, imputation, anomaly detection, and classification), we demonstrate that MSD-Mixer consistently achieves significantly better performance than other state-of-the-art task-general and task-specific approaches.
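
As an illustration of the patching-plus-mixing idea described above, the following is a minimal PyTorch sketch (not the authors' implementation) of a block that splits each channel into fixed-size patches and mixes intra-patch, inter-patch, and channel-wise variations with plain MLPs; all layer sizes and the `patch_size` value are illustrative assumptions.

```python
# Hypothetical sketch of multi-scale patching + MLP mixing; not the MSD-Mixer code.
import torch
import torch.nn as nn

class PatchMixerBlock(nn.Module):
    """Mix intra-patch, inter-patch, and channel-wise variations with plain MLPs."""
    def __init__(self, num_channels: int, seq_len: int, patch_size: int, hidden: int = 64):
        super().__init__()
        assert seq_len % patch_size == 0, "sequence length must be divisible by patch size"
        num_patches = seq_len // patch_size
        self.patch_size = patch_size
        self.intra = nn.Sequential(nn.Linear(patch_size, hidden), nn.GELU(), nn.Linear(hidden, patch_size))
        self.inter = nn.Sequential(nn.Linear(num_patches, hidden), nn.GELU(), nn.Linear(hidden, num_patches))
        self.chan  = nn.Sequential(nn.Linear(num_channels, hidden), nn.GELU(), nn.Linear(hidden, num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, seq_len) -> (batch, channels, num_patches, patch_size)
        b, c, l = x.shape
        x = x.reshape(b, c, -1, self.patch_size)
        x = x + self.intra(x)                                            # mix within each patch
        x = x + self.inter(x.transpose(-1, -2)).transpose(-1, -2)        # mix across patches
        x = x + self.chan(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)     # mix across channels
        return x.reshape(b, c, l)

x = torch.randn(8, 7, 96)                       # e.g., 7 variables, 96 time steps
y = PatchMixerBlock(num_channels=7, seq_len=96, patch_size=8)(x)
```

Stacking such blocks with different patch sizes would correspond to modeling sub-series at multiple temporal scales.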

Improving Out-of-Distribution Robustness of Classifiers via Generative Interpolation

Jul 23, 2023
Haoyue Bai, Ceyuan Yang, Yinghao Xu, S. -H. Gary Chan, Bolei Zhou

Deep neural networks achieve superior performance for learning from independent and identically distributed (i.i.d.) data. However, their performance deteriorates significantly when handling out-of-distribution (OoD) data, where the training and test data are drawn from different distributions. In this paper, we explore utilizing generative models as a data augmentation source for improving the out-of-distribution robustness of neural classifiers. Specifically, we develop a simple yet effective method called Generative Interpolation to fuse generative models trained on multiple domains for synthesizing diverse OoD samples. Training a generative model directly on the source domains tends to suffer from mode collapse and sometimes amplifies the data bias. Instead, we first train a StyleGAN model on one source domain and then fine-tune it on the other domains, resulting in many correlated generators whose model parameters share the same initialization and are thus aligned. We then linearly interpolate the model parameters of the generators to spawn new sets of generators. Such interpolated generators are used as an extra data augmentation source to train the classifiers. The interpolation coefficients can flexibly control the augmentation direction and strength. In addition, a style-mixing mechanism is applied to further improve the diversity of the generated OoD samples. Our experiments show that the proposed method explicitly increases the diversity of training domains and achieves consistent improvements over baselines across datasets and multiple different distribution shifts.
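
The parameter-interpolation step lends itself to a short sketch. Below is a hedged PyTorch example of linearly mixing the state dicts of two aligned generators; `gen_domain1`, `gen_domain2`, and `latent_dim` are hypothetical names, and the snippet illustrates the general idea rather than the paper's StyleGAN pipeline.

```python
# Sketch: interpolate the parameters of two generators that share the same
# architecture and initialization (e.g., one fine-tuned per source domain).
import copy
import torch

def interpolate_generators(gen_a: torch.nn.Module, gen_b: torch.nn.Module, alpha: float) -> torch.nn.Module:
    """Return a new generator whose weights are (1 - alpha) * A + alpha * B."""
    gen_mix = copy.deepcopy(gen_a)
    state_a, state_b = gen_a.state_dict(), gen_b.state_dict()
    mixed = {
        k: (1.0 - alpha) * state_a[k] + alpha * state_b[k]
           if torch.is_floating_point(state_a[k]) else state_a[k]   # leave integer buffers untouched
        for k in state_a
    }
    gen_mix.load_state_dict(mixed)
    return gen_mix

# Hypothetical usage: sample extra OoD training images from interpolated generators.
# for alpha in (0.25, 0.5, 0.75):
#     gen_mix = interpolate_generators(gen_domain1, gen_domain2, alpha)
#     fake_batch = gen_mix(torch.randn(16, latent_dim))
```

Varying `alpha` corresponds to the paper's flexible control over augmentation direction and strength.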

FIS-ONE: Floor Identification System with One Label for Crowdsourced RF Signals

Jul 12, 2023
Weipeng Zhuo, Ka Ho Chiu, Jierun Chen, Ziqi Zhao, S. -H. Gary Chan, Sangtae Ha, Chul-Ho Lee

Floor labels of crowdsourced RF signals are crucial for many smart-city applications, such as multi-floor indoor localization, geofencing, and robot surveillance. To build a prediction model to identify the floor number of a new RF signal upon its measurement, conventional approaches using crowdsourced RF signals assume that at least a few labeled signal samples are available on each floor. In this work, we push the envelope further and demonstrate that it is technically feasible to enable such floor identification with only one floor-labeled signal sample on the bottom floor, while the rest of the signal samples remain unlabeled. We propose FIS-ONE, a novel floor identification system with only one labeled sample. FIS-ONE consists of two steps, namely signal clustering and cluster indexing. We first build a bipartite graph to model the RF signal samples and obtain a latent representation of each node (each signal sample) using our attention-based graph neural network model so that the RF signal samples can be clustered more accurately. Then, we tackle the problem of indexing the clusters with proper floor labels by leveraging the observation that signals from an access point can be detected on different floors, i.e., signal spillover. Specifically, we formulate the cluster indexing problem as a combinatorial optimization problem and show that it is equivalent to solving a traveling salesman problem, whose (near-)optimal solution can be found efficiently. We have implemented FIS-ONE and validated its effectiveness on the Microsoft dataset and in three large shopping malls. Our results show that FIS-ONE outperforms other baseline algorithms significantly, with up to 23% improvement in adjusted Rand index and 25% improvement in normalized mutual information using only one floor-labeled signal sample.
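
To make the cluster-indexing step concrete, here is a simplified Python sketch that orders signal clusters by access-point spillover, starting from the cluster containing the single labeled bottom-floor sample. It uses a greedy nearest-neighbor heuristic and Jaccard overlap as stand-ins for the paper's traveling-salesman formulation; the data and function names are hypothetical.

```python
# Hypothetical sketch of cluster indexing by AP spillover (greedy TSP-like heuristic).
def index_clusters(cluster_aps: list[set[str]], labeled_cluster: int) -> list[int]:
    """Return cluster ids ordered from the labeled bottom floor upward by AP overlap."""
    def overlap(i: int, j: int) -> float:
        inter = len(cluster_aps[i] & cluster_aps[j])
        union = len(cluster_aps[i] | cluster_aps[j]) or 1
        return inter / union                       # Jaccard similarity as a spillover proxy

    order = [labeled_cluster]
    remaining = set(range(len(cluster_aps))) - {labeled_cluster}
    while remaining:
        nxt = max(remaining, key=lambda j: overlap(order[-1], j))   # most-overlapping cluster next
        order.append(nxt)
        remaining.remove(nxt)
    return order                                    # order[k] is assigned floor k

# Example with made-up clusters of observed AP identifiers:
floors = index_clusters([{"ap1", "ap2"}, {"ap2", "ap3"}, {"ap3", "ap4"}], labeled_cluster=0)
```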

* Accepted by IEEE ICDCS 2023 

Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks

Mar 07, 2023
Jierun Chen, Shiu-hong Kao, Hao He, Weipeng Zhuo, Song Wen, Chul-Ho Lee, S. -H. Gary Chan

To design fast neural networks, many works have focused on reducing the number of floating-point operations (FLOPs). We observe that such a reduction in FLOPs, however, does not necessarily lead to a similar level of reduction in latency. This mainly stems from inefficiently low floating-point operations per second (FLOPS). To achieve faster networks, we revisit popular operators and demonstrate that such low FLOPS is mainly due to frequent memory access of the operators, especially the depthwise convolution. We hence propose a novel partial convolution (PConv) that extracts spatial features more efficiently, by cutting down redundant computation and memory access simultaneously. Building upon our PConv, we further propose FasterNet, a new family of neural networks, which attains substantially higher running speed than others on a wide range of devices, without compromising on accuracy for various vision tasks. For example, on ImageNet-1k, our tiny FasterNet-T0 is $3.1\times$, $3.1\times$, and $2.5\times$ faster than MobileViT-XXS on GPU, CPU, and ARM processors, respectively, while being $2.9\%$ more accurate. Our large FasterNet-L achieves an impressive $83.5\%$ top-1 accuracy, on par with the emerging Swin-B, while having $49\%$ higher inference throughput on GPU, as well as saving $42\%$ compute time on CPU. Code is available at https://github.com/JierunChen/FasterNet.
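
The core PConv idea admits a very short sketch: convolve only a fraction of the channels and pass the rest through untouched. The PyTorch snippet below illustrates this under assumed hyper-parameters (a 1/4 channel ratio, 3x3 kernel); see the official repository above for the actual implementation.

```python
# Minimal sketch of partial convolution (PConv): convolve a subset of channels,
# concatenate the untouched remainder, reducing both FLOPs and memory access.
import torch
import torch.nn as nn

class PConv(nn.Module):
    def __init__(self, channels: int, ratio: float = 0.25):
        super().__init__()
        self.conv_channels = max(1, int(channels * ratio))   # channels actually convolved
        self.conv = nn.Conv2d(self.conv_channels, self.conv_channels,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.split(x, [self.conv_channels, x.size(1) - self.conv_channels], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)          # untouched channels pass through

y = PConv(channels=64)(torch.randn(1, 64, 56, 56))
```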

* Accepted to CVPR 2023 

TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing

Mar 22, 2022
Jierun Chen, Tianlang He, Weipeng Zhuo, Li Ma, Sangtae Ha, S. -H. Gary Chan

As convolution has empowered many smart applications, dynamic convolution further equips it with the ability to adapt to diverse inputs. However, static and dynamic convolutions are either layout-agnostic or computation-heavy, making them unsuitable for layout-specific applications, e.g., face recognition and medical image segmentation. We observe that these applications naturally exhibit the characteristics of large intra-image (spatial) variance and small cross-image variance. This observation motivates our efficient translation variant convolution (TVConv) for layout-aware visual processing. Technically, TVConv is composed of affinity maps and a weight-generating block. While the affinity maps depict pixel-paired relationships gracefully, the weight-generating block can be explicitly overparameterized for better training while maintaining efficient inference. Although conceptually simple, TVConv significantly improves the efficiency of the convolution and can be readily plugged into various network architectures. Extensive experiments on face recognition show that TVConv reduces the computational cost by up to 3.1x and improves the corresponding throughput by 2.3x while maintaining a high accuracy compared to the depthwise convolution. Moreover, for the same computation cost, we boost the mean accuracy by up to 4.21%. We also conduct experiments on the optic disc/cup segmentation task and obtain better generalization performance, which helps mitigate the critical data scarcity issue. Code is available at https://github.com/JierunChen/TVConv.
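
For intuition, the following is a heavily simplified PyTorch sketch of a translation-variant depthwise convolution: learned affinity maps are mapped by a small weight-generating block to per-location kernels, which are then applied with unfold. The fixed spatial resolution and all sizes are assumptions of this sketch, not TVConv's actual design details.

```python
# Simplified sketch of layout-aware, translation-variant depthwise filtering.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TVConvSketch(nn.Module):
    def __init__(self, channels: int, h: int, w: int, k: int = 3, affinity_dim: int = 4):
        super().__init__()
        self.c, self.k = channels, k
        self.affinity = nn.Parameter(torch.randn(1, affinity_dim, h, w))   # learned layout prior
        self.weight_gen = nn.Sequential(                                    # weight-generating block
            nn.Conv2d(affinity_dim, channels, 1), nn.ReLU(),
            nn.Conv2d(channels, channels * k * k, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        kernels = self.weight_gen(self.affinity)                 # (1, C*k*k, H, W): a kernel per location
        kernels = kernels.view(1, c, self.k * self.k, h, w)
        patches = F.unfold(x, self.k, padding=self.k // 2)       # (B, C*k*k, H*W)
        patches = patches.view(b, c, self.k * self.k, h, w)
        return (patches * kernels).sum(dim=2)                    # per-pixel depthwise filtering

y = TVConvSketch(channels=16, h=28, w=28)(torch.randn(2, 16, 28, 28))
```

Because the affinity maps and generated kernels depend only on position (not on the input), the per-location weights can be precomputed once at inference, which is the efficiency argument sketched here.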

* Accepted to CVPR 2022 

A Lightweight and Accurate Spatial-Temporal Transformer for Traffic Forecasting

Jan 04, 2022
Guanyao Li, Shuhan Zhong, Letian Xiang, S. -H. Gary Chan, Ruiyuan Li, Chih-Chieh Hung, Wen-Chih Peng

We study the forecasting problem for traffic with dynamic, possibly periodical, and joint spatial-temporal dependency between regions. Given the aggregated inflow and outflow traffic of regions in a city from time slots 0 to t-1, we predict the traffic at time t at any region. Prior work in the area often considers the spatial and temporal dependencies in a decoupled manner, or is rather computationally intensive to train with a large number of hyper-parameters to tune. We propose ST-TIS, a novel, lightweight, and accurate Spatial-Temporal Transformer with information fusion and region sampling for traffic forecasting. ST-TIS extends the canonical Transformer with information fusion and region sampling. The information fusion module captures the complex spatial-temporal dependency between regions. The region sampling module improves efficiency and prediction accuracy, cutting the computational complexity of dependency learning from $O(n^2)$ to $O(n\sqrt{n})$, where $n$ is the number of regions. With far fewer parameters than state-of-the-art models, the offline training of our model is significantly faster in terms of tuning and computation (with a reduction of up to $90\%$ in training time and network parameters). Notwithstanding such training efficiency, extensive experiments show that ST-TIS is substantially more accurate in online prediction than state-of-the-art approaches (with an average improvement of up to $9.5\%$ on RMSE and $12.4\%$ on MAPE).
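
The complexity reduction from region sampling can be sketched as follows: each query region attends to only about $\sqrt{n}$ sampled regions instead of all $n$. The PyTorch snippet below uses uniform random sampling as an illustrative stand-in for ST-TIS's actual sampling strategy.

```python
# Sketch: attention where each region attends to ~sqrt(n) sampled regions,
# giving O(n * sqrt(n)) pairwise computation instead of O(n^2).
import math
import torch
import torch.nn.functional as F

def sampled_region_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q, k, v: (n_regions, d). Each query attends to a random subset of regions."""
    n, d = q.shape
    m = max(1, int(math.sqrt(n)))
    idx = torch.stack([torch.randperm(n)[:m] for _ in range(n)])   # (n, m) sampled regions per query
    k_s, v_s = k[idx], v[idx]                                      # (n, m, d)
    attn = F.softmax((q.unsqueeze(1) * k_s).sum(-1) / math.sqrt(d), dim=-1)  # (n, m)
    return (attn.unsqueeze(-1) * v_s).sum(1)                       # (n, d)

out = sampled_region_attention(torch.randn(100, 32), torch.randn(100, 32), torch.randn(100, 32))
```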

NAS-OoD: Neural Architecture Search for Out-of-Distribution Generalization

Sep 05, 2021
Haoyue Bai, Fengwei Zhou, Lanqing Hong, Nanyang Ye, S. -H. Gary Chan, Zhenguo Li

Recent advances in Out-of-Distribution (OoD) generalization reveal the robustness of deep learning models against distribution shifts. However, existing works focus on OoD algorithms, such as invariant risk minimization, domain generalization, or stable learning, without considering the influence of deep model architectures on OoD generalization, which may lead to sub-optimal performance. Neural Architecture Search (NAS) methods search for architectures based on their performance on the training data, which may result in poor generalization for OoD tasks. In this work, we propose robust Neural Architecture Search for OoD generalization (NAS-OoD), which optimizes the architecture with respect to its performance on generated OoD data by gradient descent. Specifically, a data generator is learned to synthesize OoD data by maximizing losses computed by different neural architectures, while the goal of the architecture search is to find the optimal architecture parameters that minimize the synthetic OoD data losses. The data generator and the neural architecture are jointly optimized in an end-to-end manner, and the minimax training process effectively discovers robust architectures that generalize well under different distribution shifts. Extensive experimental results show that NAS-OoD achieves superior performance on various OoD generalization benchmarks with deep models having far fewer parameters. In addition, on a real industry dataset, the proposed NAS-OoD method reduces the error rate by more than 70% compared with the state-of-the-art method, demonstrating the proposed method's practicality for real applications.
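
The minimax optimization can be illustrated with a toy training loop: the generator is updated to increase the loss of the current architecture, and the architecture is updated to decrease its loss on the generated OoD data. All modules below are small placeholders, not the searched supernet or the paper's generator.

```python
# Toy sketch of the alternating minimax training described above.
import torch
import torch.nn as nn

latent_dim, num_classes = 16, 10
generator = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, 64))   # synthesizes OoD features
network   = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, num_classes))  # stand-in for the searched architecture
opt_gen  = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_arch = torch.optim.Adam(network.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for _ in range(100):
    z = torch.randn(32, latent_dim)
    labels = torch.randint(0, num_classes, (32,))

    # Generator step: maximize the architecture's loss on synthetic OoD data.
    loss_gen = -criterion(network(generator(z)), labels)
    opt_gen.zero_grad(); loss_gen.backward(); opt_gen.step()

    # Architecture step: minimize the loss on the freshly generated OoD data.
    loss_arch = criterion(network(generator(z).detach()), labels)
    opt_arch.zero_grad(); loss_arch.backward(); opt_arch.step()
```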

* Accepted by ICCV2021 

Crowd Counting by Self-supervised Transfer Colorization Learning and Global Prior Classification

May 20, 2021
Haoyue Bai, Song Wen, S. -H. Gary Chan

Labeled crowd scene images are expensive and scarce. To significantly reduce the requirement for labeled images, we propose ColorCount, a novel CNN-based approach that combines self-supervised transfer colorization learning and global prior classification to leverage the abundantly available unlabeled data. The self-supervised colorization branch learns the semantics and surface texture of the image by using its color components as pseudo labels. The classification branch extracts global group priors by learning correlations among image clusters. The fused discriminative features (global priors, semantics, and textures) provide ample priors for counting, hence significantly reducing the requirement for labeled images. We conduct extensive experiments on four challenging benchmarks. ColorCount achieves much better performance compared with other unsupervised approaches. Its performance is close to the supervised baseline with substantially less labeled data (10% of the original amount).
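
As a rough illustration of the colorization pretext task, the sketch below trains a tiny network to regress discarded color information from the luminance of unlabeled images, using the color components as free pseudo labels. The luminance/chroma decomposition and the network are simplifying assumptions of this sketch, not the ColorCount architecture.

```python
# Sketch of a colorization pretext task: predict color channels from grayscale input.
import torch
import torch.nn as nn

colorizer = nn.Sequential(                      # toy encoder: 1 luminance channel -> 2 chroma channels
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 2, 3, padding=1))
optimizer = torch.optim.Adam(colorizer.parameters(), lr=1e-3)

def colorization_step(rgb: torch.Tensor) -> torch.Tensor:
    """rgb: (B, 3, H, W) unlabeled crowd images; returns the self-supervised loss."""
    gray = rgb.mean(dim=1, keepdim=True)        # crude luminance as the input view
    chroma = rgb[:, :2] - gray                  # crude chroma residual as pseudo labels
    loss = nn.functional.mse_loss(colorizer(gray), chroma)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss

loss = colorization_step(torch.rand(4, 3, 64, 64))
```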

Motion-guided Non-local Spatial-Temporal Network for Video Crowd Counting

Apr 28, 2021
Haoyue Bai, S. -H. Gary Chan

We study video crowd counting, which is to estimate the number of objects (people in this paper) in all the frames of a video sequence. Previous work on crowd counting is mostly on still images. There has been little work on how to properly extract and take advantage of the spatial-temporal correlation between neighboring frames, in both short and long ranges, to achieve high estimation accuracy for a video sequence. In this work, we propose Monet, a novel and highly accurate motion-guided non-local spatial-temporal network for video crowd counting. Monet first takes people flow (motion information) as guidance to coarsely segment the regions of pixels where a person may be. Given these regions, Monet then uses a non-local spatial-temporal network to extract both short- and long-range contextual information. The whole network is finally trained end-to-end with a fused loss to generate a high-quality density map. Noting the scarcity and low quality (in terms of resolution and scene diversity) of the publicly available video crowd datasets, we have collected and built a large-scale video crowd counting dataset, VidCrowd, to contribute to the community. VidCrowd contains 9,000 frames of high resolution (2560 x 1440), with 1,150,239 head annotations captured under different scenes, crowd densities, and lighting conditions in two cities. We have conducted extensive experiments on the challenging VidCrowd and two public video crowd counting datasets: UCSD and Mall. Our approach achieves substantially better performance in terms of MAE and MSE as compared with other state-of-the-art approaches.
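
A rough sketch of the motion-guided non-local idea: frame differences provide a coarse mask of where people may be, and a plain non-local (self-attention) operation then aggregates spatial-temporal context over the masked features. Shapes, the threshold, and the frame-differencing rule are illustrative assumptions rather than Monet's actual design.

```python
# Sketch: coarse motion masking by frame differencing, followed by a non-local
# operation that mixes spatial-temporal context across all frames and positions.
import torch
import torch.nn.functional as F

def motion_guided_nonlocal(frames: torch.Tensor, feats: torch.Tensor, thresh: float = 0.05) -> torch.Tensor:
    """frames: (T, 1, H, W) grayscale video; feats: (T, C, H, W) per-frame features."""
    motion = (frames[1:] - frames[:-1]).abs()                        # frame differencing
    motion = torch.cat([motion[:1], motion], dim=0) > thresh         # (T, 1, H, W) coarse motion mask
    masked = feats * motion                                          # keep features where motion occurs

    t, c, h, w = masked.shape
    x = masked.permute(0, 2, 3, 1).reshape(t * h * w, c)             # all positions of all frames
    attn = F.softmax(x @ x.t() / c ** 0.5, dim=-1)                   # non-local pairwise affinities
    out = attn @ x                                                   # aggregate long-range context
    return out.reshape(t, h, w, c).permute(0, 3, 1, 2) + feats       # residual connection

out = motion_guided_nonlocal(torch.rand(4, 1, 16, 16), torch.randn(4, 32, 16, 16))
```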
