Longbing Cao

Copula Variational LSTM for High-dimensional Cross-market Multivariate Dependence Modeling

May 09, 2023
Jia Xu, Longbing Cao

[Figures 1–4]

We address an important yet challenging problem: modeling high-dimensional dependencies across multivariates such as financial indicators in heterogeneous markets. In reality, a market couples and influences others over time, and the financial variables within a market are also coupled. We make the first attempt to integrate variational sequential neural learning with copula-based dependence modeling to characterize the dependence degrees and structures across non-normal multivariates, covering both temporal observable variables and latent variables. Our variational neural network, WPVC-VLSTM, models variational sequential dependence degrees and structures across multivariate time series by combining variational long short-term memory networks with regular vine copula. The regular vine copula models non-normal and long-range distributional couplings across multiple dynamic variables. WPVC-VLSTM is verified in terms of both technical significance and portfolio forecasting performance. It outperforms benchmarks including linear models, stochastic volatility models, deep neural networks, and variational recurrent networks in cross-market portfolio forecasting.
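The copula idea underpinning WPVC-VLSTM can be seen in miniature by stripping the marginals from two series and measuring dependence on the copula scale. A minimal sketch with a Gaussian copula on rank-based pseudo-observations (illustrative only; the paper uses regular vine copulas over many variables together with a variational LSTM):

```python
import numpy as np
from scipy.stats import norm, rankdata

def gaussian_copula_rho(x, y):
    """Dependence between two series on the copula scale: the marginals are
    stripped away via rank-based pseudo-observations, so only the dependence
    structure remains."""
    n = len(x)
    u = rankdata(x) / (n + 1)           # pseudo-observations in (0, 1)
    v = rankdata(y) / (n + 1)
    z1, z2 = norm.ppf(u), norm.ppf(v)   # map to standard-normal scores
    return float(np.corrcoef(z1, z2)[0, 1])

rng = np.random.default_rng(0)
x = rng.standard_t(df=3, size=2000)                  # heavy-tailed "returns"
y = 0.8 * x + 0.6 * rng.standard_t(df=3, size=2000)
print(round(gaussian_copula_rho(x, y), 2))
```

Because the ranks discard the heavy-tailed marginals, the estimate reflects pure dependence, which is exactly what vine copulas then compose pairwise over many variables.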

* 15 pages, 7 figures 

Bayesian Federated Learning: A Survey

Apr 26, 2023
Longbing Cao, Hui Chen, Xuhui Fan, Joao Gama, Yew-Soon Ong, Vipin Kumar

[Figures 1–3]

Federated learning (FL) demonstrates its advantages in integrating distributed infrastructure, communication, computing, and learning in a privacy-preserving manner. However, the robustness and capabilities of existing FL methods are challenged by limited and dynamic data and conditions, by complexities including heterogeneities and uncertainties, and by the need for analytical explainability. Bayesian federated learning (BFL) has emerged as a promising approach to address these issues. This survey presents a critical overview of BFL, including its basic concepts, its relations to Bayesian learning in the context of FL, and a taxonomy of BFL from both Bayesian and federated perspectives. We categorize and discuss client-side, server-side, and FL-based BFL methods along with their pros and cons. We further discuss the limitations of existing BFL methods and future directions of BFL research to address the intricate requirements of real-life FL applications.
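One recurring server-side pattern in BFL is aggregating client posteriors rather than point estimates. A minimal sketch, assuming each client reports a Gaussian posterior over a single scalar parameter (a toy setting; real BFL methods must also handle the shared prior and non-conjugate models):

```python
import numpy as np

def aggregate_gaussians(means, variances):
    """Server-side aggregation of independent Gaussian client posteriors by
    taking their product: the global precision is the sum of the client
    precisions, and the global mean is the precision-weighted mean."""
    prec = 1.0 / np.asarray(variances, dtype=float)
    var_g = 1.0 / prec.sum()
    mean_g = var_g * (prec * np.asarray(means, dtype=float)).sum()
    return mean_g, var_g

m, v = aggregate_gaussians([1.0, 3.0], [1.0, 1.0])
print(m, v)  # two equally certain clients: mean 2.0, variance 0.5
```

Note how a more certain client (smaller variance) pulls the global mean toward itself, in contrast to plain FedAvg, which weights only by data size.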

* Accepted by IJCAI 2023 Survey Track, copyright owned by IJCAI 

Learning Informative Representation for Fairness-aware Multivariate Time-series Forecasting: A Group-based Perspective

Jan 27, 2023
Hui He, Qi Zhang, Shoujin Wang, Kun Yi, Zhendong Niu, Longbing Cao

[Figures 1–4]

Multivariate time series (MTS) forecasting has penetrated and benefited many aspects of daily life. However, unfair forecasting of MTSs not only degrades their practical benefit but can also bring about serious potential risks. Such unfair MTS forecasting may be attributed to variable disparity, which leads to advantaged and disadvantaged variables. This issue has rarely been studied in existing MTS forecasting models. To address this significant gap, we formulate the MTS fairness modeling problem as learning informative representations that attend to both advantaged and disadvantaged variables. Accordingly, we propose a novel framework, named FairFor, for fairness-aware MTS forecasting. FairFor is based on adversarial learning and generates both group-irrelevant and group-relevant representations for downstream forecasting. FairFor first adopts recurrent graph convolution to capture spatio-temporal variable correlations and to group variables by leveraging a spectral relaxation of the K-means objective. It then utilizes a novel filtering-and-fusion module to filter group-relevant information and generate group-irrelevant representations via orthogonality regularization. The group-irrelevant and group-relevant representations together form highly informative representations, facilitating knowledge sharing from advantaged to disadvantaged variables and guaranteeing fairness. Extensive experiments on four public datasets demonstrate the effectiveness of FairFor for fair forecasting and its significant performance improvement.
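The variable-grouping step via a spectral relaxation of K-means can be sketched on its own: the top-k eigenvectors of a similarity matrix span the relaxed cluster indicators, and clustering their rows recovers discrete groups. Here similarity is simply absolute correlation, a stand-in for FairFor's learned spatio-temporal correlations:

```python
import numpy as np

def kmeans2(points, iters=20):
    """Minimal k-means with k=2; seeds with two mutually distant rows."""
    c = points[[0, ((points - points[0]) ** 2).sum(1).argmax()]]
    for _ in range(iters):
        d = ((points[:, None, :] - c[None, :, :]) ** 2).sum(-1)
        lab = d.argmin(1)
        c = np.array([points[lab == j].mean(0) for j in (0, 1)])
    return lab

def group_variables(series, k=2):
    """Group MTS variables via the spectral relaxation of the K-means
    objective: cluster the rows of the top-k eigenvector embedding of the
    variable-similarity matrix (|correlation| here, for illustration)."""
    sim = np.abs(np.corrcoef(series.T))   # (n_vars, n_vars)
    _, vecs = np.linalg.eigh(sim)         # eigenvalues ascending
    return kmeans2(vecs[:, -k:])          # cluster the embedding rows

rng = np.random.default_rng(1)
base = rng.normal(size=(500, 2))
# six variables: the first three track base[:, 0], the last three base[:, 1]
data = np.hstack([base[:, [0]] + 0.1 * rng.normal(size=(500, 3)),
                  base[:, [1]] + 0.1 * rng.normal(size=(500, 3))])
labels = group_variables(data)
print(labels)
```

The relaxation avoids the combinatorial assignment problem: the continuous eigenvector solution is computed once, and only the final discretization uses K-means.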

* 13 pages, 5 figures, submitted to IEEE Transactions on Knowledge and Data Engineering (TKDE) 

eVAE: Evolutionary Variational Autoencoder

Jan 01, 2023
Zhangkai Wu, Longbing Cao, Lei Qi

[Figures 1–4]

The surrogate loss of variational autoencoders (VAEs) poses various challenges to their training, inducing an imbalance between task fitting and representation inference. To avert this, existing strategies for VAEs focus on adjusting the tradeoff by introducing hyperparameters, deriving a tighter bound under mild assumptions, or decomposing the loss components for certain neural settings. VAEs still suffer from uncertain tradeoff learning. We propose a novel evolutionary variational autoencoder (eVAE), building on variational information bottleneck (VIB) theory and integrative evolutionary neural learning. eVAE integrates a variational genetic algorithm into the VAE with variational evolutionary operators, including variational mutation, crossover, and evolution. Its inner-outer-joint training mechanism synergistically and dynamically generates and updates the uncertain tradeoff learning in the evidence lower bound (ELBO) without additional constraints. Apart from learning a lossy compression and representation of data under the VIB assumption, eVAE presents an evolutionary paradigm for tuning critical factors of VAEs and deep neural networks, and addresses the premature convergence and random search problems by integrating evolutionary optimization into deep learning. Experiments show that eVAE addresses the KL-vanishing problem in text generation with low reconstruction loss, generates all disentangled factors with sharp images, and improves image generation quality. eVAE achieves better reconstruction loss, disentanglement, and generation-inference balance than its competitors.
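The evolutionary loop eVAE applies to ELBO tradeoff factors can be sketched in miniature: a population of candidate tradeoff weights undergoes selection, crossover, and mutation against a fitness function. The fitness below is a hypothetical stand-in for a validation ELBO, and the operators are plain GA operators rather than the paper's variational ones:

```python
import random

def evolve(fitness, pop_size=20, gens=30, seed=0):
    """Selection / crossover / mutation loop over candidate tradeoff
    weights, mirroring the evolutionary step eVAE applies to ELBO factors
    (with plain GA operators, purely for illustration)."""
    rng = random.Random(seed)
    pop = [rng.uniform(0.0, 2.0) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]                    # selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            child = 0.5 * (a + b) + rng.gauss(0.0, 0.05)  # crossover + mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# hypothetical fitness: a validation-ELBO stand-in peaking at beta = 0.7
best = evolve(lambda beta: -(beta - 0.7) ** 2)
print(round(best, 3))
```

Keeping the parents across generations (elitism) is what prevents the random-search behavior the abstract mentions: the best tradeoff found so far is never lost.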


Tri-Attention: Explicit Context-Aware Attention Mechanism for Natural Language Processing

Nov 05, 2022
Rui Yu, Yifeng Li, Wenpeng Lu, Longbing Cao

[Figures 1–4]

In natural language processing (NLP), the context of a word or sentence plays an essential role. Contextual information such as the semantic representation of a passage or historical dialogue forms an essential part of a conversation and of a precise understanding of the present phrase or sentence. However, despite their great success in modeling sequence alignment, standard attention mechanisms typically generate weights using only query and key while ignoring context, forming a Bi-Attention framework. This Bi-Attention mechanism does not explicitly model the interactions between the contexts, queries, and keys of target sequences, missing important contextual information and resulting in poor attention performance. Accordingly, we propose a novel and general triple-attention (Tri-Attention) framework that expands the standard Bi-Attention mechanism and explicitly interacts query, key, and context by incorporating context as the third dimension in calculating relevance scores. Four variants of Tri-Attention are generated by expanding the two-dimensional vector-based additive, dot-product, scaled dot-product, and bilinear operations in Bi-Attention to tensor operations for Tri-Attention. Extensive experiments on three NLP tasks demonstrate that Tri-Attention outperforms about 30 state-of-the-art non-attention, standard Bi-Attention, and contextual Bi-Attention approaches, as well as pretrained neural language models.
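The shift from Bi- to Tri-Attention can be sketched with one plausible tensor reading of the scaled dot-product variant, where a pooled context vector enters the relevance score directly. This is illustrative, not the paper's exact parameterization:

```python
import numpy as np

def tri_attention(Q, K, C, V):
    """Scaled dot-product attention extended with context: the score is
    s[i, j] = sum_d Q[i, d] * K[j, d] * c[d], where c mean-pools the
    context C (one plausible reading of the tensor expansion)."""
    d = Q.shape[-1]
    c = C.mean(axis=0)                                # pooled context vector
    scores = np.einsum("id,jd,d->ij", Q, K, c) / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)             # softmax over keys
    return w @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 queries
K = rng.normal(size=(5, 8))   # 5 keys
C = rng.normal(size=(3, 8))   # 3 context vectors
V = rng.normal(size=(5, 8))   # values aligned with keys
out = tri_attention(Q, K, C, V)
print(out.shape)
```

With a constant context vector the score collapses back to ordinary scaled dot-product attention, which makes the role of the third dimension easy to see.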


Supervised Deep Hashing for High-dimensional and Heterogeneous Case-based Reasoning

Jun 29, 2022
Qi Zhang, Liang Hu, Chongyang Shi, Ke Liu, Longbing Cao

[Figures 1–4]

Case-based reasoning (CBR) on high-dimensional and heterogeneous data is a trending yet challenging and computationally expensive task in the real world. A promising approach is to obtain low-dimensional hash codes representing cases and perform similarity retrieval of cases in Hamming space. However, previous methods based on data-independent hashing rely on random projections or manual construction and are inapplicable to specific data issues (e.g., high dimensionality and heterogeneity) due to their insensitivity to data characteristics. To address these issues, this work introduces a novel deep hashing network to learn similarity-preserving compact hash codes for efficient case retrieval, and proposes a deep-hashing-enabled CBR model, HeCBR. Specifically, we introduce position embedding to represent heterogeneous features and utilize a multilinear interaction layer to obtain case embeddings, which effectively filters out zero-valued features to tackle high dimensionality and sparsity and captures inter-feature couplings. We then feed the case embeddings into fully-connected layers, and a subsequent hash layer generates hash codes with a quantization regularizer to control the quantization loss during relaxation. To cater to the incremental learning of CBR, we further propose an adaptive learning strategy to update the hash function. Extensive experiments on public datasets show that HeCBR greatly reduces storage and significantly accelerates case retrieval. HeCBR achieves desirable performance compared with state-of-the-art CBR methods and performs significantly better than hashing-based CBR methods in classification.
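The retrieval side of this pipeline reduces to binary codes plus Hamming ranking. A sketch with a random projection standing in for the learned hash layer (the deep network, position embeddings, and quantization regularizer are all omitted):

```python
import numpy as np

def hash_codes(X, W):
    """Binary codes from the sign of linear projections. W is random here,
    a stand-in for the learned hash layer."""
    return (X @ W > 0).astype(np.uint8)

def hamming_retrieve(query_code, case_codes, top=3):
    """Rank stored cases by Hamming distance to the query code."""
    dist = (case_codes != query_code).sum(axis=1)
    return np.argsort(dist, kind="stable")[:top]

rng = np.random.default_rng(0)
cases = rng.normal(size=(100, 64))              # 100 cases, 64 features
W = rng.normal(size=(64, 32))                   # 32-bit codes
codes = hash_codes(cases, W)
query = cases[7] + 0.001 * rng.normal(size=64)  # near-duplicate of case 7
nearest = hamming_retrieve(hash_codes(query[None], W)[0], codes)
print(nearest)
```

Hamming distances are cheap bit operations over compact codes, which is where the storage and retrieval-speed gains reported above come from; what the learned hash function adds is making nearby codes correspond to genuinely similar cases.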

* 27 pages 

Label and Distribution-discriminative Dual Representation Learning for Out-of-Distribution Detection

Jun 19, 2022
Zhilin Zhao, Longbing Cao

[Figures 1–4]

To classify in-distribution samples, deep neural networks learn label-discriminative representations, which, however, are not necessarily distribution-discriminative according to the information bottleneck. Therefore, trained networks could assign unexpectedly high-confidence predictions to out-of-distribution samples drawn from distributions differing from that of in-distribution samples. Specifically, networks extract the strongly label-related information from in-distribution samples to learn label-discriminative representations but discard the weakly label-related information. Accordingly, networks treat out-of-distribution samples with minimal label-sensitive information as in-distribution samples. Exploiting the different informativeness properties of in- and out-of-distribution samples, our Dual Representation Learning (DRL) method learns distribution-discriminative representations that are weakly related to the labeling of in-distribution samples, and combines label- and distribution-discriminative representations to detect out-of-distribution samples. For a label-discriminative representation, DRL constructs the complementary distribution-discriminative representation through an implicit constraint: it integrates diverse intermediate representations, where an intermediate representation less similar to the label-discriminative representation receives a higher weight. Experiments show that DRL outperforms state-of-the-art methods for out-of-distribution detection.
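The implicit constraint, weighting intermediate representations inversely to their similarity with the label-discriminative one, can be sketched directly. The 1 - cosine-similarity weighting here is an illustrative choice, not necessarily the paper's exact scheme:

```python
import numpy as np

def distribution_rep(label_rep, intermediates):
    """Combine intermediate representations into a distribution-
    discriminative one: the less similar an intermediate representation is
    to the label-discriminative representation, the higher its weight
    (1 - cosine similarity, as an illustrative weighting)."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    w = np.array([1.0 - cos(label_rep, h) for h in intermediates])
    w = np.clip(w, 0.0, None)
    w = w / w.sum()
    rep = sum(wi * h for wi, h in zip(w, intermediates))
    return rep, w

label_rep = np.array([1.0, 0.0])
inters = [np.array([1.0, 0.1]),   # similar to the label representation
          np.array([0.0, 1.0])]   # dissimilar
rep, w = distribution_rep(label_rep, inters)
print(w)
```

The dissimilar intermediate representation dominates the combination, so the resulting representation carries the weakly label-related information that the label-discriminative path discards.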


Out-of-distribution Detection by Cross-class Vicinity Distribution of In-distribution Data

Jun 19, 2022
Zhilin Zhao, Longbing Cao, Kun-Yu Lin

[Figures 1–4]

Deep neural networks only learn to map in-distribution inputs to their corresponding ground-truth labels in the training phase, without differentiating out-of-distribution samples from in-distribution ones. This results from the assumption that all samples are independent and identically distributed, with no distributional distinction. Therefore, a pretrained network learned from in-distribution samples treats out-of-distribution samples as in-distribution and makes high-confidence predictions on them in the test phase. To address this issue, we draw out-of-distribution samples from the vicinity distribution of the training in-distribution samples to learn to reject predictions on out-of-distribution inputs. We introduce a Cross-class Vicinity Distribution by assuming that an out-of-distribution sample generated by mixing multiple in-distribution samples does not share the classes of its constituents. We thus improve the discriminability of a pretrained network by finetuning it with out-of-distribution samples drawn from the cross-class vicinity distribution, where each out-of-distribution input corresponds to a complementary label. Experiments on various in-/out-of-distribution datasets show that the proposed method significantly outperforms existing methods in discriminating between in- and out-of-distribution samples.
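The construction is essentially mixup with complementary labels: a mixture of two in-distribution inputs is treated as out-of-distribution and as belonging to neither constituent class. A minimal sketch (the Beta mixing coefficient is an assumption):

```python
import numpy as np

def cross_class_vicinity_sample(x1, y1, x2, y2, n_classes, rng):
    """Draw a pseudo-OOD sample from a cross-class vicinity distribution:
    mix two in-distribution inputs and mark each constituent class as a
    complementary ("not this class") label, following the premise that the
    mixture belongs to none of its constituents' classes."""
    lam = rng.beta(2.0, 2.0)                 # assumed mixing distribution
    x_ood = lam * x1 + (1.0 - lam) * x2
    complementary = np.zeros(n_classes, dtype=bool)
    complementary[[y1, y2]] = True           # classes the mixture is NOT
    return x_ood, complementary

rng = np.random.default_rng(0)
x_ood, comp = cross_class_vicinity_sample(np.ones(4), 0, np.zeros(4), 2,
                                          n_classes=5, rng=rng)
print(comp)
```

During finetuning, such pairs supply the rejection signal: the network is penalized for assigning the mixture to either constituent class, so convex combinations of training inputs stop receiving high-confidence predictions.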


Supervision Adaptation Balances In-Distribution Generalization and Out-of-Distribution Detection

Jun 19, 2022
Zhilin Zhao, Longbing Cao, Kun-Yu Lin

[Figures 1–4]

When there is a discrepancy between in-distribution (ID) samples and out-of-distribution (OOD) samples, deep neural networks trained on ID samples suffer from high-confidence predictions on OOD samples. This is primarily caused by the unavailability of OOD samples to constrain the networks during training. To improve the OOD sensitivity of deep networks, several state-of-the-art methods introduce samples from other real-world datasets as OOD samples into the training process and assign manually-determined labels to them. However, they sacrifice classification accuracy because the unreliable labeling of OOD samples disrupts ID classification. To balance ID generalization and OOD detection, a major challenge is to make OOD samples compatible with ID ones, which our proposed supervision adaptation method addresses by defining adaptive supervision information for OOD samples. First, by measuring the dependency between ID samples and their labels through mutual information, we reveal the form of the supervision information in terms of the negative probabilities of all classes. Second, after exploring the data correlations between ID and OOD samples by solving multiple binary regression problems, we estimate the supervision information so as to make the ID classes more separable. We perform experiments on four advanced network architectures with two ID datasets and eleven OOD datasets to demonstrate the balancing effect of our supervision adaptation method in achieving both ID classification ability and OOD detection capacity.
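The overall objective pairs standard ID cross-entropy with a term steering OOD samples toward soft targets. In this sketch a uniform OOD target (outlier-exposure style) stands in for the paper's mutual-information-derived adaptive supervision, purely to show the shape of the joint loss:

```python
import numpy as np

def mixed_loss(id_logits, id_labels, ood_logits):
    """Joint objective: cross-entropy on ID samples plus a term steering
    OOD samples toward soft targets. A uniform OOD target (outlier-exposure
    style) stands in for the adaptive supervision the paper estimates."""
    def log_softmax(z):
        z = z - z.max(axis=1, keepdims=True)
        return z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    n_classes = id_logits.shape[1]
    id_ce = -log_softmax(id_logits)[np.arange(len(id_labels)), id_labels].mean()
    ood_target = np.full(n_classes, 1.0 / n_classes)   # stand-in targets
    ood_ce = -(ood_target * log_softmax(ood_logits)).sum(axis=1).mean()
    return id_ce + ood_ce

loss = mixed_loss(np.array([[4.0, 0.0], [0.0, 4.0]]), np.array([0, 1]),
                  np.array([[2.0, -2.0]]))
print(round(float(loss), 4))
```

The balancing problem the abstract describes is visible here: the OOD term pushes logits toward its targets, so poorly chosen (e.g. uniform) targets pull probability mass away from the ID classes, which is what the adaptive supervision is designed to avoid.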


Gray Learning from Non-IID Data with Out-of-distribution Samples

Jun 19, 2022
Zhilin Zhao, Longbing Cao, Chang-Dong Wang

[Figures 1–4]

The quality of training data annotated by experts cannot be guaranteed, even more so for non-IID data consisting of both in- and out-of-distribution samples (i.e., the in-distribution and out-of-distribution samples follow different distributions). Experts may mistakenly annotate out-of-distribution samples in the same way as in-distribution samples, incurring untrustworthy ground-truth labels. Learning from such non-IID data that mixes in- and out-of-distribution samples with untrustworthy labels significantly challenges both shallow and deep learning, with no relevant work reported. It is, however, possible to identify trustworthy complementary labels of a sample, indicating which classes it does not belong to, because both in- and out-of-distribution samples do not belong to any class except the one corresponding to the ground-truth label. With this insight, we propose a novel gray learning approach to robustly learn from non-IID data with both in- and out-of-distribution samples. Due to the uncertain distributions of training samples, we reject the complementary labels for low-confidence inputs while mapping high-confidence inputs to their ground-truth labels during training. Building on statistical learning theory, we derive a generalization error bound showing that gray learning achieves a tight bound on the non-IID data. Extensive experiments show that our method provides significant improvements over alternative methods from robust statistics.
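One reading of the confidence-based switch described above can be sketched on predicted probabilities: high-confidence inputs use their annotated label via standard cross-entropy, while low-confidence inputs, whose annotation may be an out-of-distribution mistake, only reject the trustworthy complementary classes. Both the loss form and the threshold are illustrative assumptions:

```python
import numpy as np

def gray_loss(probs, labels, threshold=0.5):
    """Gray-learning objective sketch. High-confidence inputs are mapped to
    their ground-truth label (cross-entropy); low-confidence inputs are
    only pushed away from the complementary classes (every class other than
    the annotated one), which remain trustworthy either way."""
    n, c = probs.shape
    conf = probs.max(axis=1)
    onehot = np.eye(c, dtype=bool)[labels]
    ce = -np.log(probs[onehot] + 1e-12)                    # trust the label
    comp = -np.log(1.0 - probs + 1e-12)[~onehot].reshape(n, c - 1).mean(axis=1)
    return np.where(conf >= threshold, ce, comp).mean()

probs = np.array([[0.90, 0.05, 0.05],   # confident -> standard CE
                  [0.40, 0.35, 0.25]])  # uncertain -> complementary term
loss = gray_loss(probs, np.array([0, 0]))
print(round(float(loss), 4))
```

The "gray" aspect is exactly this asymmetry: an uncertain sample never commits to its possibly wrong annotation, yet its trustworthy "not this class" information is still exploited.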
