This work proposes a minimal computational model for learning a structured memory of multiple object classes in an incremental setting. Our approach is based on establishing a closed-loop transcription between multiple classes and their corresponding subspaces, known as a linear discriminative representation, in a low-dimensional feature space. Our method is both simpler and more efficient than existing approaches to incremental learning, in terms of model size, storage, and computation: it requires only a single, fixed-capacity autoencoding network with a feature space that is used for both discriminative and generative purposes. All network parameters are optimized simultaneously without architectural manipulations, by solving a constrained minimax game between the encoding and decoding maps over a single rate reduction-based objective. Experimental results show that our method can effectively alleviate catastrophic forgetting, achieving significantly better performance than prior work for both generative and discriminative purposes.
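As a concrete (and heavily simplified) reading of the training procedure, the sketch below shows one alternating minimax update in PyTorch. Here `utility` stands in for the rate-reduction objective on the features and their closed-loop transcription; all names are ours, not the authors'.

```python
# Minimal sketch (not the authors' code) of the constrained minimax game:
# the encoder f plays gradient ascent and the decoder g plays gradient
# descent on one shared rate-reduction utility.
import torch

def minimax_step(f, g, utility, x, opt_f, opt_g):
    # Encoder ascent: expand the utility between the features z and their
    # closed-loop transcription z_hat = f(g(f(x))).
    z = f(x)
    u = utility(z, f(g(z)))
    opt_f.zero_grad()
    (-u).backward()                  # ascent on the encoder parameters
    opt_f.step()

    # Decoder descent: shrink the same utility so the transcribed features
    # match the encoded ones (closed-loop error feedback).
    z = f(x).detach()                # freeze the encoder's target features
    u = utility(z, f(g(z)))
    opt_g.zero_grad()
    u.backward()                     # descent on the decoder parameters
    opt_g.step()
```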
We propose a metric -- Projection Norm -- to predict a model's performance on out-of-distribution (OOD) data without access to ground truth labels. Projection Norm first uses model predictions to pseudo-label test samples and then trains a new model on the pseudo-labels. The more the new model's parameters differ from an in-distribution model, the greater the predicted OOD error. Empirically, our approach outperforms existing methods on both image and text classification tasks and across different network architectures. Theoretically, we connect our approach to a bound on the test error for overparameterized linear models. Furthermore, we find that Projection Norm is the only approach that achieves non-trivial detection performance on adversarial examples. Our code is available at https://github.com/yaodongyu/ProjNorm.
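The procedure is simple enough to sketch directly. The following is a minimal, illustrative rendering (the authoritative implementation is in the repository linked above; `fine_tune` is an assumed helper that trains a model on labeled pairs):

```python
# Minimal sketch of Projection Norm as described above; names illustrative.
import copy
import torch

def projection_norm(model, ref_model, ood_loader, fine_tune):
    """Larger parameter drift under pseudo-label training => larger predicted OOD error."""
    # 1) Pseudo-label the unlabeled OOD test set with the current model.
    pseudo = []
    with torch.no_grad():
        for x, _ in ood_loader:
            pseudo.append((x, model(x).argmax(dim=1)))
    # 2) Train a fresh copy of the in-distribution reference on the pseudo-labels.
    new_model = fine_tune(copy.deepcopy(ref_model), pseudo)
    # 3) ProjNorm: parameter-space distance between the two models.
    diffs = [(p - q).flatten()
             for p, q in zip(new_model.parameters(), ref_model.parameters())]
    return torch.linalg.norm(torch.cat(diffs)).item()
```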
In vehicle-to-infrastructure (V2I) networks, a cluster of multi-antenna access points (APs) can collaboratively conduct transmitter beamforming to provide data services (e.g., eMBB or URLLC). The collaboration between APs effectively forms a networked linear antenna array with an extra-large aperture (i.e., a network-ELAA), where the wireless channel exhibits spatial non-stationarity. The major contribution of this work lies in the analysis of beamforming gain and radio coverage for network-ELAA under non-stationary Rician channels, taking AP clustering into account. Assuming that 1) the total transmit power is fixed and evenly distributed over the APs, and 2) the beam is formed based only on the line-of-sight (LoS) path, it is found that the beamforming gain is concave in the cluster size. The optimum size of the AP cluster varies with the user's location, the channel uncertainty, and the data service. A user located farther from the ELAA requires a larger cluster size. URLLC is more sensitive to channel uncertainty than eMBB and thus requires a larger cluster size to mitigate channel fading and extend coverage. Finally, it is shown that the network-ELAA can offer significant coverage extension (50% or more in most cases) compared with the single-AP scenario.
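To make the concavity claim tangible, here is a self-contained toy Monte-Carlo experiment (our own simplification, not the paper's analysis): LoS-only MF beamforming, fixed total power split evenly across APs, and a Rician channel per AP. All numerical values are assumptions.

```python
# Toy sketch of why the LoS-only MF beamforming gain is concave in the
# cluster size K: adding APs grows the coherent aperture, but the fixed
# total power is split over ever-weaker (more distant) APs.
import numpy as np

rng = np.random.default_rng(0)
M, kappa, P = 8, 10.0, 1.0              # antennas per AP, Rician factor, power
dist = 20.0 + 10.0 * np.arange(16)      # user-to-AP distances (m), sorted
beta = dist ** -3.0                     # toy pathloss

def mean_gain(K, trials=3000):
    g = 0.0
    for _ in range(trials):
        s = 0j
        for k in range(K):
            los = np.exp(2j * np.pi * rng.random(M))      # LoS steering vector
            nlos = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
            h = np.sqrt(beta[k]) * (np.sqrt(kappa / (1 + kappa)) * los
                                    + np.sqrt(1 / (1 + kappa)) * nlos)
            w = np.sqrt(P / (K * M)) * los                # MF on the LoS path only
            s += np.vdot(w, h)                            # coherent combining
        g += abs(s) ** 2 / trials
    return g

gains = [mean_gain(K) for K in range(1, 17)]
# Rises to an interior peak, then falls; a larger base distance (user farther
# from the ELAA) pushes the peak toward a larger cluster size.
print([round(g / max(gains), 3) for g in gains])
```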
Scalable coding, which can adapt to channel bandwidth variation, performs well in today's complex network environments. However, existing scalable compression methods face two challenges: reduced compression performance and insufficient scalability. In this paper, we propose the first learned fine-grained scalable image compression model (DeepFGS) to overcome these two shortcomings. Specifically, we introduce a feature-separation backbone to divide the image information into basic and scalable features, and then redistribute the features channel by channel through an information-rearrangement strategy. In this way, we can generate a continuously scalable bitstream via one-pass encoding. In addition, we reuse the decoder to reduce the parameters and computational complexity of DeepFGS. Experiments demonstrate that DeepFGS outperforms all learning-based scalable image compression models and conventional scalable image codecs in both PSNR and MS-SSIM. To the best of our knowledge, DeepFGS is the first exploration of learned fine-grained scalable coding, and it achieves the finest scalability among learning-based methods.
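The one-pass, channel-wise scalability can be illustrated with a toy module (not the DeepFGS architecture; the layer shapes and channel split are assumptions): a single reused decoder reconstructs from the basic channels plus however many scalable channels survive bitstream truncation.

```python
# Hedged toy sketch of fine-grained scalability via channel truncation.
import torch
import torch.nn as nn

class ToyFGS(nn.Module):
    def __init__(self, c_basic=32, c_scal=160):
        super().__init__()
        self.enc = nn.Conv2d(3, c_basic + c_scal, 5, stride=4, padding=2)
        self.dec = nn.ConvTranspose2d(c_basic + c_scal, 3, 5, stride=4,
                                      padding=2, output_padding=3)
        self.c_basic = c_basic

    def forward(self, x, keep=0):
        y = self.enc(x)
        # Simulate truncating the scalable part of the bitstream: keep the
        # basic channels plus the first `keep` scalable channels, zero the
        # rest. One decoder is reused for every truncation point.
        mask = torch.zeros_like(y)
        mask[:, : self.c_basic + keep] = 1.0
        return self.dec(y * mask)

x = torch.randn(1, 3, 64, 64)
out = ToyFGS()(x, keep=80)        # decode a mid-rate truncation point
```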
This work proposes a new computational framework for learning an explicit generative model for real-world datasets. In particular, we propose to learn {\em a closed-loop transcription} between a multi-class, multi-dimensional data distribution and a {\em linear discriminative representation (LDR)} in the feature space that consists of multiple independent multi-dimensional linear subspaces. We argue that the optimal encoding and decoding mappings sought can be formulated as the equilibrium point of a {\em two-player minimax game between the encoder and decoder}. A natural utility function for this game is the so-called {\em rate reduction}, a simple information-theoretic measure of distances between mixtures of subspace-like Gaussians in the feature space. Our formulation draws inspiration from closed-loop error feedback in control systems and avoids the expensive evaluation and minimization of approximated distances between arbitrary distributions in either the data space or the feature space. To a large extent, this new formulation unifies the concepts and benefits of auto-encoding and GANs and naturally extends them to the setting of learning a representation that is {\em both discriminative and generative} for multi-class, multi-dimensional real-world data. Our extensive experiments on many benchmark imagery datasets demonstrate the tremendous potential of this new closed-loop formulation: under fair comparison, the visual quality of the learned decoder and the classification performance of the encoder are competitive with, and often better than, existing methods based on GANs, VAEs, or a combination of both. We observe that the features of different classes so learned are explicitly mapped onto approximately {\em independent principal subspaces} in the feature space, and that diverse visual attributes within each class are modeled by the {\em independent principal components} within each subspace.
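For reference, the rate-reduction utility can be stated explicitly (our notation, following the standard coding-rate formulation; the paper's closed-loop objective composes such terms over encoded and transcribed features):
\[
R(\mathbf{Z}) = \frac{1}{2}\log\det\!\Big(\mathbf{I} + \frac{d}{n\epsilon^{2}}\,\mathbf{Z}\mathbf{Z}^{\top}\Big),
\qquad
R_c(\mathbf{Z}\mid\boldsymbol{\Pi}) = \sum_{j=1}^{k}\frac{\operatorname{tr}(\boldsymbol{\Pi}_j)}{2n}\log\det\!\Big(\mathbf{I} + \frac{d}{\operatorname{tr}(\boldsymbol{\Pi}_j)\epsilon^{2}}\,\mathbf{Z}\boldsymbol{\Pi}_j\mathbf{Z}^{\top}\Big),
\]
\[
\Delta R(\mathbf{Z}) = R(\mathbf{Z}) - R_c(\mathbf{Z}\mid\boldsymbol{\Pi}),
\]
where $\mathbf{Z}\in\mathbb{R}^{d\times n}$ collects $n$ feature vectors, $\epsilon$ is a prescribed distortion, and $\boldsymbol{\Pi}_j$ is the diagonal membership matrix of class $j$.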
The technology of using massive transmit antennas to enable ultra-reliable single-shot transmission (URSST) is challenged by imperfect transmitter-side channel knowledge (i.e., CSIT). When the imperfection mainly comes from channel time variation, the outage probability of matched-filter (MF) transmitter beamforming is investigated based on a first-order Markov model of the aged CSIT. With a fixed transmit power, the transmitter-side uncertainty of the instantaneous signal-to-noise ratio (iSNR) is mathematically characterized. In order to guarantee the outage probability for every single shot, a transmit-power adaptation approach is proposed to satisfy a pessimistic iSNR requirement, which is predicted using the Chernoff lower bound of the beamforming gain. Our numerical results demonstrate a remarkable transmit-power efficiency compared with power-control approaches using other lower bounds. In addition, a combination of MF beamforming and a grouped space-time block code (G-STBC) is proposed to further mitigate the detrimental impact of CSIT uncertainty. It is shown, through both theoretical analysis and computer simulations, that the combined approach further improves the transmit-power efficiency with a good tradeoff between outage probability and latency.
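The first-order Markov model of aged CSIT referred to above is standard; in our notation,
\[
\mathbf{h}_{t} \,=\, \alpha\,\mathbf{h}_{t-1} \,+\, \sqrt{1-\alpha^{2}}\;\mathbf{w}_{t},
\qquad \mathbf{w}_{t}\sim\mathcal{CN}(\mathbf{0},\mathbf{I}),
\]
where $\alpha\in[0,1]$ is the temporal correlation coefficient, often set to the Jakes value $\alpha = J_0(2\pi f_D \tau)$ for Doppler frequency $f_D$ and CSIT age $\tau$. With the MF beamformer computed from the aged $\mathbf{h}_{t-1}$, the realized gain $|\mathbf{h}_t^{H}\mathbf{h}_{t-1}|^{2}/\|\mathbf{h}_{t-1}\|^{2}$ is random at the transmitter; this is the iSNR uncertainty that the power adaptation must cover, and the paper bounds it via a Chernoff lower bound (not reproduced here).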
Speech enhancement aims to improve the perceptual quality of a speech signal by suppressing background noise. However, excessive suppression may lead to speech distortion and loss of speaker information, which degrades the performance of speaker embedding extraction. To alleviate this problem, we propose an end-to-end deep learning framework, dubbed PL-EESR, for robust speaker representation extraction. The framework is optimized based on feedback from a speaker identification task and on the high-level perceptual deviation between the raw speech signal and its noisy version. We conducted speaker verification in both noisy and clean environments to evaluate our system. Compared to the baseline, our method shows better performance in both environments, indicating that it not only enhances speaker-relevant information but also avoids introducing distortion.
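A minimal sketch of such a two-part training signal follows; the module names, the embedding-level deviation, and the weighting `lam` are our assumptions, not the exact PL-EESR losses.

```python
# Hedged sketch: the enhancement front-end is trained by (i) speaker-ID task
# feedback on enhanced speech and (ii) a high-level perceptual deviation
# measured against the clean reference at the embedding level.
import torch
import torch.nn.functional as F

def pl_eesr_loss(enhancer, spk_encoder, classifier, noisy, clean, spk_label,
                 lam=1.0):
    enhanced = enhancer(noisy)
    emb_enh = spk_encoder(enhanced)            # embedding of enhanced speech
    # Task feedback: speaker identification on the enhanced signal.
    task = F.cross_entropy(classifier(emb_enh), spk_label)
    # Perceptual deviation at the (high) embedding level, clean as reference.
    with torch.no_grad():
        emb_ref = spk_encoder(clean)
    percept = F.mse_loss(emb_enh, emb_ref)
    return task + lam * percept
```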
In this paper, Hermite polynomials are employed to study linear approximation models of narrowband multi-antenna (i.e., MIMO) signal reception with low-resolution quantization. This study results in a novel linear approximation using the second-order Hermite expansion (SOHE). The SOHE model does not rely on the assumptions commonly used in existing linear approximations. Instead, the quantization distortion is characterized by the second-order Hermite kernel, and the signal term is characterized by the first-order Hermite kernel. It is shown that the SOHE model can explain almost all phenomena and characteristics observed so far in low-resolution MIMO signal reception. When the SOHE model is employed to analyze the linear minimum mean-square error (LMMSE) channel equalizer, it is revealed that the current LMMSE algorithm can be enhanced by incorporating a symbol-level normalization mechanism. The performance of the enhanced LMMSE algorithm is demonstrated through computer simulations for narrowband MIMO systems in Rayleigh fading channels.
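As background for the expansion the abstract refers to (a standard identity, not the paper's exact derivation): a memoryless nonlinearity such as a quantizer $\mathcal{Q}(\cdot)$ driven by a unit-variance Gaussian input admits the probabilists' Hermite expansion
\[
\mathcal{Q}(x) \,=\, \sum_{k=0}^{\infty} \frac{b_k}{k!}\,\mathrm{He}_k(x),
\qquad b_k \,=\, \mathbb{E}\big[\mathcal{Q}(X)\,\mathrm{He}_k(X)\big],
\quad X\sim\mathcal{N}(0,1),
\]
by the orthogonality $\mathbb{E}[\mathrm{He}_m(X)\mathrm{He}_n(X)] = n!\,\delta_{mn}$. Truncating at $k=2$ gives a second-order model in which the linear kernel $\mathrm{He}_1(x)=x$ carries the signal term and the second-order kernel characterizes the quantization distortion, consistent with the SOHE description above.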
In this paper, a novel spatially non-stationary channel model is proposed for link-level computer simulations of massive multiple-input multiple-output (mMIMO) with an extremely large aperture array (ELAA). The proposed channel model allows a mix of non-line-of-sight (NLoS) and line-of-sight (LoS) links between a user and the service antennas. The NLoS/LoS state of each link is characterized by a binary random variable, which obeys a correlated Bernoulli distribution. The correlation is described in the form of an exponentially decaying window. In addition, the proposed model incorporates shadowing effects that are non-identical for the NLoS and LoS states. It is demonstrated, through computer emulation, that the proposed model can capture almost all spatially non-stationary fading behaviors of the ELAA-mMIMO channel while maintaining a low implementation complexity. With the proposed channel model, Monte-Carlo simulations are carried out to evaluate the channel capacity of ELAA-mMIMO. It is shown that the ELAA-mMIMO channel capacity has considerably different stochastic characteristics from that of conventional mMIMO due to the presence of channel spatial non-stationarity.
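The correlated Bernoulli states are easy to sketch. Below, the exponentially decaying correlation window is realized with a stationary binary Markov chain (so that corr$(s_m, s_{m+d}) = \rho^{d}$), which may differ from the paper's exact construction; parameter values are assumptions.

```python
# Hedged sketch: correlated Bernoulli LoS/NLoS states along an ELAA, with
# correlation decaying exponentially in the antenna separation.
import numpy as np

def los_states(n_antennas, p_los=0.3, rho=0.9, seed=1):
    rng = np.random.default_rng(seed)
    s = np.empty(n_antennas, dtype=int)
    s[0] = rng.random() < p_los
    for m in range(1, n_antennas):
        # Copy the neighbour with prob. rho, otherwise draw a fresh state;
        # this yields exponentially decaying spatial correlation.
        s[m] = s[m - 1] if rng.random() < rho else (rng.random() < p_los)
    return s  # 1 = LoS, 0 = NLoS

print(los_states(32))
```

Non-identical shadowing can then be attached per antenna conditional on its state, as the model prescribes.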
Many hierarchical reinforcement learning (RL) applications have empirically verified that incorporating prior knowledge in reward design improves convergence speed and practical performance. We attempt to quantify the computational benefits of hierarchical RL from a planning perspective, under assumptions about intermediate states and intermediate rewards that are frequently (but often implicitly) adopted in practice. Our approach reveals a trade-off between computational complexity and the pursuit of the shortest path in hierarchical planning: using intermediate rewards significantly reduces the computational complexity of finding a successful policy but does not guarantee finding the shortest path, whereas using sparse terminal rewards finds the shortest path at a significantly higher computational cost. We also corroborate our theoretical results with extensive experiments on MiniGrid environments using Q-learning and other popular deep RL algorithms.
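The two reward schemes can be contrasted on a toy chain with tabular Q-learning (ours, far simpler than MiniGrid; waypoint positions and values are assumptions). A one-dimensional chain has a unique path, so this illustrates only the convergence-speed side of the trade-off, not the shortest-path caveat.

```python
# Hedged toy comparison: sparse terminal reward vs. added intermediate
# rewards at designer-chosen waypoint states.
import numpy as np

def q_learning(rewards, n=12, goal=11, episodes=300, eps=0.3, alpha=0.5,
               gamma=0.95, cap=400, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((n, 2))                      # actions: 0 = left, 1 = right
    steps = []
    for _ in range(episodes):
        rew = dict(rewards)                   # each intermediate reward is
        s, t = 0, 0                           # collected once per episode
        while s != goal and t < cap:
            greedy = int(Q[s].argmax()) if Q[s, 0] != Q[s, 1] else int(rng.integers(2))
            a = int(rng.integers(2)) if rng.random() < eps else greedy
            s2 = min(max(s + (1 if a == 1 else -1), 0), n - 1)
            r = rew.pop(s2, 0.0)
            Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
            s, t = s2, t + 1
        steps.append(t)
    return np.mean(steps[:50])                # avg early episode length

print(q_learning({11: 1.0}))                  # sparse terminal reward only
print(q_learning({4: 0.1, 8: 0.1, 11: 1.0})) # with intermediate rewards
```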