This two-part paper investigates the application of artificial intelligence (AI), and in particular machine learning (ML), to the study of wireless propagation channels. In Part I, we introduced AI and ML and provided a comprehensive survey of ML-enabled channel characterization and antenna-channel optimization; in this part (Part II), we review the state-of-the-art literature on scenario identification and channel modeling. In particular, the key ideas of ML for scenario identification and channel modeling/prediction are presented, and the widely used ML methods for propagation scenario identification and channel modeling and prediction are analyzed and compared. Based on the state of the art, the future challenges of AI/ML-based channel data processing techniques are also discussed.
To provide higher data rates, as well as better coverage, cost efficiency, security, adaptability, and scalability, 5G and beyond-5G networks are being developed with various artificial intelligence techniques. In this two-part paper, we investigate the application of artificial intelligence (AI), and in particular machine learning (ML), to the study of wireless propagation channels. This first part provides a comprehensive overview of ML for channel characterization and ML-based antenna-channel optimization, and Part II gives a state-of-the-art literature review of channel scenario identification and channel modeling. Fundamental results and key concepts of ML for communication networks are presented, and widely used ML methods for channel data processing, propagation channel estimation, and characterization are analyzed and compared. A discussion of challenges and future research directions for ML-enabled next-generation networks, restricted to the topics covered in this part, rounds off the paper.
Vehicle-to-vehicle (V2V) wireless communication systems are fundamental in many intelligent transportation applications, e.g., traffic load control, driverless vehicles, and collision avoidance. Hence, developing and standardizing appropriate V2V communication systems requires realistic V2V propagation channel models. However, most existing V2V channel modeling studies focus on car-to-car channels; only a few investigate truck-to-car (T2C) or truck-to-truck (T2T) channels. In this paper, a hybrid geometry-based stochastic model (GBSM) is proposed for T2X (T2C or T2T) channels in freeway environments. We parameterize this GBSM from extensive channel measurements. We extract the multipath components (MPCs) using a joint maximum likelihood estimation algorithm (RiMAX) and then cluster the MPCs based on their evolution patterns. We classify the determined clusters as line-of-sight, single-bounce reflections from static interaction objects (IOs), single-bounce reflections from mobile IOs, multiple-bounce reflections, and diffuse scattering. Specifically, we model multiple-bounce reflections as double clusters following the COST 273/COST2100 method. This article presents the complete parameterization of the channel model. We validate this model by comparing the root-mean-square delay spread and the angular spreads of departure/arrival derived from the channel model with the values derived directly from the measurements.
We introduce Attention Free Transformer (AFT), an efficient variant of Transformers that eliminates the need for dot-product self-attention. In an AFT layer, the key and value are first combined with a set of learned position biases, the result of which is multiplied with the query in an element-wise fashion. This new operation has a memory complexity linear w.r.t. both the context size and the dimension of features, making it compatible with both large input and model sizes. We also introduce AFT-local and AFT-conv, two model variants that take advantage of the idea of locality and spatial weight sharing while maintaining global connectivity. We conduct extensive experiments on two autoregressive modeling tasks (CIFAR10 and Enwik8) as well as an image recognition task (ImageNet-1K classification). We show that AFT demonstrates competitive performance on all the benchmarks, while providing excellent efficiency at the same time.
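The AFT layer described above can be illustrated with a small sketch. It assumes the commonly cited AFT-full form, in which exponentiated keys plus position biases weight the values and a sigmoid of the query gates the result element-wise. The sigmoid gate and the exponential weighting are not stated in the abstract and are assumptions here, as are the names `aft_full`, `Q`, `K`, `V`, and `w`. The loops are deliberately naive for readability, so they do not show the efficiency claimed above, and no causal mask is applied.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def aft_full(Q, K, V, w):
    """Naive AFT-full sketch (assumed formulation, no causal mask).

    Q, K, V: lists of T rows, each with d floats.
    w: T x T list of position biases, w[t][s] for target t and source s.
    Returns a T x d list.
    """
    T, d = len(Q), len(Q[0])
    Y = []
    for t in range(T):
        row = []
        for j in range(d):
            # Combine keys with position biases, per feature dimension.
            logits = [K[s][j] + w[t][s] for s in range(T)]
            m = max(logits)  # subtract the max for numerical stability
            weights = [math.exp(l - m) for l in logits]
            num = sum(wt * V[s][j] for s, wt in enumerate(weights))
            den = sum(weights)
            # Element-wise gating by the query; no dot product is taken.
            row.append(sigmoid(Q[t][j]) * num / den)
        Y.append(row)
    return Y
```

With all keys and biases set to zero, every position receives equal weight, so each output row is the mean of the values scaled by the query gate.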
We study the problem of directly optimizing arbitrary non-differentiable task evaluation metrics such as misclassification rate and recall. Our method, named MetricOpt, operates in a black-box setting where the computational details of the target metric are unknown. We achieve this by learning a differentiable value function, which maps compact task-specific model parameters to metric observations. The learned value function is easily pluggable into existing optimizers like SGD and Adam, and is effective for rapidly finetuning a pre-trained model. This leads to consistent improvements since the value function provides effective metric supervision during finetuning, and helps to correct the potential bias of loss-only supervision. MetricOpt achieves state-of-the-art performance on a variety of metrics for (image) classification, image retrieval and object detection. Solid benefits are found over competing methods, which often involve complex loss design or adaptation. MetricOpt also generalizes well to new tasks and model architectures.
Recent methods for long-tailed instance segmentation still struggle on rare object classes with little training data. We propose a simple yet effective method, Feature Augmentation and Sampling Adaptation (FASA), that addresses the data scarcity issue by augmenting the feature space, especially for rare classes. Both the Feature Augmentation (FA) and feature sampling components are adaptive to the actual training status -- FA is informed by the feature mean and variance of observed real samples from past iterations, and we sample the generated virtual features in a loss-adapted manner to avoid over-fitting. FASA does not require any elaborate loss design, and removes the need for inter-class transfer learning that often involves large cost and manually-defined head/tail class groups. We show FASA is a fast, generic method that can be easily plugged into standard or long-tailed segmentation frameworks, with consistent performance gains and little added cost. FASA is also applicable to other tasks like long-tailed classification with state-of-the-art performance. Code will be released.
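The feature augmentation idea above (virtual features drawn from statistics of past real samples) can be sketched as follows. This is an illustration under stated assumptions, not the released FASA code: the per-dimension Gaussian sampling, the exponential-moving-average update, and the names `ClassFeatureStats`, `momentum`, `update`, and `sample` are all hypothetical. The loss-adapted sampling of virtual features is not modeled.

```python
import random

class ClassFeatureStats:
    """Running per-dimension mean and variance for one class (assumed design)."""

    def __init__(self, dim, momentum=0.1):
        self.mean = [0.0] * dim
        self.var = [1.0] * dim
        self.momentum = momentum
        self.initialized = False

    def update(self, feats):
        """Fold a batch of real features (list of dim-length lists) into the stats."""
        n, dim = len(feats), len(self.mean)
        batch_mean = [sum(f[j] for f in feats) / n for j in range(dim)]
        batch_var = [sum((f[j] - batch_mean[j]) ** 2 for f in feats) / n
                     for j in range(dim)]
        if not self.initialized:
            # The first batch sets the statistics directly.
            self.mean, self.var = batch_mean, batch_var
            self.initialized = True
        else:
            m = self.momentum
            self.mean = [(1 - m) * a + m * b for a, b in zip(self.mean, batch_mean)]
            self.var = [(1 - m) * a + m * b for a, b in zip(self.var, batch_var)]

    def sample(self, count, rng):
        """Draw `count` virtual features from an axis-aligned Gaussian."""
        return [[rng.gauss(mu, v ** 0.5) for mu, v in zip(self.mean, self.var)]
                for _ in range(count)]
```

In a training loop, one such tracker per class would be updated from real features at each iteration, and rare classes would receive more virtual samples. How many to draw per class is the sampling-adaptation part, which this sketch leaves out.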
In most machine learning training paradigms, a fixed, often handcrafted, loss function is assumed to be a good proxy for an underlying evaluation metric. In this work we assess this assumption by meta-learning an adaptive loss function to directly optimize the evaluation metric. We propose a sample-efficient reinforcement learning approach for adapting the loss dynamically during training. We empirically show how this formulation improves performance by simultaneously optimizing the evaluation metric and smoothing the loss landscape. We verify our method in metric learning and classification scenarios, showing considerable improvements over the state of the art on a diverse set of tasks. Importantly, our method is applicable to a wide range of loss functions and evaluation metrics. Furthermore, the learned policies are transferable across tasks and data, demonstrating the versatility of the method.
We present a novel approach for the task of human pose transfer, which aims at synthesizing a new image of a person from an input image of that person and a target pose. We address the issues of limited correspondences identified between keypoints only and invisible pixels due to self-occlusion. Unlike existing methods, we propose to estimate dense and intrinsic 3D appearance flow to better guide the transfer of pixels between poses. In particular, we wish to generate the 3D flow from just the reference and target poses. Training a network for this purpose is non-trivial, especially when the annotations for 3D appearance flow are scarce by nature. We address this problem through a flow synthesis stage. This is achieved by fitting a 3D model to the given pose pair and projecting it back to the 2D plane to compute the dense appearance flow for training. The synthesized ground truths are then used to train a feedforward network for efficient mapping from the input and target skeleton poses to the 3D appearance flow. With the appearance flow, we perform feature warping on the input image and generate a photorealistic image of the target pose. Extensive results on DeepFashion and Market-1501 datasets demonstrate the effectiveness of our approach over existing methods. Our code is available at http://mmlab.ie.cuhk.edu.hk/projects/pose-transfer
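The feature warping step mentioned above can be illustrated with a generic backward warp using bilinear sampling. This is a sketch under assumptions rather than the authors' implementation: the flow convention (each output pixel reads from its own position plus the flow offset), the zero padding outside the image, and the name `warp_bilinear` are all hypothetical choices, and a single-channel map stands in for a multi-channel feature tensor.

```python
import math

def warp_bilinear(feature, flow):
    """Backward-warp a 2D map with a dense flow field.

    feature: H x W list of floats.
    flow: H x W list of (dx, dy) offsets in pixels (assumed convention).
    out[y][x] samples feature at (x + dx, y + dy); outside reads give 0.
    """
    H, W = len(feature), len(feature[0])

    def at(yy, xx):
        # Zero padding for out-of-range coordinates.
        return feature[yy][xx] if 0 <= yy < H and 0 <= xx < W else 0.0

    out = []
    for y in range(H):
        row = []
        for x in range(W):
            dx, dy = flow[y][x]
            sx, sy = x + dx, y + dy
            x0, y0 = math.floor(sx), math.floor(sy)
            ax, ay = sx - x0, sy - y0  # fractional parts in [0, 1)
            # Blend the four neighbouring samples.
            val = ((1 - ax) * (1 - ay) * at(y0, x0)
                   + ax * (1 - ay) * at(y0, x0 + 1)
                   + (1 - ax) * ay * at(y0 + 1, x0)
                   + ax * ay * at(y0 + 1, x0 + 1))
            row.append(val)
        out.append(row)
    return out
```

A zero flow returns the input unchanged, which is a convenient sanity check before plugging in a predicted flow.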
Due to the emergence of Generative Adversarial Networks, video synthesis has witnessed exceptional breakthroughs. However, existing methods lack a proper representation to explicitly control the dynamics in videos. Human pose, on the other hand, can represent motion patterns intrinsically and interpretably, and impose geometric constraints regardless of appearance. In this paper, we propose a pose-guided method to synthesize human videos in a disentangled way: plausible motion prediction and coherent appearance generation. In the first stage, a Pose Sequence Generative Adversarial Network (PSGAN) learns in an adversarial manner to yield pose sequences conditioned on the class label. In the second stage, a Semantic Consistent Generative Adversarial Network (SCGAN) generates video frames from the poses while preserving coherent appearances in the input image. By enforcing semantic consistency between the generated and ground-truth poses at a high feature level, our SCGAN is robust to noisy or abnormal poses. Extensive experiments on both human action and human face datasets demonstrate the superiority of the proposed method over other state-of-the-art methods.
Data for face analysis often exhibit highly skewed class distributions, i.e., most data belong to a few majority classes, while the minority classes contain only a few instances. To mitigate this issue, contemporary deep learning methods typically follow classic strategies such as class re-sampling or cost-sensitive training. In this paper, we conduct extensive and systematic experiments to validate the effectiveness of these classic schemes for representation learning on class-imbalanced data. We further demonstrate that a more discriminative deep representation can be learned by enforcing a deep network to maintain inter-cluster margins both within and between classes. This tight constraint effectively reduces the class imbalance inherent in the local data neighborhood, thus carving much more balanced class boundaries locally. We show that it is easy to deploy angular margins between the cluster distributions on a hypersphere manifold. Such learned Cluster-based Large Margin Local Embedding (CLMLE), when combined with a simple k-nearest cluster algorithm, shows significant improvements in accuracy over existing methods on both face recognition and face attribute prediction tasks that exhibit imbalanced class distribution.
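The k-nearest cluster step mentioned above can be sketched as a majority vote over the k cluster centroids closest to a query in angular terms. The abstract does not give the decision rule, so the majority vote, the cosine-similarity ranking, and the names `normalize` and `k_nearest_cluster_predict` are assumptions made for illustration only.

```python
import math
from collections import Counter

def normalize(v):
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def k_nearest_cluster_predict(query, centroids, labels, k=3):
    """Label a query by voting among its k most similar cluster centroids.

    centroids: list of embedding vectors, one per cluster.
    labels: class label of each cluster (a class may own several clusters).
    Similarity is cosine similarity, i.e. a smaller angle is closer.
    """
    q = normalize(query)
    scored = []
    for centroid, label in zip(centroids, labels):
        c = normalize(centroid)
        scored.append((sum(a * b for a, b in zip(q, c)), label))
    # Highest cosine similarity first.
    scored.sort(key=lambda item: item[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    # On a tie, the label of the nearer cluster wins (first inserted).
    return votes.most_common(1)[0][0]
```

Voting over clusters rather than individual samples means a majority class with many instances does not automatically dominate a query's neighborhood, which fits the locally balanced boundaries described above.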