Channel estimation has long been regarded as one of the most critical problems in three-dimensional (3D) massive multiple-input multiple-output (MIMO), which is recognized as the leading technology enabling 3D spatial signal processing in fifth-generation (5G) wireless communications and beyond. Recently, by exploiting the angular channel model and tensor decompositions, the accuracy of single-user channel estimation for 3D massive MIMO communications has been significantly improved given a limited number of pilot signals. However, these existing approaches cannot be straightforwardly extended to the multi-user channel estimation task, where the base station (BS) aims to acquire the channels of multiple users at the same time. The difficulty is that the coupling among multiple users' channels makes the channel estimation deviate from widely used tensor decompositions, giving rise to a non-standard tensor decomposition format that has not been well tackled. To overcome this challenge, besides directly fitting the new tensor model to the wireless data via the block coordinate descent (BCD) method, which is prone to overfitting the noise or requires regularization parameter tuning, we further propose a novel tuning-free channel estimation algorithm that automatically controls the channel model complexity and thus effectively avoids overfitting. Numerical results demonstrate the excellent performance of the proposed algorithm in terms of both estimation accuracy and overfitting avoidance.
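Although the multi-user model above departs from standard decompositions, the BCD baseline it builds on is essentially alternating least squares over one factor at a time. As a hedged illustration of that generic idea only (the paper's non-standard multi-user format is not reproduced here, and all names and shapes are chosen for the sketch), a minimal rank-R CP fit in NumPy:

```python
import numpy as np

def khatri_rao(A, B):
    # Column-wise Khatri-Rao product of A (I x R) and B (J x R) -> (I*J x R).
    I, R = A.shape
    J, _ = B.shape
    return (A[:, None, :] * B[None, :, :]).reshape(I * J, R)

def unfold(T, mode):
    # Mode-n unfolding of a 3-way tensor into a matrix.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def cp_als(T, rank, n_iter=300, seed=0):
    # Fit a rank-R CP model T ~ sum_r a_r o b_r o c_r by block coordinate
    # descent: each factor is updated by least squares with the others fixed.
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A = rng.standard_normal((I, rank))
    B = rng.standard_normal((J, rank))
    C = rng.standard_normal((K, rank))
    for _ in range(n_iter):
        A = unfold(T, 0) @ np.linalg.pinv(khatri_rao(B, C).T)
        B = unfold(T, 1) @ np.linalg.pinv(khatri_rao(A, C).T)
        C = unfold(T, 2) @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C
```

Each block update is a linear least-squares problem, which is why fitting a noisy tensor this way can overfit unless the rank (model complexity) is controlled, the issue the tuning-free algorithm addresses.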
With graphs rapidly growing in size and deeper graph neural networks (GNNs) emerging, the training and inference of GNNs become increasingly expensive. Existing network weight pruning algorithms cannot address the main space and computational bottleneck in GNNs, which is caused by the size and connectivity of the graph. To this end, this paper first presents a unified GNN sparsification (UGS) framework that simultaneously prunes the graph adjacency matrix and the model weights to effectively accelerate GNN inference on large-scale graphs. Leveraging this new tool, we further generalize the recently popular lottery ticket hypothesis to GNNs for the first time, defining a graph lottery ticket (GLT) as a pair of a core sub-dataset and a sparse sub-network, which can be jointly identified from the original GNN and the full dense graph by iteratively applying UGS. Like its counterpart in convolutional neural networks, a GLT can be trained in isolation to match the performance of training with the full model and graph, and can be drawn from both randomly initialized and self-supervised pre-trained GNNs. Our proposal has been experimentally verified across various GNN architectures and diverse tasks, on both small-scale graph datasets (Cora, Citeseer, and PubMed) and large-scale datasets from the challenging Open Graph Benchmark (OGB). Specifically, for node classification, the GLTs we find achieve the same accuracies with 20%~98% MACs saving on small graphs and 25%~85% MACs saving on large ones. For link prediction, GLTs lead to 48%~97% and 70% MACs saving on small and large graph datasets, respectively, without compromising predictive performance. Code is available at https://github.com/VITA-Group/Unified-LTH-GNN.
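UGS itself learns differentiable masks jointly with training; as a simplified stand-in, the sketch below prunes the adjacency matrix and the weight matrices by plain magnitude thresholding, the classic lottery-ticket criterion. The function names and sparsity levels are illustrative assumptions, not the paper's API, and the retraining between pruning rounds is omitted:

```python
import numpy as np

def magnitude_mask(M, sparsity):
    # Binary mask zeroing out the `sparsity` fraction of smallest-magnitude
    # entries of M (ties broken by removing all entries at the threshold).
    flat = np.abs(M).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return np.ones_like(M)
    thresh = np.partition(flat, k - 1)[k - 1]
    return (np.abs(M) > thresh).astype(M.dtype)

def ugs_step(adj, weights, graph_sparsity=0.05, weight_sparsity=0.2):
    # One sparsification round in the spirit of UGS: jointly prune the graph
    # adjacency and every layer's weights; the surviving sub-graph and
    # sub-network together form the candidate graph lottery ticket.
    pruned_adj = adj * magnitude_mask(adj, graph_sparsity)
    pruned_ws = [W * magnitude_mask(W, weight_sparsity) for W in weights]
    return pruned_adj, pruned_ws
```

Iterating this step (with retraining in between) progressively shrinks both the graph and the model, which is where the quoted MACs savings come from.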
Calculating and analyzing the accuracy of temperature and humidity predictions is important in quantitative meteorological forecasting. This study applies existing neural network methods to improve predictive accuracy. To this end, we analyze and explore the predictive accuracy and performance of neural networks using two combined meteorological factors (temperature and humidity). Simulation studies are performed with the artificial neural network (ANN), deep neural network (DNN), extreme learning machine (ELM), long short-term memory (LSTM), and long short-term memory with peephole connections (LSTM-PC) machine learning methods, and the prediction accuracies of the methods are compared with one another. Data are extracted from low-frequency time series of ten metropolitan cities of South Korea from March 2014 to February 2020 to validate our observations. In terms of error, LSTM is found to outperform the other four methods in predictive accuracy. In particular, on the test data, the temperature prediction of LSTM in summer in Tongyeong has a root mean squared error (RMSE) of 0.866, lower than that of the other neural network methods, while the mean absolute percentage error (MAPE) of LSTM for humidity prediction in summer in Mokpo is 5.525, significantly better than in the other metropolitan cities.
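The two error measures quoted above are standard; for concreteness, minimal NumPy implementations (with MAPE expressed in percent, matching the scale of the reported 5.525 value):

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root mean squared error, the metric used for the temperature results.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    # Mean absolute percentage error (in %), used for the humidity results.
    # Assumes y_true has no zeros.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)
```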
With the continuous development of machine learning technology, major e-commerce platforms have launched recommendation systems based on it to serve large numbers of customers with different needs more efficiently. Compared with traditional supervised learning, reinforcement learning can better capture the user's state transitions in the decision-making process and consider a series of user actions, not just the static characteristics of the user at a certain moment. In theory, it takes a long-term perspective and thus produces more effective recommendations. The particular data requirements of reinforcement learning make it necessary to rely on an offline virtual system for training. Our project mainly establishes a virtual user environment for offline training. At the same time, we improve a bi-clustering-based reinforcement learning algorithm to expand the action space and the recommendation-path space of the recommendation agent.
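To make the offline-training setup concrete, here is a deliberately tiny sketch: a hand-coded simulated user stands in for the virtual environment, and tabular Q-learning plays the recommendation agent. Everything here (state and action counts, the reward rule) is an illustrative assumption; the project's actual environment model and the bi-clustering extension are not reproduced:

```python
import numpy as np

# Toy simulated user: states are coarse interest profiles, actions are item
# categories to recommend, and the reward mimics click feedback.
N_STATES, N_ACTIONS = 4, 3

def simulated_user(state, action, rng):
    # Reward is earned when the recommended category matches the interest
    # profile; the user then drifts to a random new profile.
    reward = 1.0 if action == state % N_ACTIONS else 0.0
    next_state = int(rng.integers(N_STATES))
    return reward, next_state

def train_q(steps=20000, alpha=0.1, gamma=0.9, eps=0.3, seed=0):
    # Epsilon-greedy tabular Q-learning against the simulated user; the
    # discount factor gamma is what gives the agent a long-term perspective.
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, N_ACTIONS))
    state = 0
    for _ in range(steps):
        if rng.random() < eps:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(Q[state].argmax())
        reward, nxt = simulated_user(state, action, rng)
        Q[state, action] += alpha * (reward + gamma * Q[nxt].max() - Q[state, action])
        state = nxt
    return Q
```

In practice the simulated user would itself be learned from logged interaction data rather than hand-coded.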
Recognition of anomalous events is a challenging but critical task in many scientific and industrial fields, especially when the properties of anomalies are unknown. In this paper, we present a new anomaly concept called a "unicorn", or unique event, and a new, model-independent, unsupervised detection algorithm to detect unicorns. The Temporal Outlier Factor (TOF) is introduced to measure the uniqueness of events in continuous data sets from dynamic systems. The concept of unique events differs significantly from that of traditional outliers in many aspects: while repetitive outliers are no longer unique events, a unique event is not necessarily an outlier in either a pointwise or a collective sense; it does not necessarily fall outside the distribution of normal activity. The performance of our algorithm was examined in recognizing unique events on different types of simulated data sets with anomalies, and it was compared with the standard Local Outlier Factor (LOF). TOF showed superior performance compared to LOF even in recognizing traditional outliers, and it also recognized unique events that LOF did not. The benefits of the unicorn concept and the new detection method are illustrated with example data sets from very different scientific fields. Our algorithm successfully recognized unique events in cases where they were already known, such as the gravitational waves of a black hole merger in LIGO detector data and the signs of respiratory failure in ECG data series. Furthermore, unique events were found in the LIBOR data set of the last 30 years.
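To illustrate the idea behind TOF (hedged: the paper's exact definition may differ in details such as the embedding and the aggregation), the sketch below scores each point of a delay-embedded series by the temporal spread of its state-space nearest neighbours. A state that recurs throughout the recording has neighbours scattered across time, while a uniquely occurring pattern has temporally clustered neighbours and hence a low score:

```python
import numpy as np

def temporal_outlier_factor(x, k=5, dim=3, tau=1):
    # Delay-embed the scalar series x, then for each embedded point take its
    # k nearest neighbours in state space and score the point by the root
    # mean square of the temporal distances to those neighbours.
    # Low score = the state occurs only within a short time window.
    x = np.asarray(x, float)
    n = len(x) - (dim - 1) * tau
    E = np.stack([x[i * tau : i * tau + n] for i in range(dim)], axis=1)
    D = np.linalg.norm(E[:, None, :] - E[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)          # exclude self-matches
    nbrs = np.argsort(D, axis=1)[:, :k]  # indices of k nearest states
    t = np.arange(n)
    return np.sqrt(np.mean((nbrs - t[:, None]) ** 2, axis=1))
```

On a periodic signal with a one-off transient inserted, the transient region gets the lowest score, exactly the inversion relative to classic outlier factors that the abstract emphasizes.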
One of the most exciting advancements in AI over the last decade is the wide adoption of artificial neural networks (ANNs), such as DNNs and CNNs, in many real-world applications. However, their massive computation and storage requirements greatly challenge their applicability on resource-limited platforms such as drones, mobile phones, and IoT devices. The third generation of neural network models, the Spiking Neural Network (SNN), inspired by the working mechanism and efficiency of the human brain, has emerged as a promising solution for achieving impressive computing and power efficiency on lightweight devices (e.g., a single chip). However, the relevant research has focused narrowly on conventional rate-based spiking system designs for practical cognitive tasks, underestimating the SNN's energy efficiency, throughput, and system flexibility. Although the time-based SNN is conceptually more attractive, its potential has not been unleashed in realistic applications due to the lack of efficient coding and practical learning schemes. In this work, a Precise-Time-Dependent Single Spike Neuromorphic Architecture, namely "PT-Spike", is developed to bridge this gap. Three constituent hardware-favorable techniques, namely precise single-spike temporal encoding, efficient supervised temporal learning, and fast asymmetric decoding, are proposed to boost the energy efficiency and data processing capability of the time-based SNN at a more compact network model size when executing real cognitive tasks. Simulation results show that "PT-Spike" demonstrates significant improvements in network size, processing efficiency, and power consumption, with marginal classification accuracy degradation, compared with rate-based SNNs and ANNs under similar network configurations.
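Of the three techniques, the encoding step is the easiest to illustrate. Below is a hedged sketch of generic time-to-first-spike coding (PT-Spike's precise encoding is architecture-specific and not reproduced here): each normalized input fires exactly one spike, earlier for stronger inputs, so the information rides on spike timing rather than firing rate, which is what allows a single spike per neuron instead of a rate-coded spike train:

```python
import numpy as np

def time_to_first_spike(x, t_max=100.0):
    # Map each input value in [0, 1] to the time of its single spike within
    # an observation window of t_max: stronger inputs fire earlier.
    x = np.clip(np.asarray(x, float), 0.0, 1.0)
    return t_max * (1.0 - x)
```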
Producing or sharing Child Sexual Exploitation Material (CSEM) is a serious crime fought vigorously by Law Enforcement Agencies (LEAs). When an LEA seizes a computer from a potential producer or consumer of CSEM, it needs to analyze the files on the suspect's hard disk in search of evidence. However, manually inspecting file content in search of CSEM is time-consuming, and in most cases it is infeasible within the time the Spanish police have available when executing a search warrant. Instead of analyzing content, another approach that can speed up the process is to identify CSEM by analyzing file names and their absolute paths. The main challenge of this task lies in dealing with short text deliberately distorted by the owners of this material using obfuscated words and user-defined naming patterns. This paper presents and compares two approaches based on short text classification to identify CSEM files. The first employs two independent supervised classifiers, one for the file name and one for the path, whose outputs are later fused into a single score. Conversely, the second approach uses only the file name classifier, iterating over the components of the file's absolute path. Both approaches operate at the character n-gram level, binary and orthographic features enrich the file name representation, and a binary Logistic Regression model is used for classification. The presented file classifier achieved an average class recall of 0.98. This solution could be integrated into forensic tools and services to support Law Enforcement Agencies in identifying CSEM without examining the visual content of every file, which is computationally far more demanding.
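The core pipeline, character n-grams fed into binary logistic regression, can be sketched with a fully innocuous toy example. The paper's additional binary and orthographic features and the score-fusion step are omitted, and all names and labels below are illustrative:

```python
import numpy as np

def char_ngrams(text, n=3):
    # Character n-grams remain informative when words are deliberately
    # obfuscated, which defeats plain word-level tokenization.
    text = f"#{text.lower()}#"  # '#' marks string boundaries
    return [text[i:i + n] for i in range(max(len(text) - n + 1, 1))]

def vectorize(names, vocab):
    # Bag-of-n-grams count matrix over a fixed vocabulary.
    index = {g: j for j, g in enumerate(vocab)}
    X = np.zeros((len(names), len(vocab)))
    for i, s in enumerate(names):
        for g in char_ngrams(s):
            if g in index:
                X[i, index[g]] += 1.0
    return X

def train_logreg(X, y, lr=0.5, steps=2000):
    # Plain binary logistic regression fitted by full-batch gradient descent.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * float(np.mean(p - y))
    return w, b
```

The path-based variant would simply run the same classifier over each component of the absolute path and combine the scores.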
Optimizing power control in multi-cell cellular networks with deep learning enables such a non-convex problem to be solved in real time. When channels are time-varying, the deep neural networks (DNNs) need to be re-trained frequently, which calls for low training complexity. To reduce the number of training samples and the size of the DNN required to achieve good performance, a promising approach is to embed prior knowledge into the DNNs. Since cellular networks can be modelled as a graph, it is natural to employ graph neural networks (GNNs) for learning, which exhibit permutation invariance (PI) and permutation equivariance (PE) properties. Unlike the homogeneous GNNs that have been used for wireless problems, whose outputs are invariant or equivariant to arbitrary permutations of vertices, heterogeneous GNNs (HetGNNs), which are more appropriate for modelling cellular networks, are only invariant or equivariant to some permutations. If the PI or PE properties of the HetGNN do not match the properties of the task to be learned, the performance degrades dramatically. In this paper, we show that the power control policy has a combination of different PI and PE properties, which the existing HetGNN does not satisfy. We then design a parameter sharing scheme for the HetGNN such that the learned relationship satisfies the desired properties. Simulation results show that the sample complexity and the size of the designed GNN for learning the optimal power control policy in multi-user multi-cell networks are much lower than those of existing DNNs, when achieving the same sum-rate loss relative to the numerically obtained solutions.
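The role of parameter sharing can be seen in a homogeneous toy case (a hedged illustration only; the paper's HetGNN shares parameters per vertex type, which is not reproduced here): because every vertex is processed with the same weights and the aggregation over neighbours is symmetric, permuting the vertices permutes the output in exactly the same way, which is the PE property:

```python
import numpy as np

def pe_layer(X, W_self, W_agg):
    # One linear GNN-style layer: each vertex combines its own features with
    # the mean over all vertices, using weights shared across vertices.
    # This weight sharing is exactly what enforces permutation equivariance.
    return X @ W_self + X.mean(axis=0) @ W_agg
```

Equivariance can be checked numerically: applying a permutation matrix to the input rows permutes the output rows identically. A HetGNN whose sharing pattern is too coarse or too fine breaks this match with the task's own symmetries, which is the mismatch the paper's design corrects.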
Speaker diarization is the task of labeling audio or video recordings with classes corresponding to speaker identity; in short, the task of identifying "who spoke when". In the early years, speaker diarization algorithms were developed for speech recognition on multi-speaker audio recordings to enable speaker-adaptive processing, but over time diarization also gained value as a stand-alone application, providing speaker-specific meta information for downstream tasks such as audio retrieval. More recently, with the rise of deep learning technology, which has been a driving force behind revolutionary changes in research and practice across speech application domains in the past decade, advancements in speaker diarization have accelerated. In this paper, we review not only the historical development of speaker diarization technology but also the recent advancements in neural speaker diarization approaches. We also discuss how speaker diarization systems have been integrated with speech recognition applications, and how the recent surge of deep learning is leading the way in jointly modeling these two components so that they complement each other. In light of these exciting technical trends, we believe that this survey is a valuable contribution to the community, consolidating the recent developments with neural methods and thus facilitating further progress toward more efficient speaker diarization.
This paper introduces a novel real-time algorithm for facial landmark tracking. Compared to detection, tracking has both additional challenges and opportunities. Arguably the most important aspect in this domain is updating a tracker's models as tracking progresses, also known as incremental (face) tracking. While this should result in more accurate localisation, how to do this online and in real time without causing a tracker to drift is still an important open research question. We address this question in the cascaded regression framework, the state-of-the-art approach for facial landmark localisation. Because incremental learning for cascaded regression is costly, we propose a much more efficient yet equally accurate alternative using continuous regression. More specifically, we first propose cascaded continuous regression (CCR) and show its accuracy is equivalent to the Supervised Descent Method. We then derive the incremental learning updates for CCR (iCCR) and show that it is an order of magnitude faster than standard incremental learning for cascaded regression, bringing the time required for the update from seconds down to a fraction of a second, thus enabling real-time tracking. Finally, we evaluate iCCR and show the importance of incremental learning in achieving state-of-the-art performance. Code for our iCCR is available from http://www.cs.nott.ac.uk/~psxes1
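The abstract above contrasts costly incremental updates with an efficient alternative; the generic machinery behind such updates is recursive least squares, where a rank-one Sherman-Morrison update refreshes a linear regressor in O(d^2) per new sample instead of refitting on all data seen so far. A hedged sketch of that generic idea (not the iCCR equations themselves, which operate in the continuous-regression setting):

```python
import numpy as np

class RecursiveLeastSquares:
    # Incremental linear regression: maintain w and the inverse regularized
    # covariance P = (X^T X + reg*I)^{-1}; each new pair (x, y) updates both
    # in O(d^2) via the Sherman-Morrison identity, with no stored history.
    def __init__(self, dim, reg=1e-3):
        self.P = np.eye(dim) / reg
        self.w = np.zeros(dim)

    def update(self, x, y):
        x = np.asarray(x, float)
        Px = self.P @ x
        gain = Px / (1.0 + x @ Px)           # Kalman-style gain vector
        self.w += gain * (y - x @ self.w)    # correct toward the new sample
        self.P -= np.outer(gain, Px)         # rank-one downdate of P
```

The same contrast drives iCCR: a naive incremental cascaded-regression update costs seconds per frame, while a well-chosen recursive formulation brings it down to a fraction of a second.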