Abstract: This paper presents LiteVSR2, an enhanced version of our previously introduced efficient approach to Visual Speech Recognition (VSR). Building upon our knowledge distillation framework from a pre-trained Automatic Speech Recognition (ASR) model, we introduce two key improvements: a stabilized video preprocessing technique and feature normalization in the distillation process. These improvements yield substantial performance gains on the LRS2 and LRS3 benchmarks, positioning LiteVSR2 as the current best CTC-based VSR model without increasing the volume of training data or the computational resources utilized. Furthermore, we explore the scalability of our approach by examining performance metrics across varying model complexities and training data volumes. LiteVSR2 maintains the efficiency of its predecessor while significantly enhancing accuracy, thereby demonstrating the potential for resource-efficient advancements in VSR technology.
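As an illustration of the feature-normalization improvement, the sketch below standardizes the teacher's ASR features per dimension before computing the regression loss; the abstract does not specify the exact normalization or loss used in LiteVSR2, so both choices here are assumptions (PyTorch).

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_feats, teacher_feats, eps=1e-5):
    """Regress student (video) features onto normalized teacher (ASR) features.

    Standardizing the teacher targets keeps the loss scale independent of the
    teacher's activation statistics. Whether LiteVSR2 uses this exact scheme
    is an assumption; per-utterance layer norm would be a natural alternative.
    """
    # Standardize teacher features over the time axis: zero mean, unit variance.
    mean = teacher_feats.mean(dim=1, keepdim=True)
    std = teacher_feats.std(dim=1, keepdim=True)
    targets = (teacher_feats - mean) / (std + eps)
    return F.l1_loss(student_feats, targets)

# Example: batch of 4 utterances, 50 frames, 256-dim features.
student = torch.randn(4, 50, 256, requires_grad=True)
teacher = torch.randn(4, 50, 256) * 7.0 + 3.0  # arbitrary teacher statistics
loss = distillation_loss(student, teacher)
loss.backward()
```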
Abstract: The proliferation of IoT devices and the demands of deep learning have highlighted significant challenges in Distributed Deep Learning (DDL) systems. Parallel Split Learning (PSL) has emerged as a promising derivative of Split Learning that is well suited for distributed learning on resource-constrained devices. However, PSL faces several obstacles, such as large effective batch sizes, non-IID data distributions, and the straggler effect. We view these issues as a sampling dilemma and propose to address them by orchestrating the mini-batch sampling process on the server side. We introduce the Uniform Global Sampling (UGS) method to decouple the effective batch size from the number of clients and to reduce mini-batch deviation in non-IID settings. To address the straggler effect, we introduce the Latent Dirichlet Sampling (LDS) method, which generalizes UGS to balance the trade-off between batch deviation and training time. Our simulations reveal that the proposed methods improve model accuracy by up to 34.1% in non-IID settings and reduce training time in the presence of stragglers by up to 62%. In particular, LDS effectively mitigates the straggler effect without compromising model accuracy or adding significant computational overhead compared to UGS. Our results demonstrate the potential of these methods as a promising solution for DDL in real-world applications.
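A minimal server-side sketch of what Uniform Global Sampling could look like: the server draws a fixed-size mini-batch uniformly from the union of all client datasets, decoupling the effective batch size from the number of clients. The index-based protocol and the `uniform_global_sampling` helper are illustrative reconstructions, not the paper's implementation.

```python
import numpy as np

def uniform_global_sampling(client_sizes, batch_size, rng):
    """Sketch of UGS: draw `batch_size` indices uniformly from the union of
    all client datasets, so each mini-batch approximates the global
    distribution even under non-IID client data."""
    total = sum(client_sizes)
    # Global indices 0..total-1, mapped back to (client, local index).
    global_idx = rng.choice(total, size=batch_size, replace=False)
    offsets = np.cumsum([0] + list(client_sizes))
    per_client = {c: [] for c in range(len(client_sizes))}
    for g in global_idx:
        c = np.searchsorted(offsets, g, side="right") - 1
        per_client[c].append(int(g - offsets[c]))
    return per_client  # client id -> local sample indices to contribute

rng = np.random.default_rng(0)
# Three clients with very different dataset sizes; batch of 32 overall.
print(uniform_global_sampling([1000, 50, 400], 32, rng))
```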
Abstract: Millimeter wave (mmWave) cell-free massive MIMO (CF mMIMO) is a promising solution for future wireless communications. However, its optimization is non-trivial due to the challenging channel characteristics. We show that mmWave CF mMIMO optimization is largely an assignment problem between access points (APs) and users, owing to the high path loss of mmWave channels, the limited output power of the amplifier, and the nearly orthogonal channels between users given a large number of AP antennas. The combinatorial nature of the assignment problem, the requirement for scalability, and the distributed implementation of CF mMIMO make this problem difficult. In this work, we propose an unsupervised machine learning (ML) enabled solution. In particular, a graph neural network (GNN) customized for scalability and distributed implementation is introduced. Moreover, the customized GNN architecture is hierarchically permutation-equivariant (HPE), i.e., if the APs or the users of an AP are permuted, the output assignment is automatically permuted in the same way. To address the combinatorial problem, we relax it to a continuous problem and introduce an information entropy-inspired penalty term. The training objective is then formulated using the augmented Lagrangian method (ALM). Test results show that the achieved sum rate outperforms that of the generalized serial dictatorship (GSD) algorithm and comes very close to the upper bound in a small network scenario, while the upper bound is computationally infeasible to obtain in a large network scenario.
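A toy sketch of the relaxation described above: a binary AP-user assignment relaxed to a row-stochastic soft assignment, an information entropy-inspired penalty that vanishes only at near-discrete solutions, and an ALM multiplier update for a per-AP load constraint. The GNN itself is omitted, and the utility term, load target, and penalty weights are placeholders.

```python
import torch

# Illustrative relaxation of a combinatorial AP-user assignment (not the
# paper's GNN): logits -> row-stochastic soft assignment via softmax, an
# entropy penalty that is zero only for (near-)one-hot rows, and an
# augmented-Lagrangian term enforcing a per-AP load constraint.
U, A, LOAD = 8, 4, 2.0            # users, APs, target users per AP
logits = torch.zeros(U, A, requires_grad=True)
utility = torch.rand(U, A)        # stand-in for the achievable-rate term
lam = torch.zeros(A)              # Lagrange multipliers
rho, beta = 1.0, 0.1              # ALM penalty weight, entropy weight

opt = torch.optim.Adam([logits], lr=0.2)
for outer in range(20):
    for inner in range(50):
        p = torch.softmax(logits, dim=1)             # soft assignment
        entropy = -(p * (p + 1e-12).log()).sum(1).mean()
        load_gap = p.sum(0) - LOAD                   # equality constraint c(p)=0
        lagrangian = (-(p * utility).sum()           # maximize utility
                      + beta * entropy               # push toward discrete
                      + (lam * load_gap).sum()
                      + 0.5 * rho * (load_gap ** 2).sum())
        opt.zero_grad(); lagrangian.backward(); opt.step()
    lam = lam + rho * load_gap.detach()              # ALM multiplier update
print(torch.softmax(logits, dim=1).round(decimals=2))
```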
Abstract: This paper proposes a novel, resource-efficient approach to Visual Speech Recognition (VSR) that leverages speech representations produced by any trained Automatic Speech Recognition (ASR) model. Moving away from the resource-intensive trends prevalent in recent literature, our method distills knowledge from a trained Conformer-based ASR model, achieving competitive performance on standard VSR benchmarks with significantly lower resource utilization. Using unlabeled audio-visual data only, our baseline model achieves word error rates (WER) of 47.4% and 54.7% on the LRS2 and LRS3 test benchmarks, respectively. After fine-tuning the model with limited labeled data, the WER drops to 35% (LRS2) and 45.7% (LRS3). Our model can be trained on a single consumer-grade GPU within a few days and is capable of performing real-time end-to-end VSR on dated hardware, suggesting a path towards more accessible and resource-efficient VSR methodologies.
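A minimal training-step sketch of the distillation idea: a frozen, pre-trained ASR encoder provides target speech representations for unlabeled audio, and the video encoder (the VSR student) learns to predict them from the corresponding lip frames. Time-aligned audio-visual pairs are assumed, and the stand-in encoders, dimensions, loss, and learning rate are all illustrative rather than the paper's Conformer-based setup.

```python
import torch
import torch.nn as nn

class StandInEncoder(nn.Module):
    """Placeholder for the audio/video encoders; not the paper's Conformer."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                 nn.Linear(512, out_dim))
    def forward(self, x):               # x: (batch, time, in_dim)
        return self.net(x)

teacher = StandInEncoder(80, 256)       # e.g., 80-dim log-mel input
student = StandInEncoder(96 * 96, 256)  # e.g., flattened mouth crops
teacher.requires_grad_(False)           # teacher stays frozen

opt = torch.optim.Adam(student.parameters(), lr=1e-4)
audio = torch.randn(2, 100, 80)         # unlabeled, time-aligned pair
video = torch.randn(2, 100, 96 * 96)
with torch.no_grad():
    targets = teacher(audio)            # target speech representations
loss = nn.functional.l1_loss(student(video), targets)
opt.zero_grad(); loss.backward(); opt.step()
```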
Abstract: Semantic image segmentation, the process of classifying each pixel in an image into a particular class, plays an important role in many visual understanding systems. As the predominant criterion for evaluating the performance of statistical models, loss functions are crucial for shaping the development of deep learning-based segmentation algorithms and for improving their overall performance. To aid researchers in identifying the optimal loss function for their particular application, this survey provides a comprehensive and unified review of 25 loss functions utilized in image segmentation. We provide a novel taxonomy and a thorough review of how these loss functions are customized and leveraged in image segmentation, with a systematic categorization emphasizing their significant features and applications. Furthermore, to evaluate the efficacy of these methods in real-world scenarios, we conduct unbiased evaluations of several distinct and renowned loss functions on established medical and natural image datasets. We conclude this review by identifying current challenges and unveiling future research opportunities. Finally, we compile the reviewed studies that provide open-source implementations on our GitHub page.
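As a concrete example of the region-based family that such surveys cover, here is a standard soft Dice loss for binary segmentation (PyTorch); the smoothing constant `eps` is a common but non-universal choice.

```python
import torch

def soft_dice_loss(logits, targets, eps=1.0):
    """Soft Dice loss for binary segmentation: 1 - 2|P∩G| / (|P| + |G|),
    computed on predicted probabilities so it stays differentiable."""
    probs = torch.sigmoid(logits)
    dims = (1, 2, 3)  # sum over channel and spatial dims, keep batch
    inter = (probs * targets).sum(dims)
    denom = probs.sum(dims) + targets.sum(dims)
    return (1.0 - (2.0 * inter + eps) / (denom + eps)).mean()

logits = torch.randn(4, 1, 64, 64, requires_grad=True)  # raw model output
targets = (torch.rand(4, 1, 64, 64) > 0.5).float()      # binary masks
soft_dice_loss(logits, targets).backward()
```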
Abstract: We study a multi-source wireless power transfer (WPT) enabled network supporting multi-sensor transmissions. Activated by energy harvesting (EH) from multiple WPT sources, sensors transmit short packets to a destination using finite blocklength (FBL) codes. This work characterizes, for the first time, the FBL reliability of such a multi-source WPT-enabled network and provides reliability-oriented resource allocation designs under a practical nonlinear EH model. For the scenario with a fixed frame structure, we maximize the FBL reliability by optimally allocating the transmit power among the multiple sources. In particular, we first investigate the relationship between the FBL reliability and the power of the multiple WPT sources, based on which a power allocation problem is formulated. To solve the formulated non-convex problem, we introduce auxiliary variables and apply the successive convex approximation (SCA) technique to the non-convex component, whereby a sub-optimal solution can be obtained. Moreover, we extend our design to a dynamic frame structure scenario, i.e., the blocklengths allocated to the WPT phase and the short-packet transmission phase are adjustable, which introduces more flexibility but also new challenges to the system design. We provide a joint power and blocklength allocation design to minimize the overall system error probability under total power and blocklength constraints. To address this high-dimensional optimization problem, we introduce auxiliary variables, apply multiple variable substitutions, and exploit the SCA technique to reformulate and efficiently solve the problem. Finally, through numerical results, we validate our analytical model and evaluate the system performance, from which a set of guidelines for practical system design is derived.
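A generic sketch of the SCA pattern invoked above, applied to a toy difference-of-convex problem rather than the paper's power/blocklength allocation: the convex term entering with a minus sign is linearized at the current iterate, and the resulting convex surrogate is solved repeatedly.

```python
import numpy as np
from scipy.optimize import minimize

# Generic successive convex approximation (SCA) loop: minimize g(x) - h(x)
# with g, h convex by replacing h with its first-order expansion at the
# current iterate. The objective below is a toy stand-in.
g = lambda x: np.sum(x ** 4)             # convex part
h = lambda x: np.sum((x - 1.0) ** 2)     # convex part, enters with minus sign
grad_h = lambda x: 2.0 * (x - 1.0)

x = np.zeros(3)
for it in range(30):
    xk = x.copy()
    # Convex surrogate: g(x) - [h(xk) + grad_h(xk)^T (x - xk)]
    surrogate = lambda x: g(x) - (h(xk) + grad_h(xk) @ (x - xk))
    x = minimize(surrogate, xk, bounds=[(-2, 2)] * 3).x
    if np.linalg.norm(x - xk) < 1e-8:    # converged to a stationary point
        break
print("SCA iterate:", np.round(x, 4))
```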
Abstract: In this paper, we investigate robust resource allocation design for secure communication in an integrated sensing and communication (ISAC) system. A multi-antenna dual-functional radar-communication (DFRC) base station (BS) serves multiple single-antenna legitimate users while simultaneously sensing for targets, where already identified targets are treated as potential single-antenna eavesdroppers. The DFRC BS scans a sector with a sequence of dedicated beams, and the ISAC system takes a snapshot of the environment during the transmission of each beam. Based on the sensing information, the DFRC BS can acquire the channel state information (CSI) of the potential eavesdroppers. In contrast to existing works that focus on resource allocation design for a single snapshot, we propose a novel optimization framework that jointly optimizes the communication and sensing resources over a sequence of snapshots with adjustable durations. To this end, we jointly optimize the duration of each snapshot, the beamforming vectors, and the covariance matrix of the artificial noise (AN) to maximize the system sum secrecy rate over a sequence of snapshots while guaranteeing a minimum required average achievable rate and a maximum information leakage constraint for each legitimate user. The resource allocation algorithm design is formulated as a non-convex optimization problem, in which we account for the imperfect CSI of both the legitimate users and the potential eavesdroppers. To make the problem tractable, we derive a bound on the uncertainty region of the potential eavesdroppers' small-scale fading based on a safe approximation, which facilitates the development of a block coordinate descent-based iterative algorithm for obtaining an efficient suboptimal solution.
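The block coordinate descent pattern mentioned at the end can be sketched generically: alternate closed-form minimizations over two variable blocks of a toy strictly convex objective. The paper's actual blocks (snapshot durations, beamformers, AN covariance) and their subproblems are far richer; this only shows the iteration structure.

```python
# Skeleton of the block coordinate descent (BCD) pattern: hold all blocks
# but one fixed, solve the resulting subproblem, and cycle until the
# iterates stop moving. Toy objective, jointly strictly convex in (u, v).
f = lambda u, v: (u - 1) ** 2 + (v + 2) ** 2 + u * v

u, v = 0.0, 0.0
for it in range(50):
    u_new = 1.0 - v / 2.0        # argmin over u with v fixed (closed form)
    v_new = -2.0 - u_new / 2.0   # argmin over v with u fixed (closed form)
    converged = max(abs(u_new - u), abs(v_new - v)) < 1e-10
    u, v = u_new, v_new
    if converged:
        break
print(f"u={u:.4f}, v={v:.4f}, f={f(u, v):.6f}")
```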
Abstract: The deployment of multi-access edge computing (MEC) is paving the way towards pervasive intelligence in future 6G networks. This new paradigm also raises emerging requirements for dependable communication, which go beyond ultra-reliable low latency communication (URLLC) by focusing on the performance of a closed loop instead of that of a unidirectional link. This work studies the simple but efficient one-shot transmission scheme, investigating the closed-loop-reliability-optimal policy for blocklength allocation under stringent time and energy constraints.
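To make the blocklength trade-off concrete, the sketch below uses the standard normal approximation of the finite-blocklength error probability (Polyanskiy et al., 2010) and brute-forces the split of a total blocklength budget between the forward and feedback links of a one-shot closed loop. SNRs, payloads, and the budget are illustrative, and the paper's time and energy constraints are richer than this.

```python
import numpy as np
from scipy.stats import norm

def fbl_error(n, k, snr):
    """Normal approximation of the packet error probability for k
    information bits over n channel uses at the given SNR:
    eps ~ Q((n*C - k + 0.5*log2(n)) / sqrt(n*V))."""
    C = np.log2(1.0 + snr)
    V = (snr * (snr + 2.0)) / (2.0 * (snr + 1.0) ** 2) * np.log2(np.e) ** 2
    return norm.sf((n * C - k + 0.5 * np.log2(n)) / np.sqrt(n * V))

# Toy closed-loop policy: split a total budget of N channel uses between
# the forward (command) and feedback links; the loop succeeds only if
# both one-shot transmissions succeed.
N, k_fwd, k_fb = 400, 256, 64
snr_fwd, snr_fb = 2.0, 1.0   # linear SNRs
best = max(
    ((1 - fbl_error(n, k_fwd, snr_fwd)) * (1 - fbl_error(N - n, k_fb, snr_fb)), n)
    for n in range(50, N - 50)
)
print(f"best reliability {best[0]:.6f} with n_fwd={best[1]}, n_fb={N - best[1]}")
```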
Abstract: In this work, we propose a novel approach to reinforcement learning driven by evolutionary computation. Our algorithm, dubbed Evolutionary-Driven Reinforcement Learning (evo-RL), embeds the reinforcement learning algorithm in an evolutionary cycle, where we distinctly differentiate between purely evolvable (instinctive) behaviour and purely learnable behaviour. Furthermore, we propose that this distinction be decided by the evolutionary process, thus allowing evo-RL to adapt to different environments. In addition, evo-RL facilitates learning in environments with rewardless states, which makes it better suited for real-world problems with incomplete information. To show that evo-RL leads to state-of-the-art performance, we compare the performance of different state-of-the-art reinforcement learning algorithms when operating within evo-RL against the same algorithms executed independently. Results show that reinforcement learning algorithms embedded within our evo-RL approach significantly outperform their stand-alone versions on OpenAI Gym control problems with rewardless states, under the same computational budget.
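A heavily simplified sketch of the evo-RL structure: an outer evolutionary cycle evolves, per state, whether behaviour is instinctive (genome-fixed) or learnable (trained by an inner tabular Q-learning loop), and fitness is the post-learning return on a toy chain environment with rewardless states. Everything here, from the environment to the operators, is a stand-in for illustration, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
N_STATES, ACTIONS, HORIZON = 6, (-1, 1), 20  # chain: reward only at right end

def episode_return(policy_fn):
    s, total = 0, 0.0
    for _ in range(HORIZON):
        s = min(max(s + ACTIONS[policy_fn(s)], 0), N_STATES - 1)
        total += 1.0 if s == N_STATES - 1 else 0.0  # rewardless elsewhere
    return total

def fitness(genome):
    instinct, action = genome            # bool per state, action id per state
    q = np.zeros((N_STATES, 2))
    for _ in range(200):                 # inner RL on learnable states only
        s = rng.integers(N_STATES)
        a = action[s] if instinct[s] else (rng.integers(2) if rng.random() < 0.2
                                           else int(q[s].argmax()))
        s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if s2 == N_STATES - 1 else 0.0
        if not instinct[s]:              # instinctive behaviour stays frozen
            q[s, a] += 0.3 * (r + 0.9 * q[s2].max() - q[s, a])
    return episode_return(lambda s: action[s] if instinct[s] else int(q[s].argmax()))

pop = [(rng.random(N_STATES) < 0.5, rng.integers(2, size=N_STATES)) for _ in range(20)]
for gen in range(15):                    # outer evolutionary cycle
    parents = sorted(pop, key=fitness, reverse=True)[:5]
    pop = [(p[0] ^ (rng.random(N_STATES) < 0.1),   # mutate instinct mask
            np.where(rng.random(N_STATES) < 0.1,
                     rng.integers(2, size=N_STATES), p[1]))
           for p in parents for _ in range(4)]
print("best post-learning return:", fitness(max(pop, key=fitness)))
```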