Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Anke Schmeink

Enhancing CTC-Based Visual Speech Recognition

Sep 11, 2024

Hendrik Laux, Anke Schmeink

Abstract:This paper presents LiteVSR2, an enhanced version of our previously introduced efficient approach to Visual Speech Recognition (VSR). Building upon our knowledge distillation framework from a pre-trained Automatic Speech Recognition (ASR) model, we introduce two key improvements: a stabilized video preprocessing technique and feature normalization in the distillation process. These improvements yield substantial performance gains on the LRS2 and LRS3 benchmarks, positioning LiteVSR2 as the current best CTC-based VSR model without increasing the volume of training data or computational resources utilized. Furthermore, we explore the scalability of our approach by examining performance metrics across varying model complexities and training data volumes. LiteVSR2 maintains the efficiency of its predecessor while significantly enhancing accuracy, thereby demonstrating the potential for resource-efficient advancements in VSR technology.

Via

Access Paper or Ask Questions

Parallel Split Learning with Global Sampling

Jul 22, 2024

Mohammad Kohankhaki, Ahmad Ayad, Mahdi Barhoush, Anke Schmeink

Abstract:The expansion of IoT devices and the demands of Deep Learning have highlighted significant challenges in Distributed Deep Learning (DDL) systems. Parallel Split Learning (PSL) has emerged as a promising derivative of Split Learning that is well suited for distributed learning on resource-constrained devices. However, PSL faces several obstacles, such as large effective batch sizes, non-IID data distributions, and the straggler effect. We view these issues as a sampling dilemma and propose to address them by orchestrating the mini-batch sampling process on the server side. We introduce the Uniform Global Sampling (UGS) method to decouple the effective batch size from the number of clients and reduce mini-batch deviation in non-IID settings. To address the straggler effect, we introduce the Latent Dirichlet Sampling (LDS) method, which generalizes UGS to balance the trade-off between batch deviation and training time. Our simulations reveal that our proposed methods enhance model accuracy by up to 34.1% in non-IID settings and reduce the training time in the presence of stragglers by up to 62%. In particular, LDS effectively mitigates the straggler effect without compromising model accuracy or adding significant computational overhead compared to UGS. Our results demonstrate the potential of our methods as a promising solution for DDL in real applications.

Via

Access Paper or Ask Questions

Distributed Combinatorial Optimization of Downlink User Assignment in mmWave Cell-free Massive MIMO Using Graph Neural Networks

Jun 09, 2024

Bile Peng, Bihan Guo, Karl-Ludwig Besser, Luca Kunz, Ramprasad Raghunath, Anke Schmeink, Eduard A Jorswieck, Giuseppe Caire, H. Vincent Poor

Figure 1 for Distributed Combinatorial Optimization of Downlink User Assignment in mmWave Cell-free Massive MIMO Using Graph Neural Networks

Figure 2 for Distributed Combinatorial Optimization of Downlink User Assignment in mmWave Cell-free Massive MIMO Using Graph Neural Networks

Figure 3 for Distributed Combinatorial Optimization of Downlink User Assignment in mmWave Cell-free Massive MIMO Using Graph Neural Networks

Figure 4 for Distributed Combinatorial Optimization of Downlink User Assignment in mmWave Cell-free Massive MIMO Using Graph Neural Networks

Abstract:Millimeter wave (mmWave) cell-free massive MIMO (CF mMIMO) is a promising solution for future wireless communications. However, its optimization is non-trivial due to the challenging channel characteristics. We show that mmWave CF mMIMO optimization is largely an assignment problem between access points (APs) and users due to the high path loss of mmWave channels, the limited output power of the amplifier, and the almost orthogonal channels between users given a large number of AP antennas. The combinatorial nature of the assignment problem, the requirement for scalability, and the distributed implementation of CF mMIMO make this problem difficult. In this work, we propose an unsupervised machine learning (ML) enabled solution. In particular, a graph neural network (GNN) customized for scalability and distributed implementation is introduced. Moreover, the customized GNN architecture is hierarchically permutation-equivariant (HPE), i.e., if the APs or users of an AP are permuted, the output assignment is automatically permuted in the same way. To address the combinatorial problem, we relax it to a continuous problem, and introduce an information entropy-inspired penalty term. The training objective is then formulated using the augmented Lagrangian method (ALM). The test results show that the realized sum-rate outperforms that of the generalized serial dictatorship (GSD) algorithm and is very close to the upper bound in a small network scenario, while the upper bound is impossible to obtain in a large network scenario.

Via

Access Paper or Ask Questions

LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data

Dec 15, 2023

Hendrik Laux, Emil Mededovic, Ahmed Hallawa, Lukas Martin, Arne Peine, Anke Schmeink

Abstract:This paper proposes a novel, resource-efficient approach to Visual Speech Recognition (VSR) leveraging speech representations produced by any trained Automatic Speech Recognition (ASR) model. Moving away from the resource-intensive trends prevalent in recent literature, our method distills knowledge from a trained Conformer-based ASR model, achieving competitive performance on standard VSR benchmarks with significantly less resource utilization. Using unlabeled audio-visual data only, our baseline model achieves a word error rate (WER) of 47.4% and 54.7% on the LRS2 and LRS3 test benchmarks, respectively. After fine-tuning the model with limited labeled data, the word error rate reduces to 35% (LRS2) and 45.7% (LRS3). Our model can be trained on a single consumer-grade GPU within a few days and is capable of performing real-time end-to-end VSR on dated hardware, suggesting a path towards more accessible and resource-efficient VSR methodologies.

* Accepted for publication at ICASSP 2024

Via

Access Paper or Ask Questions

Loss Functions in the Era of Semantic Segmentation: A Survey and Outlook

Dec 08, 2023

Reza Azad, Moein Heidary, Kadir Yilmaz, Michael Hüttemann, Sanaz Karimijafarbigloo, Yuli Wu, Anke Schmeink, Dorit Merhof

Figure 1 for Loss Functions in the Era of Semantic Segmentation: A Survey and Outlook

Figure 2 for Loss Functions in the Era of Semantic Segmentation: A Survey and Outlook

Figure 3 for Loss Functions in the Era of Semantic Segmentation: A Survey and Outlook

Figure 4 for Loss Functions in the Era of Semantic Segmentation: A Survey and Outlook

Abstract:Semantic image segmentation, the process of classifying each pixel in an image into a particular class, plays an important role in many visual understanding systems. As the predominant criterion for evaluating the performance of statistical models, loss functions are crucial for shaping the development of deep learning-based segmentation algorithms and improving their overall performance. To aid researchers in identifying the optimal loss function for their particular application, this survey provides a comprehensive and unified review of $25$ loss functions utilized in image segmentation. We provide a novel taxonomy and thorough review of how these loss functions are customized and leveraged in image segmentation, with a systematic categorization emphasizing their significant features and applications. Furthermore, to evaluate the efficacy of these methods in real-world scenarios, we propose unbiased evaluations of some distinct and renowned loss functions on established medical and natural image datasets. We conclude this review by identifying current challenges and unveiling future research opportunities. Finally, we have compiled the reviewed studies that have open-source implementations on our GitHub page.

Via

Access Paper or Ask Questions

Wireless Powered Short Packet Communications with Multiple WPT Sources

Feb 24, 2023

Ning Guo, Xiaopeng Yuan, Yulin Hu, Anke Schmeink

Abstract:We study a multi-source wireless power transfer (WPT) enabled network supporting multi-sensor transmissions. Activated by energy harvesting (EH) from multiple WPT sources, sensors transmit short packets to a destination with finite blocklength (FBL) codes. This work for the first time characterizes the FBL reliability for such multi-source WPT enabled network and provides reliability-oriented resource allocation designs, while a practical nonlinear EH model is considered. For scenario with a fixed frame structure, we maximize the FBL reliability via optimally allocating the transmit power among multi-source. In particular, we first investigate the relationship between the FBL reliability and multiple WPT source power, based on which a power allocation problem is formulated. To solve the formulated non-convex problem, we introduce auxiliary variables and apply successive convex approximation (SCA) technique to the non-convex component. Consequently, a sub-optimal solution can be obtained. Moreover, we extend our design into a dynamic frame structure scenario, i.e., the blocklength allocated for WPT phase and short-packet transmission phase are adjustable, which introduces more flexibility and new challenges to the system design. We provide a joint power and blocklength allocation design to minimize the system overall error probability under the total power and blocklength constraints. To address the high-dimensional optimization problem, auxiliary variables introduction, multiple variable substitutions and SCA technique utilization are exploited to reformulate and efficiently solve the problem. Finally, through numerical results, we validate our analytical model and evaluate the system performance, where a set of guidelines for practical system design are concluded.

Via

Access Paper or Ask Questions

Robust and Secure Resource Allocation for ISAC Systems: A Novel Optimization Framework for Variable-Length Snapshots

Jun 27, 2022

Dongfang Xu, Xianghao Yu, Derrick Wing Kwan Ng, Anke Schmeink, Robert Schober

Figure 1 for Robust and Secure Resource Allocation for ISAC Systems: A Novel Optimization Framework for Variable-Length Snapshots

Figure 2 for Robust and Secure Resource Allocation for ISAC Systems: A Novel Optimization Framework for Variable-Length Snapshots

Figure 3 for Robust and Secure Resource Allocation for ISAC Systems: A Novel Optimization Framework for Variable-Length Snapshots

Figure 4 for Robust and Secure Resource Allocation for ISAC Systems: A Novel Optimization Framework for Variable-Length Snapshots

Abstract:In this paper, we investigate the robust resource allocation design for secure communication in an integrated sensing and communication (ISAC) system. A multi-antenna dual-functional radar-communication (DFRC) base station (BS) serves multiple single-antenna legitimate users and senses for targets simultaneously, where already identified targets are treated as potential single-antenna eavesdroppers. The DFRC BS scans a sector with a sequence of dedicated beams, and the ISAC system takes a snapshot of the environment during the transmission of each beam. Based on the sensing information, the DFRC BS can acquire the channel state information (CSI) of the potential eavesdroppers. Different from existing works that focused on the resource allocation design for a single snapshot, in this paper, we propose a novel optimization framework that jointly optimizes the communication and sensing resources over a sequence of snapshots with adjustable durations. To this end, we jointly optimize the duration of each snapshot, the beamforming vector, and the covariance matrix of the AN for maximization of the system sum secrecy rate over a sequence of snapshots while guaranteeing a minimum required average achievable rate and a maximum information leakage constraint for each legitimate user. The resource allocation algorithm design is formulated as a non-convex optimization problem, where we account for the imperfect CSI of both the legitimate users and the potential eavesdroppers. To make the problem tractable, we derive a bound for the uncertainty region of the potential eavesdroppers' small-scale fading based on a safe approximation, which facilitates the development of a block coordinate descent-based iterative algorithm for obtaining an efficient suboptimal solution.

* 30 pages, 11 figures

Via

Access Paper or Ask Questions

Time-Energy-Constrained Closed-Loop FBL Communication for Dependable MEC

Dec 07, 2021

Bin Han, Yao Zhu, Anke Schmeink, Hans D. Schotten

Figure 1 for Time-Energy-Constrained Closed-Loop FBL Communication for Dependable MEC

Figure 2 for Time-Energy-Constrained Closed-Loop FBL Communication for Dependable MEC

Figure 3 for Time-Energy-Constrained Closed-Loop FBL Communication for Dependable MEC

Figure 4 for Time-Energy-Constrained Closed-Loop FBL Communication for Dependable MEC

Abstract:The deployment of multi-access edge computing (MEC) is paving the way towards pervasive intelligence in future 6G networks. This new paradigm also proposes emerging requirements of dependable communications, which goes beyond the ultra-reliable low latency communication (URLLC), focusing on the performance of a closed loop instead of that of an unidirectional link. This work studies the simple but efficient one-shot transmission scheme, investigating the closed-loop-reliability-optimal policy of blocklength allocation under stringent time and energy constraints.

* Accepted for publication at CSCN 2021

Via

Access Paper or Ask Questions

EVO-RL: Evolutionary-Driven Reinforcement Learning

Jul 10, 2020

Ahmed Hallawa, Thorsten Born, Anke Schmeink, Guido Dartmann, Arne Peine, Lukas Martin, Giovanni Iacca, A. E. Eiben, Gerd Ascheid

Figure 1 for EVO-RL: Evolutionary-Driven Reinforcement Learning

Figure 2 for EVO-RL: Evolutionary-Driven Reinforcement Learning

Figure 3 for EVO-RL: Evolutionary-Driven Reinforcement Learning

Figure 4 for EVO-RL: Evolutionary-Driven Reinforcement Learning

Abstract:In this work, we propose a novel approach for reinforcement learning driven by evolutionary computation. Our algorithm, dubbed as Evolutionary-Driven Reinforcement Learning (evo-RL), embeds the reinforcement learning algorithm in an evolutionary cycle, where we distinctly differentiate between purely evolvable (instinctive) behaviour versus purely learnable behaviour. Furthermore, we propose that this distinction is decided by the evolutionary process, thus allowing evo-RL to be adaptive to different environments. In addition, evo-RL facilitates learning on environments with rewardless states, which makes it more suited for real-world problems with incomplete information. To show that evo-RL leads to state-of-the-art performance, we present the performance of different state-of-the-art reinforcement learning algorithms when operating within evo-RL and compare it with the case when these same algorithms are executed independently. Results show that reinforcement learning algorithms embedded within our evo-RL approach significantly outperform the stand-alone versions of the same RL algorithms on OpenAI Gym control problems with rewardless states constrained by the same computational budget.

* 9 pages, 7 figures

Via

Access Paper or Ask Questions