Automated Audio Captioning aims to describe the semantic content of input audio. Recent works have employed large language models (LLMs) as the text decoder to leverage their reasoning capabilities. However, prior approaches that project audio features into the LLM embedding space without considering cross-modal alignment fail to fully exploit these capabilities. To address this, we propose LAMB, an LLM-based audio captioning framework that bridges the modality gap between audio embeddings and the LLM text embedding space. LAMB incorporates a Cross-Modal Aligner that minimizes the Cauchy-Schwarz divergence while maximizing mutual information, yielding tighter alignment between audio and text at both the global and token levels. We further design a Two-Stream Adapter that extracts semantically enriched audio embeddings, delivering richer information to the Cross-Modal Aligner. Finally, leveraging the aligned audio embeddings, the proposed Token Guide computes scores directly in the LLM text embedding space to steer the output logits of the generated captions. Experimental results confirm that our framework strengthens the reasoning capabilities of the LLM decoder, achieving state-of-the-art performance on AudioCaps.
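
The alignment objective is not spelled out in closed form above; the following is a minimal sketch of a kernel-based Cauchy-Schwarz divergence term between pooled audio and text embeddings, assuming a Gaussian kernel (the function names and the bandwidth `sigma` are illustrative, not from the paper).

```python
import torch

def gaussian_gram(a, b, sigma=1.0):
    # Pairwise Gaussian kernel values between rows of a and b.
    d2 = torch.cdist(a, b).pow(2)
    return torch.exp(-d2 / (2 * sigma ** 2))

def cs_divergence(audio, text, sigma=1.0):
    # Kernel estimate of D_CS(p||q) = -2 log<p,q> + log<p,p> + log<q,q>,
    # which is >= 0 by the Cauchy-Schwarz inequality and 0 iff p = q.
    pq = gaussian_gram(audio, text, sigma).mean()
    pp = gaussian_gram(audio, audio, sigma).mean()
    qq = gaussian_gram(text, text, sigma).mean()
    return -2 * torch.log(pq) + torch.log(pp) + torch.log(qq)

# audio, text: (batch, dim) pooled embeddings from the two encoders.
audio = torch.randn(32, 256)
text = torch.randn(32, 256)
loss = cs_divergence(audio, text)  # minimized to tighten global alignment
```
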
Finding lower and better-generalizing minima is crucial for deep learning. However, most existing optimizers stop searching the parameter space once they reach a local minimum. Given the complex geometric properties of the loss landscape, it is difficult to guarantee that such a point is the lowest or generalizes best. To address this, we propose an adaptor "E" for gradient-based optimizers. The adapted optimizer continues to explore along landscape valleys (regions with low, nearly identical losses) in search of potentially better local minima even after one has been reached. This increases the likelihood of finding a lower and flatter local minimum, which is often associated with better generalization. For completeness, we also prove convergence of the adapted optimizers in both convex and non-convex settings. Finally, we demonstrate their effectiveness in an important but notoriously difficult training scenario, large-batch training, where Lamb is the benchmark optimizer. Our results show that the adapted Lamb, ALTO, increases the test accuracy (generalization) of the current state-of-the-art optimizer by an average of 2.5% across a variety of large-batch training tasks. This work potentially opens a new research direction in the design of optimization algorithms.
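
The abstract does not specify how "E" triggers or performs exploration; the sketch below wraps a PyTorch optimizer and, once the loss plateaus, takes an extra small step along the most recent update direction as a stand-in for drifting along a valley. The plateau test and drift rule are assumptions, not the authors' algorithm.

```python
import torch

class ExploreAdaptor:
    """Illustrative valley-exploration wrapper; not the paper's "E"."""
    def __init__(self, optimizer, drift=0.1, tol=1e-4):
        self.opt, self.drift, self.tol = optimizer, drift, tol
        self.prev_loss = None

    @torch.no_grad()
    def _explore(self):
        # Continue along each parameter's last update direction.
        for group in self.opt.param_groups:
            for p in group["params"]:
                buf = self.opt.state.get(p, {}).get("momentum_buffer")
                if buf is not None:
                    p.add_(buf, alpha=-self.drift)  # small step along the valley

    def step(self, loss):
        self.opt.step()
        if self.prev_loss is not None and abs(self.prev_loss - loss) < self.tol:
            self._explore()  # loss plateaued: keep searching instead of stopping
        self.prev_loss = loss
```
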
Traditional structural damage detection methods in aerospace applications face challenges in accuracy and sensitivity, often necessitating multiple sensors to evaluate various measurement paths between the reference and defective states. However, the recently developed topological acoustic (TA) sensing technique can capture shifts in the geometric phase of an acoustic field, enabling the detection of even minor perturbations in the supporting medium. In this study, a diagnostic imaging method for damage detection in plate structures based on the TA sensing technique is presented. The method extracts the geometric phase shift index (GPS-I) from the Lamb wave response signals to indicate the location of the damage. Using Abaqus/CAE, a finite element model of the plate was established to simulate the Lamb wave response signals, which were then used to validate the feasibility of the proposed method. The results indicate that this technique enables rapid and precise identification of damage and its location within the plate structure, requiring response signals from only a few points on the damaged plate, and it is reference-free.
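
The abstract does not define the GPS-I formula; one plausible reading, sketched below, extracts instantaneous phase from the Lamb wave responses via the analytic signal and accumulates the phase difference between two measurement points. The exact index used in the paper may differ.

```python
import numpy as np
from scipy.signal import hilbert

def geometric_phase_shift_index(sig_a, sig_b):
    """Illustrative GPS-I between response signals at two points on the
    damaged plate: mean absolute difference of the unwrapped analytic-signal
    phases. A larger index suggests a stronger local perturbation."""
    phase_a = np.unwrap(np.angle(hilbert(sig_a)))
    phase_b = np.unwrap(np.angle(hilbert(sig_b)))
    return np.abs(phase_a - phase_b).mean()
```
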
Today, machine learning is ubiquitous, and structural health monitoring (SHM) is no exception. Specifically, we address the problem of impact localization on shell-like structures, where knowledge of impact locations aids in assessing structural integrity. Impacts on thin-walled structures excite Lamb waves, which can be measured with piezoelectric sensors. Their dispersive characteristics make it difficult to detect and localize impacts by conventional methods. In the present contribution, we explore the localization of impacts using neural networks. In particular, we propose to use recurrent neural networks (RNNs) to estimate impact positions end-to-end, i.e., directly from sequential sensor data. We deal with comparatively long sequences of thousands of samples, since high sampling rates are needed to accurately capture elastic waves. For this reason, the proposed approach builds upon Gated Recurrent Units (GRUs), which are less prone to vanishing gradients than conventional RNNs. Quality and quantity of data are crucial when training neural networks. Often, synthetic data is used, which inevitably introduces a reality gap. Here, by contrast, we train our networks on physical data from experiments, which requires automation to handle the large number of experiments needed. For this purpose, a robot is used to drop steel balls onto an aluminum plate equipped with piezoceramic sensors. Our results show remarkable accuracy in estimating impact positions, even with a comparatively small dataset.
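
A minimal sketch of the kind of GRU regressor described above, mapping raw multi-sensor sequences directly to an (x, y) impact position; layer sizes and depth are illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ImpactLocalizer(nn.Module):
    """GRU regressor: sensor sequences in, (x, y) impact position out."""
    def __init__(self, n_sensors=4, hidden=64):
        super().__init__()
        self.gru = nn.GRU(n_sensors, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # (x, y) coordinates

    def forward(self, x):                 # x: (batch, time, n_sensors)
        _, h = self.gru(x)                # h: (num_layers, batch, hidden)
        return self.head(h[-1])           # regress from the final hidden state

model = ImpactLocalizer()
waves = torch.randn(8, 5000, 4)           # long sequences from high sampling rates
positions = model(waves)                   # (8, 2) predicted impact positions
```
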




Quadrupedal robots can learn versatile locomotion skills but remain vulnerable when one or more joints lose power. In contrast, dogs and cats can adopt limping gaits when injured, demonstrating their remarkable ability to adapt to physical conditions. Inspired by such adaptability, this paper presents Action Learner (AcL), a novel teacher-student reinforcement learning framework that enables quadrupeds to autonomously adapt their gait for stable walking under multiple joint faults. Unlike conventional teacher-student approaches that enforce strict imitation, AcL leverages teacher policies to generate style rewards, guiding the student policy without requiring precise replication. We train multiple teacher policies, each corresponding to a different fault condition, and subsequently distill them into a single student policy with an encoder-decoder architecture. While prior works primarily address single-joint faults, AcL enables quadrupeds to walk with up to four faulty joints across one or two legs, autonomously switching between different limping gaits when faults occur. We validate AcL on a real Go2 quadruped robot under single- and double-joint faults, demonstrating fault-tolerant, stable walking, smooth gait transitions between normal and lamb gaits, and robustness against external disturbances.
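
As a rough illustration of style rewards that guide without enforcing strict imitation, the sketch below scores the student's action against the fault-matched teacher's action with a Gaussian kernel; the kernel form and scale are assumptions, not AcL's published reward.

```python
import numpy as np

def style_reward(student_action, teacher_action, sigma=0.5):
    """Illustrative style reward: high when the student moves like the
    fault-matched teacher, but never a hard imitation constraint."""
    err = np.linalg.norm(np.asarray(student_action) - np.asarray(teacher_action))
    return float(np.exp(-(err / sigma) ** 2))  # soft guidance in (0, 1]
```
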
The present paper is concerned with deep learning techniques applied to detection and localization of damage in a thin aluminum plate. We used data generated on a tabletop apparatus by mounting four piezoelectric transducers to the plate, each of which took turns generating a Lamb wave that traversed the region of interest before being received by the remaining three sensors. By training a neural network to analyze time-series data of the material response, which displayed damage-reflective features whenever the plate's guided waves interacted with a contact load, we obtained a model that detected damage with greater than 99% accuracy, as well as a model that localized damage with $3.14 \pm 0.21$ mm mean distance error and captured more than 60% of test examples within the diffraction limit. For each task, the best-performing model was designed according to the inductive bias that our transducers were both similar and arranged in a square pattern on a nearly uniform plate.
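
The two reported localization metrics can be formalized as below; the numeric value of the diffraction limit is not given in the abstract, so it is left as a parameter.

```python
import numpy as np

def localization_metrics(pred_mm, true_mm, diffraction_limit_mm):
    """Mean distance error and fraction of test examples falling within the
    diffraction limit; pred_mm/true_mm are (n, 2) positions in millimetres."""
    err = np.linalg.norm(pred_mm - true_mm, axis=1)
    return err.mean(), (err <= diffraction_limit_mm).mean()
```
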
In real-world control settings, the observation space is often unnecessarily high-dimensional and subject to time-correlated noise. However, the controllable dynamics of the system are often far simpler than the dynamics of the raw observations. It is therefore desirable to learn an encoder that maps the observation space to a simpler space of control-relevant variables. In this work, we consider the Ex-BMDP model, first proposed by Efroni et al. (2022), which formalizes control problems where observations can be factorized into an action-dependent latent state that evolves deterministically, and action-independent time-correlated noise. Lamb et al. (2022) propose the "AC-State" method for learning an encoder to extract a complete action-dependent latent state representation from the observations in such problems. AC-State is a multistep-inverse method, in that it uses the encodings of the first and last states in a path to predict the first action in the path. However, we identify cases where AC-State fails to learn a correct latent representation of the agent-controllable factor of the state. We therefore propose a new algorithm, ACDF, which combines multistep-inverse prediction with a latent forward model. ACDF is guaranteed to correctly infer an action-dependent latent state encoder for a large class of Ex-BMDP models. We demonstrate the effectiveness of ACDF through numerical simulations on tabular Ex-BMDPs, as well as on high-dimensional environments using neural-network-based encoders. Code is available at https://github.com/midi-lab/acdf.
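
A minimal sketch of ACDF's two training signals, multistep-inverse action prediction plus a latent forward model, assuming a discrete latent encoder as in the tabular setting; the module interfaces and the pseudo-label target are illustrative, not the released code.

```python
import torch.nn.functional as F

def acdf_loss(enc, inv_head, fwd_head, s0, sk, k, a0, s1):
    """Sketch: enc maps observations to logits over discrete latent states;
    inv_head predicts the first action a0 from the path endpoints (z0, zk, k);
    fwd_head predicts the next latent from (z0, a0), matching the assumed
    deterministic latent dynamics via a stop-gradient pseudo-label."""
    z0, zk, z1 = enc(s0), enc(sk), enc(s1)            # latent-state logits
    inv = F.cross_entropy(inv_head(z0, zk, k), a0)    # multistep-inverse term
    fwd = F.cross_entropy(fwd_head(z0, a0), z1.argmax(-1))  # latent forward term
    return inv + fwd
```
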
This paper reports the first piezoelectric acoustic filter in periodically poled piezoelectric film (P3F) lithium niobate (LiNbO3) at 23.8 GHz, with a low insertion loss (IL) of 1.52 dB and a 3-dB fractional bandwidth (FBW) of 19.4%. The filter features a compact footprint of 0.64 mm². The third-order ladder filter is implemented with electrically coupled resonators in a 150 nm bi-layer P3F 128° rotated Y-cut LiNbO3 thin film, operating in the second-order symmetric (S2) Lamb mode. The record-breaking performance is enabled by the P3F LiNbO3 platform, in which piezoelectric thin films of alternating orientations are transferred successively, facilitating efficient higher-order Lamb mode operation with simultaneously high quality factor (Q) and coupling coefficient (k²) at millimeter-wave (mmWave) frequencies. Moreover, the multi-layer P3F stack promises smaller footprints and better nonlinearity performance than single-layer counterparts, thanks to the higher capacitance density and lower thermal resistance. Upon further development, the reported P3F LiNbO3 platform is promising for compact filters at mmWave.
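
For reference, the quoted 3-dB fractional bandwidth translates into absolute bandwidth as follows:

```python
f0 = 23.8e9    # center frequency in Hz
fbw = 0.194    # 3-dB fractional bandwidth
bw = f0 * fbw  # ~4.62e9 Hz, i.e. roughly 4.6 GHz of absolute 3-dB bandwidth
```
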




Speeding up large-scale distributed training is challenging in that it requires improving various components of training, including load balancing, communication, and optimizers. We present novel approaches for fast large-scale training of the BERT model which individually improve each component, thereby leading to a new level of BERT training performance. Load balancing is imperative in distributed BERT training since its training datasets are characterized by samples of varying lengths. Communication cost, which is proportional to the scale of distributed training, needs to be hidden by useful computation. In addition, the optimizers, e.g., ADAM, LAMB, etc., need to be carefully re-evaluated in the context of large-scale distributed training. We propose two new ideas: (1) local presorting based on dataset stratification for load balancing, and (2) bucket-wise gradient clipping before allreduce, which both makes gradient clipping itself fast and preserves the overlap of gradient computation and synchronization. We also re-evaluate existing optimizers via hyperparameter optimization and adopt ADAM, which also contributes to fast training via larger batches than existing methods. Our proposed methods, all combined, give the fastest MLPerf BERT training of 25.1 (22.3) seconds on 1,024 NVIDIA A100 GPUs, which is 1.33x (1.13x) and 1.57x faster than the other top two (one) submissions to MLPerf v1.1 (v2.0). Our implementation and evaluation results are available in the MLPerf v1.1–v2.1 submissions.
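
A minimal sketch of idea (2), assuming PyTorch-style DDP gradient buckets: each bucket is clipped as soon as backprop fills it and its allreduce is launched asynchronously, so clipping and communication overlap with the remaining gradient computation. The paper's exact norm bookkeeping is not reproduced here.

```python
import torch
import torch.distributed as dist

def clip_bucket_then_allreduce(bucket, max_norm):
    """Clip one ready gradient bucket, then launch its (summing) allreduce
    without blocking; the caller waits on the handles and averages later."""
    norm = torch.norm(torch.stack([g.norm() for g in bucket]))
    if norm > max_norm:
        for g in bucket:
            g.mul_(max_norm / norm)  # scale the whole bucket to the norm budget
    return [dist.all_reduce(g, async_op=True) for g in bucket]
```
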




LARS and LAMB have emerged as prominent techniques in Large Batch Learning (LBL), ensuring the stability of AI training. One of the primary challenges in LBL is convergence stability, where training tends to become trapped in sharp minimizers. To address this challenge, a relatively recent technique known as warm-up has been employed. However, warm-up lacks a strong theoretical foundation, leaving the door open for more effective algorithms. In light of this, we conduct empirical experiments to analyze the behaviors of the two most popular optimizers in the LARS family, LARS and LAMB, with and without a warm-up strategy. Our analyses clarify how LARS and LAMB behave and why warm-up is needed in LBL. Building upon these insights, we propose a novel algorithm called Time Varying LARS (TVLARS), which facilitates robust training in the initial phase without the need for warm-up. Experimental evaluation demonstrates that TVLARS matches LARS and LAMB when warm-up is used, while surpassing their performance without it.
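
TVLARS's schedule is not given above; the sketch below shows one way a time-varying gate on the LARS trust ratio could stand in for warm-up, with the sigmoid ramp and its rate being assumptions rather than the published formula.

```python
import math
import torch

@torch.no_grad()
def tvlars_step(param, grad, base_lr, step, weight_decay=1e-4, gamma=1e-3):
    """Illustrative time-varying LARS step for one layer: the layer-wise
    trust ratio is scaled by a time-dependent gate that starts small and
    grows toward 1, replacing an external warm-up schedule."""
    g = grad + weight_decay * param
    trust = param.norm() / (g.norm() + 1e-9)            # LARS trust ratio
    gate = 1.0 / (1.0 + math.exp(-(gamma * step - 5.0)))  # ~0 early, -> 1 later
    param.add_(g, alpha=-(base_lr * gate * trust.item()))
```
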