Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Bo Hu

DIE

HyperAlign: Hyperbolic Entailment Cones for Adaptive Text-to-Image Alignment Assessment

Jan 08, 2026

Wenzhi Chen, Bo Hu, Leida Li, Lihuo He, Wen Lu, Xinbo Gao

Abstract:With the rapid development of text-to-image generation technology, accurately assessing the alignment between generated images and text prompts has become a critical challenge. Existing methods rely on Euclidean space metrics, neglecting the structured nature of semantic alignment, while lacking adaptive capabilities for different samples. To address these limitations, we propose HyperAlign, an adaptive text-to-image alignment assessment framework based on hyperbolic entailment geometry. First, we extract Euclidean features using CLIP and map them to hyperbolic space. Second, we design a dynamic-supervision entailment modeling mechanism that transforms discrete entailment logic into continuous geometric structure supervision. Finally, we propose an adaptive modulation regressor that utilizes hyperbolic geometric features to generate sample-level modulation parameters, adaptively calibrating Euclidean cosine similarity to predict the final score. HyperAlign achieves highly competitive performance on both single database evaluation and cross-database generalization tasks, fully validating the effectiveness of hyperbolic geometric modeling for image-text alignment assessment.

Via

Access Paper or Ask Questions

Contrastive Entropy Bounds for Density and Conditional Density Decomposition

Nov 17, 2025

Bo Hu, Jose C. Principe

Abstract:This paper studies the interpretability of neural network features from a Bayesian Gaussian view, where optimizing a cost is reaching a probabilistic bound; learning a model approximates a density that makes the bound tight and the cost optimal, often with a Gaussian mixture density. The two examples are Mixture Density Networks (MDNs) using the bound for the marginal and autoencoders using the conditional bound. It is a known result, not only for autoencoders, that minimizing the error between inputs and outputs maximizes the dependence between inputs and the middle. We use Hilbert space and decomposition to address cases where a multiple-output network produces multiple centers defining a Gaussian mixture. Our first finding is that an autoencoder's objective is equivalent to maximizing the trace of a Gaussian operator, the sum of eigenvalues under bases orthonormal w.r.t. the data and model distributions. This suggests that, when a one-to-one correspondence as needed in autoencoders is unnecessary, we can instead maximize the nuclear norm of this operator, the sum of singular values, to maximize overall rank rather than trace. Thus the trace of a Gaussian operator can be used to train autoencoders, and its nuclear norm can be used as divergence to train MDNs. Our second test uses inner products and norms in a Hilbert space to define bounds and costs. Such bounds often have an extra norm compared to KL-based bounds, which increases sample diversity and prevents the trivial solution where a multiple-output network produces the same constant, at the cost of requiring a sample batch to estimate and optimize. We propose an encoder-mixture-decoder architecture whose decoder is multiple-output, producing multiple centers per sample, potentially tightening the bound. Assuming the data are small-variance Gaussian mixtures, this upper bound can be tracked and analyzed quantitatively.

Via

Access Paper or Ask Questions

On Sampling of Multiple Correlated Stochastic Signals

Sep 11, 2025

Lin Jin, Hang Sheng, Hui Feng, Bo Hu

Abstract:Multiple stochastic signals possess inherent statistical correlations, yet conventional sampling methods that process each channel independently result in data redundancy. To leverage this correlation for efficient sampling, we model correlated channels as a linear combination of a smaller set of uncorrelated, wide-sense stationary latent sources. We establish a theoretical lower bound on the total sampling density for zero mean-square error reconstruction, proving it equals the ratio of the joint spectral bandwidth of latent sources to the number of correlated signal channels. We then develop a constructive multi-band sampling scheme that attains this bound. The proposed method operates via spectral partitioning of the latent sources, followed by spatio-temporal sampling and interpolation. Experiments on synthetic and real datasets confirm that our scheme achieves near-lossless reconstruction precisely at the theoretical sampling density, validating its efficiency.

Via

Access Paper or Ask Questions

Diagnostic Imaging for Damage Detection in Plates Based on Topological Acoustic (TA) Sensing Technique

Jun 24, 2025

Bo Hu, Tribikram Kundu, Pierre A. Deymier, Keith Runge

Abstract:Traditional structural damage detection methods in aerospace applications face challenges in accuracy and sensitivity, often necessitating multiple sensors to evaluate various measurement paths between the reference and defective states. However, the recently developed topological acoustic (TA) sensing technique can capture shifts in the geometric phase of an acoustic field, enabling the detection of even minor perturbations in the supporting medium. In this study, a diagnostic imaging method for damage detection in plate structures based on the TA sensing technique is presented. The method extracts the geometric phase shift index (GPS-I) from the Lamb wave response signals to indicate the location of the damage. Using Abaqus/CAE, a finite element model of the plate was established to simulate the Lamb wave response signals, which were then used to validate the feasibility of the proposed method. The results indicate that this technique enables rapid and precise identification of damage and its location within the plate structure, requiring response signals from only a few points on the damaged plate, and it is reference-free.

Via

Access Paper or Ask Questions

MatTools: Benchmarking Large Language Models for Materials Science Tools

May 16, 2025

Siyu Liu, Jiamin Xu, Beilin Ye, Bo Hu, David J. Srolovitz, Tongqi Wen

Figure 1 for MatTools: Benchmarking Large Language Models for Materials Science Tools

Figure 2 for MatTools: Benchmarking Large Language Models for Materials Science Tools

Figure 3 for MatTools: Benchmarking Large Language Models for Materials Science Tools

Figure 4 for MatTools: Benchmarking Large Language Models for Materials Science Tools

Abstract:Large language models (LLMs) are increasingly applied to materials science questions, including literature comprehension, property prediction, materials discovery and alloy design. At the same time, a wide range of physics-based computational approaches have been developed in which materials properties can be calculated. Here, we propose a benchmark application to evaluate the proficiency of LLMs to answer materials science questions through the generation and safe execution of codes based on such physics-based computational materials science packages. MatTools is built on two complementary components: a materials simulation tool question-answer (QA) benchmark and a real-world tool-usage benchmark. We designed an automated methodology to efficiently collect real-world materials science tool-use examples. The QA benchmark, derived from the pymatgen (Python Materials Genomics) codebase and documentation, comprises 69,225 QA pairs that assess the ability of an LLM to understand materials science tools. The real-world benchmark contains 49 tasks (138 subtasks) requiring the generation of functional Python code for materials property calculations. Our evaluation of diverse LLMs yields three key insights: (1)Generalists outshine specialists;(2)AI knows AI; and (3)Simpler is better. MatTools provides a standardized framework for assessing and improving LLM capabilities for materials science tool applications, facilitating the development of more effective AI systems for materials science and general scientific research.

* 27 pages, 23 figures

Via

Access Paper or Ask Questions

MGFF-TDNN: A Multi-Granularity Feature Fusion TDNN Model with Depth-Wise Separable Module for Speaker Verification

May 06, 2025

Ya Li, Bin Zhou, Bo Hu

Abstract:In speaker verification, traditional models often emphasize modeling long-term contextual features to capture global speaker characteristics. However, this approach can neglect fine-grained voiceprint information, which contains highly discriminative features essential for robust speaker embeddings. This paper introduces a novel model architecture, termed MGFF-TDNN, based on multi-granularity feature fusion. The MGFF-TDNN leverages a two-dimensional depth-wise separable convolution module, enhanced with local feature modeling, as a front-end feature extractor to effectively capture time-frequency domain features. To achieve comprehensive multi-granularity feature fusion, we propose the M-TDNN structure, which integrates global contextual modeling with fine-grained feature extraction by combining time-delay neural networks and phoneme-level feature pooling. Experiments on the VoxCeleb dataset demonstrate that the MGFF-TDNN achieves outstanding performance in speaker verification while remaining efficient in terms of parameters and computational resources.

Via

Access Paper or Ask Questions

HistLLM: A Unified Framework for LLM-Based Multimodal Recommendation with User History Encoding and Compression

Apr 14, 2025

Chen Zhang, Bo Hu, Weidong Chen, Zhendong Mao

Abstract:While large language models (LLMs) have proven effective in leveraging textual data for recommendations, their application to multimodal recommendation tasks remains relatively underexplored. Although LLMs can process multimodal information through projection functions that map visual features into their semantic space, recommendation tasks often require representing users' history interactions through lengthy prompts combining text and visual elements, which not only hampers training and inference efficiency but also makes it difficult for the model to accurately capture user preferences from complex and extended prompts, leading to reduced recommendation performance. To address this challenge, we introduce HistLLM, an innovative multimodal recommendation framework that integrates textual and visual features through a User History Encoding Module (UHEM), compressing multimodal user history interactions into a single token representation, effectively facilitating LLMs in processing user preferences. Extensive experiments demonstrate the effectiveness and efficiency of our proposed mechanism.

Via

Access Paper or Ask Questions

Extract, Match, and Score: An Evaluation Paradigm for Long Question-context-answer Triplets in Financial Analysis

Mar 20, 2025

Bo Hu, Han Yuan, Vlad Pandelea, Wuqiong Luo, Yingzhu Zhao, Zheng Ma

Abstract:The rapid advancement of large language models (LLMs) has sparked widespread adoption across diverse applications, making robust evaluation frameworks crucial for assessing their performance. While conventional evaluation metrics remain applicable for shorter texts, their efficacy diminishes when evaluating the quality of long-form answers. This limitation is particularly critical in real-world scenarios involving extended questions, extensive context, and long-form answers, such as financial analysis or regulatory compliance. In this paper, we use a practical financial use case to illustrate applications that handle "long question-context-answer triplets". We construct a real-world financial dataset comprising long triplets and demonstrate the inadequacies of traditional metrics. To address this, we propose an effective Extract, Match, and Score (EMS) evaluation approach tailored to the complexities of long-form LLMs' outputs, providing practitioners with a reliable methodology for assessing LLMs' performance in complex real-world scenarios.

Via

Access Paper or Ask Questions

Efficient UAV Swarm-Based Multi-Task Federated Learning with Dynamic Task Knowledge Sharing

Mar 12, 2025

Yubo Yang, Tao Yang, Xiaofeng Wu, Ziyu Guo, Bo Hu

Abstract:UAV swarms are widely used in emergency communications, area monitoring, and disaster relief. Coordinated by control centers, they are ideal for federated learning (FL) frameworks. However, current UAV-assisted FL methods primarily focus on single tasks, overlooking the need for multi-task training. In disaster relief scenarios, UAVs perform tasks such as crowd detection, road feasibility analysis, and disaster assessment, which exhibit time-varying demands and potential correlations. In order to meet the time-varying requirements of tasks and complete multiple tasks efficiently under resource constraints, in this paper, we propose a UAV swarm based multi-task FL framework, where ground emergency vehicles (EVs) collaborate with UAVs to accomplish multiple tasks efficiently under constrained energy and bandwidth resources. Through theoretical analysis, we identify key factors affecting task performance and introduce a task attention mechanism to dynamically evaluate task importance, thereby achieving efficient resource allocation. Additionally, we propose a task affinity (TA) metric to capture the dynamic correlation among tasks, thereby promoting task knowledge sharing to accelerate training and improve the generalization ability of the model in different scenarios. To optimize resource allocation, we formulate a two-layer optimization problem to jointly optimize UAV transmission power, computation frequency, bandwidth allocation, and UAV-EV associations. For the inner problem, we derive closed-form solutions for transmission power, computation frequency, and bandwidth allocation and apply a block coordinate descent method for optimization. For the outer problem, a two-stage algorithm is designed to determine optimal UAV-EV associations. Furthermore, theoretical analysis reveals a trade-off between UAV energy consumption and multi-task performance.

* Due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract here is shorter than that in the PDF file

Via

Access Paper or Ask Questions

Drift-Aware Federated Learning: A Causal Perspective

Mar 12, 2025

Yunjie Fang, Sheng Wu, Tao Yang, Xiaofeng Wu, Bo Hu

Abstract:Federated learning (FL) facilitates collaborative model training among multiple clients while preserving data privacy, often resulting in enhanced performance compared to models trained by individual clients. However, factors such as communication frequency and data distribution can contribute to feature drift, hindering the attainment of optimal training performance. This paper examine the relationship between model update drift and global as well as local optimizer from causal perspective. The influence of the global optimizer on feature drift primarily arises from the participation frequency of certain clients in server updates, whereas the effect of the local optimizer is typically associated with imbalanced data distributions.To mitigate this drift, we propose a novel framework termed Causal drift-Aware Federated lEarning (CAFE). CAFE exploits the causal relationship between feature-invariant components and classification outcomes to independently calibrate local client sample features and classifiers during the training phase. In the inference phase, it eliminated the drifts in the global model that favor frequently communicating clients.Experimental results demonstrate that CAFE's integration of feature calibration, parameter calibration, and historical information effectively reduces both drift towards majority classes and tendencies toward frequently communicating nodes.

Via

Access Paper or Ask Questions