Henry




Abstract:In this paper, we propose a radio-based passive target tracking algorithm using multipath measurements, including the angle of arrival and relative distance. We focus on a scenario in which a mobile receiver continuously receives radio signals from a transmitter located at an unknown position. The receiver utilizes multipath measurements extracted from the received signal to jointly localize the transmitter and the scatterers over time, with scatterers comprising a moving target and stationary objects that can reflect signals within the environment. We develop a comprehensive probabilistic model for the target tracking problem, incorporating the localization of the transmitter and scatterers, the identification of false alarms and missed detections in the measurements, and the association between scatterers and measurements. We employ a belief propagation approach to compute the posterior distributions of the positions of the scatterers and the transmitter. Additionally, we introduce a particle implementation for the belief propagation method. Simulation results demonstrate that our proposed algorithm outperforms existing benchmark methods in terms of target tracking accuracy.




Abstract:The passive and frequency-flat reflection of IRS, as well as the high-dimensional IRS-reflected channels, have posed significant challenges for efficient IRS channel estimation, especially in wideband communication systems with significant multi-path channel delay spread. To address these challenges, we propose a novel neural network (NN)-empowered framework for IRS channel autocorrelation matrix estimation in wideband orthogonal frequency division multiplexing (OFDM) systems. This framework relies only on the easily accessible reference signal received power (RSRP) measurements at users in existing wideband communication systems, without requiring additional pilot transmission. Based on the estimates of channel autocorrelation matrix, the passive reflection of IRS is optimized to maximize the average user received signal-to-noise ratio (SNR) over all subcarriers in the OFDM system. Numerical results verify that the proposed algorithm significantly outperforms existing powermeasurement-based IRS reflection designs in wideband channels.




Abstract:Recently, the success of Text-to-Image (T2I) models has led to the rise of numerous third-party platforms, which claim to provide cheaper API services and more flexibility in model options. However, this also raises a new security concern: Are these third-party services truly offering the models they claim? To address this problem, we propose the first T2I model verification method named Text-to-Image Model Verification via Non-Transferable Adversarial Attacks (TVN). The non-transferability of adversarial examples means that these examples are only effective on a target model and ineffective on other models, thereby allowing for the verification of the target model. TVN utilizes the Non-dominated Sorting Genetic Algorithm II (NSGA-II) to optimize the cosine similarity of a prompt's text encoding, generating non-transferable adversarial prompts. By calculating the CLIP-text scores between the non-transferable adversarial prompts without perturbations and the images, we can verify if the model matches the claimed target model, based on a 3-sigma threshold. The experiments showed that TVN performed well in both closed-set and open-set scenarios, achieving a verification accuracy of over 90\%. Moreover, the adversarial prompts generated by TVN significantly reduced the CLIP-text scores of the target model, while having little effect on other models.




Abstract:We present the results of the "Fast Calorimeter Simulation Challenge 2022" - the CaloChallenge. We study state-of-the-art generative models on four calorimeter shower datasets of increasing dimensionality, ranging from a few hundred voxels to a few tens of thousand voxels. The 31 individual submissions span a wide range of current popular generative architectures, including Variational AutoEncoders (VAEs), Generative Adversarial Networks (GANs), Normalizing Flows, Diffusion models, and models based on Conditional Flow Matching. We compare all submissions in terms of quality of generated calorimeter showers, as well as shower generation time and model size. To assess the quality we use a broad range of different metrics including differences in 1-dimensional histograms of observables, KPD/FPD scores, AUCs of binary classifiers, and the log-posterior of a multiclass classifier. The results of the CaloChallenge provide the most complete and comprehensive survey of cutting-edge approaches to calorimeter fast simulation to date. In addition, our work provides a uniquely detailed perspective on the important problem of how to evaluate generative models. As such, the results presented here should be applicable for other domains that use generative AI and require fast and faithful generation of samples in a large phase space.
Abstract:In this paper, we study the intelligent reflecting surface (IRS) deployment problem where a number of IRSs are optimally placed in a target area to improve its signal coverage with the serving base station (BS). To achieve this, we assume that there is a given set of candidate sites in the target area for deploying IRSs and divide the area into multiple grids of identical size. Then, we derive the average channel power gains from the BS to IRS in each candidate site and from this IRS to any grid in the target area in terms of IRS deployment parameters, including its size, position, height, and orientation. Thus, we are able to approximate the average cascaded channel power gain from the BS to each grid via any IRS, assuming an effective IRS reflection gain based on the large-scale channel knowledge only. Next, we formulate a multi-IRS deployment optimization problem to minimize the total deployment cost by selecting a subset of candidate sites for deploying IRSs and jointly optimizing their heights, orientations, and numbers of reflecting elements while satisfying a given coverage rate performance requirement over all grids in the target area. To solve this challenging combinatorial optimization problem, we first reformulate it as an integer linear programming problem and solve it optimally using the branch-and-bound (BB) algorithm. In addition, we propose an efficient successive refinement algorithm to further reduce computational complexity. Simulation results demonstrate that the proposed lower-complexity successive refinement algorithm achieves near-optimal performance but with significantly reduced running time compared to the proposed optimal BB algorithm, as well as superior performance-cost trade-off than other baseline IRS deployment strategies.




Abstract:Survival analysis is an important research topic with applications in healthcare, business, and manufacturing. One essential tool in this area is the Cox proportional hazards (CPH) model, which is widely used for its interpretability, flexibility, and predictive performance. However, for modern data science challenges such as high dimensionality (both $n$ and $p$) and high feature correlations, current algorithms to train the CPH model have drawbacks, preventing us from using the CPH model at its full potential. The root cause is that the current algorithms, based on the Newton method, have trouble converging due to vanishing second order derivatives when outside the local region of the minimizer. To circumvent this problem, we propose new optimization methods by constructing and minimizing surrogate functions that exploit hidden mathematical structures of the CPH model. Our new methods are easy to implement and ensure monotonic loss decrease and global convergence. Empirically, we verify the computational efficiency of our methods. As a direct application, we show how our optimization methods can be used to solve the cardinality-constrained CPH problem, producing very sparse high-quality models that were not previously practical to construct. We list several extensions that our breakthrough enables, including optimization opportunities, theoretical questions on CPH's mathematical structure, as well as other CPH-related applications.
Abstract:The codebook-based analog beamforming is appealing for future terahertz (THz) communications since it can generate high-gain directional beams with low-cost phase shifters via low-complexity beam training. However, conventional beamforming codebook design based on array response vectors for narrowband communications may suffer from severe performance loss in wideband systems due to the ``beam squint" effect over frequency. To tackle this issue, we propose in this paper a new codebook design method for analog beamforming in wideband THz systems. In particular, to characterize the analog beamforming performance in wideband systems, we propose a new metric termed wideband beam gain, which is given by the minimum beamforming gain over the entire frequency band given a target angle. Based on this metric, a wideband analog beamforming codebook design problem is formulated for optimally balancing the beamforming gains in both the spatial and frequency domains, and the performance loss of conventional narrowband beamforming in wideband systems is analyzed. To solve the new wideband beamforming codebook design problem, we divide the spatial domain into orthogonal angular zones each associated with one beam, thereby decoupling the codebook design into a zone division sub-problem and a set of beamforming optimization sub-problems each for one zone. For the zone division sub-problem, we propose a bisection method to obtain the optimal boundaries for separating adjacent zones. While for each of the per-zone-based beamforming optimization sub-problems, we further propose an efficient augmented Lagrange method (ALM) to solve it. Numerical results demonstrate the performance superiority of our proposed codebook design for wideband analog beamforming to the narrowband beamforming codebook and also validate our performance analysis.




Abstract:Task-oriented grasping, which involves grasping specific parts of objects based on their functions, is crucial for developing advanced robotic systems capable of performing complex tasks in dynamic environments. In this paper, we propose a training-free framework that incorporates both semantic and geometric priors for zero-shot task-oriented grasp generation. The proposed framework, SegGrasp, first leverages the vision-language models like GLIP for coarse segmentation. It then uses detailed geometric information from convex decomposition to improve segmentation quality through a fusion policy named GeoFusion. An effective grasp pose can be generated by a grasping network with improved segmentation. We conducted the experiments on both segmentation benchmark and real-world robot grasping. The experimental results show that SegGrasp surpasses the baseline by more than 15\% in grasp and segmentation performance.




Abstract:Domain adaptive object detection (DAOD) aims to generalize detectors trained on an annotated source domain to an unlabelled target domain. As the visual-language models (VLMs) can provide essential general knowledge on unseen images, freezing the visual encoder and inserting a domain-agnostic adapter can learn domain-invariant knowledge for DAOD. However, the domain-agnostic adapter is inevitably biased to the source domain. It discards some beneficial knowledge discriminative on the unlabelled domain, i.e., domain-specific knowledge of the target domain. To solve the issue, we propose a novel Domain-Aware Adapter (DA-Ada) tailored for the DAOD task. The key point is exploiting domain-specific knowledge between the essential general knowledge and domain-invariant knowledge. DA-Ada consists of the Domain-Invariant Adapter (DIA) for learning domain-invariant knowledge and the Domain-Specific Adapter (DSA) for injecting the domain-specific knowledge from the information discarded by the visual encoder. Comprehensive experiments over multiple DAOD tasks show that DA-Ada can efficiently infer a domain-aware visual encoder for boosting domain adaptive object detection. Our code is available at https://github.com/Therock90421/DA-Ada.
Abstract:The expanding context windows in large language models (LLMs) have greatly enhanced their capabilities in various applications, but they also introduce significant challenges in maintaining low latency, particularly in Time to First Token (TTFT). This paper identifies that the sharp rise in TTFT as context length increases is predominantly driven by queuing delays, which are caused by the growing demands for GPU Key-Value (KV) cache allocation clashing with the limited availability of KV cache blocks. To address this issue, we propose LayerKV, a simple yet effective plug-in method that effectively reduces TTFT without requiring additional hardware or compromising output performance, while seamlessly integrating with existing parallelism strategies and scheduling techniques. Specifically, LayerKV introduces layer-wise KV block allocation, management, and offloading for fine-grained control over system memory, coupled with an SLO-aware scheduler to optimize overall Service Level Objectives (SLOs). Comprehensive evaluations on representative models, ranging from 7B to 70B parameters, across various GPU configurations, demonstrate that LayerKV improves TTFT latency up to 11x and reduces SLO violation rates by 28.7\%, significantly enhancing the user experience