National University of Defense Technology
Abstract:This paper studies uncertainty quantification for large language models (LLMs) under black-box access, where only a small number of responses can be sampled for each query. In this setting, estimating the effective semantic alphabet size--that is, the number of distinct meanings expressed in the sampled responses--provides a useful proxy for downstream risk. However, frequency-based estimators tend to undercount rare semantic modes when the sample size is small, while graph-spectral quantities alone are not designed to estimate semantic occupancy accurately. To address this issue, we propose SHADE (Soft-Hybrid Alphabet Dynamic Estimator), a simple and interpretable estimator that combines Generalized Good-Turing coverage with a heat-kernel trace of the normalized Laplacian constructed from an entailment-weighted graph over sampled responses. The estimated coverage adaptively determines the fusion rule: under high coverage, SHADE uses a convex combination of the two signals, while under low coverage it applies a LogSumExp fusion to emphasize missing or weakly observed semantic modes. A finite-sample correction is then introduced to stabilize the resulting cardinality estimate before converting it into a coverage-adjusted semantic entropy score. Experiments on pooled semantic alphabet-size estimation against large-sample references and on QA incorrectness detection show that SHADE achieves the strongest improvements in the most sample-limited regime, while the performance gap narrows as the number of samples increases. These results suggest that hybrid semantic occupancy estimation is particularly beneficial when black-box uncertainty quantification must operate under tight sampling budgets.




Abstract:LiDAR-based place recognition (LPR) is a key component for autonomous driving, and its resilience to environmental corruption is critical for safety in high-stakes applications. While state-of-the-art (SOTA) LPR methods perform well in clean weather, they still struggle with weather-induced corruption commonly encountered in driving scenarios. To tackle this, we propose ResLPRNet, a novel LiDAR data restoration network that largely enhances LPR performance under adverse weather by restoring corrupted LiDAR scans using a wavelet transform-based network. ResLPRNet is efficient, lightweight and can be integrated plug-and-play with pretrained LPR models without substantial additional computational cost. Given the lack of LPR datasets under adverse weather, we introduce ResLPR, a novel benchmark that examines SOTA LPR methods under a wide range of LiDAR distortions induced by severe snow, fog, and rain conditions. Experiments on our proposed WeatherKITTI and WeatherNCLT datasets demonstrate the resilience and notable gains achieved by using our restoration method with multiple LPR approaches in challenging weather scenarios. Our code and benchmark are publicly available here: https://github.com/nubot-nudt/ResLPR.