Abstract:Zero-shot object-goal navigation (ZSON) is a challenging problem in robotics that requires a comprehensive understanding of both language and visual observations. Contextual cues from rooms and objects are critical, but their relative importance depends on the target: some objects are strongly tied to specific room types, while others are better predicted by nearby co-located objects. Existing methods overlook this distinction, leading to inefficient and inaccurate exploration. We present CLUE, a novel navigation framework that adaptively balances the use of contextual rooms and objects by leveraging commonsense knowledge extracted from an offline large language model (LLM). By estimating a target's association with room types using LLM, the agent prioritizes room cues for predictable objects and object cues for those with weak room associations. Our framework constructs a unified semantic value map that integrates both types of contextual information, adaptively weighted by the target's ambiguity to guide exploration. Combined with multi-viewpoint verification and an exploration strategy informed by contextual cues, CLUE achieves robust and efficient navigation. Extensive experiments in simulation and real-world deployments show that our method consistently outperforms state-of-the-art baselines in both success rate (SR) and success weighted by path length (SPL), demonstrating its effectiveness and practicality for real-world navigation tasks.
Abstract:Rank-based input normalization is a workhorse of modern machine learning, prized for its robustness to scale, monotone transformations, and batch-to-batch variation. In many real systems, the ordering of feature values matters far more than their raw magnitudes - yet the structural conditions that a rank-based normalization operator must satisfy to remain stable under these invariances have never been formally pinned down. We show that widely used differentiable sorting and ranking operators fundamentally fail these criteria. Because they rely on value gaps and batch-level pairwise interactions, they are intrinsically unstable under strictly monotone transformations, shifts in mini-batch composition, and even tiny input perturbations. Crucially, these failures stem from the operators' structural design, not from incidental implementation choices. To address this, we propose three axioms that formalize the minimal invariance and stability properties required of rank-based input normalization. We prove that any operator satisfying these axioms must factor into (i) a feature-wise rank representation and (ii) a scalarization map that is both monotone and Lipschitz-continuous. We then construct a minimal operator that meets these criteria and empirically show that the resulting constraints are non-trivial in realistic setups. Together, our results sharply delineate the design space of valid rank-based normalization operators and formally separate them from existing continuous-relaxation-based sorting methods.