Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Hang Liu

Stevens Institute of Technology

Imaginative World Modeling with Scene Graphs for Embodied Agent Navigation

Aug 09, 2025

Yue Hu, Junzhe Wu, Ruihan Xu, Hang Liu, Avery Xi, Henry X. Liu, Ram Vasudevan, Maani Ghaffari

Abstract:Semantic navigation requires an agent to navigate toward a specified target in an unseen environment. Employing an imaginative navigation strategy that predicts future scenes before taking action, can empower the agent to find target faster. Inspired by this idea, we propose SGImagineNav, a novel imaginative navigation framework that leverages symbolic world modeling to proactively build a global environmental representation. SGImagineNav maintains an evolving hierarchical scene graphs and uses large language models to predict and explore unseen parts of the environment. While existing methods solely relying on past observations, this imaginative scene graph provides richer semantic context, enabling the agent to proactively estimate target locations. Building upon this, SGImagineNav adopts an adaptive navigation strategy that exploits semantic shortcuts when promising and explores unknown areas otherwise to gather additional context. This strategy continuously expands the known environment and accumulates valuable semantic contexts, ultimately guiding the agent toward the target. SGImagineNav is evaluated in both real-world scenarios and simulation benchmarks. SGImagineNav consistently outperforms previous methods, improving success rate to 65.4 and 66.8 on HM3D and HSSD, and demonstrating cross-floor and cross-room navigation in real-world environments, underscoring its effectiveness and generalizability.

* 23 pages

Via

Access Paper or Ask Questions

RollingQ: Reviving the Cooperation Dynamics in Multimodal Transformer

Jun 13, 2025

Haotian Ni, Yake Wei, Hang Liu, Gong Chen, Chong Peng, Hao Lin, Di Hu

Abstract:Multimodal learning faces challenges in effectively fusing information from diverse modalities, especially when modality quality varies across samples. Dynamic fusion strategies, such as attention mechanism in Transformers, aim to address such challenge by adaptively emphasizing modalities based on the characteristics of input data. However, through amounts of carefully designed experiments, we surprisingly observed that the dynamic adaptability of widely-used self-attention models diminishes. Model tends to prefer one modality regardless of data characteristics. This bias triggers a self-reinforcing cycle that progressively overemphasizes the favored modality, widening the distribution gap in attention keys across modalities and deactivating attention mechanism's dynamic properties. To revive adaptability, we propose a simple yet effective method Rolling Query (RollingQ), which balances attention allocation by rotating the query to break the self-reinforcing cycle and mitigate the key distribution gap. Extensive experiments on various multimodal scenarios validate the effectiveness of RollingQ and the restoration of cooperation dynamics is pivotal for enhancing the broader capabilities of widely deployed multimodal Transformers. The source code is available at https://github.com/GeWu-Lab/RollingQ_ICML2025.

* Accepted by ICML 2025

Via

Access Paper or Ask Questions

Analyzing 16,193 LLM Papers for Fun and Profits

Apr 15, 2025

Zhiqiu Xia, Lang Zhu, Bingzhe Li, Feng Chen, Qiannan Li, Chunhua Liao, Feiyi Wang, Hang Liu

Abstract:Large Language Models (LLMs) are reshaping the landscape of computer science research, driving significant shifts in research priorities across diverse conferences and fields. This study provides a comprehensive analysis of the publication trend of LLM-related papers in 77 top-tier computer science conferences over the past six years (2019-2024). We approach this analysis from four distinct perspectives: (1) We investigate how LLM research is driving topic shifts within major conferences. (2) We adopt a topic modeling approach to identify various areas of LLM-related topic growth and reveal the topics of concern at different conferences. (3) We explore distinct contribution patterns of academic and industrial institutions. (4) We study the influence of national origins on LLM development trajectories. Synthesizing the findings from these diverse analytical angles, we derive ten key insights that illuminate the dynamics and evolution of the LLM research ecosystem.

Via

Access Paper or Ask Questions

Region Based SLAM-Aware Exploration: Efficient and Robust Autonomous Mapping Strategy That Can Scale

Apr 14, 2025

Megha Maheshwari, Sadeigh Rabiee, He Yin, Martin Labrie, Hang Liu

Abstract:Autonomous exploration for mapping unknown large scale environments is a fundamental challenge in robotics, with efficiency in time, stability against map corruption and computational resources being crucial. This paper presents a novel approach to indoor exploration that addresses these key issues in existing methods. We introduce a Simultaneous Localization and Mapping (SLAM)-aware region-based exploration strategy that partitions the environment into discrete regions, allowing the robot to incrementally explore and stabilize each region before moving to the next one. This approach significantly reduces redundant exploration and improves overall efficiency. As the device finishes exploring a region and stabilizes it, we also perform SLAM keyframe marginalization, a technique which reduces problem complexity by eliminating variables, while preserving their essential information. To improves robustness and further enhance efficiency, we develop a check- point system that enables the robot to resume exploration from the last stable region in case of failures, eliminating the need for complete re-exploration. Our method, tested in real homes, office and simulations, outperforms state-of-the-art approaches. The improvements demonstrate substantial enhancements in various real world environments, with significant reductions in keyframe usage (85%), submap usage (50% office, 32% home), pose graph optimization time (78-80%), and exploration duration (10-15%). This region-based strategy with keyframe marginalization offers an efficient solution for autonomous robotic mapping.

* 8 pages, 9 figures

Via

Access Paper or Ask Questions

A Two-Timescale Approach for Wireless Federated Learning with Parameter Freezing and Power Control

Apr 02, 2025

Jinhao Ouyang, Yuan Liu, Hang Liu

Figure 1 for A Two-Timescale Approach for Wireless Federated Learning with Parameter Freezing and Power Control

Figure 2 for A Two-Timescale Approach for Wireless Federated Learning with Parameter Freezing and Power Control

Figure 3 for A Two-Timescale Approach for Wireless Federated Learning with Parameter Freezing and Power Control

Figure 4 for A Two-Timescale Approach for Wireless Federated Learning with Parameter Freezing and Power Control

Abstract:Federated learning (FL) enables distributed devices to train a shared machine learning (ML) model collaboratively while protecting their data privacy. However, the resource-limited mobile devices suffer from intensive computation-and-communication costs of model parameters. In this paper, we observe the phenomenon that the model parameters tend to be stabilized long before convergence during training process. Based on this observation, we propose a two-timescale FL framework by joint optimization of freezing stabilized parameters and controlling transmit power for the unstable parameters to balance the energy consumption and convergence. First, we analyze the impact of model parameter freezing and unreliable transmission on the convergence rate. Next, we formulate a two-timescale optimization problem of parameter freezing percentage and transmit power to minimize the model convergence error subject to the energy budget. To solve this problem, we decompose it into parallel sub-problems and decompose each sub-problem into two different timescales problems using the Lyapunov optimization method. The optimal parameter freezing and power control strategies are derived in an online fashion. Experimental results demonstrate the superiority of the proposed scheme compared with the benchmark schemes.

* 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting, republishing, or reuse in other works. This work has been accepted to IEEE Transactions on Mobile Computing

Via

Access Paper or Ask Questions

Deal: Distributed End-to-End GNN Inference for All Nodes

Mar 04, 2025

Shiyang Chen, Xiang Song, Vasiloudis Theodore, Hang Liu

Figure 1 for Deal: Distributed End-to-End GNN Inference for All Nodes

Figure 2 for Deal: Distributed End-to-End GNN Inference for All Nodes

Figure 3 for Deal: Distributed End-to-End GNN Inference for All Nodes

Figure 4 for Deal: Distributed End-to-End GNN Inference for All Nodes

Abstract:Graph Neural Networks (GNNs) are a new research frontier with various applications and successes. The end-to-end inference for all nodes, is common for GNN embedding models, which are widely adopted in applications like recommendation and advertising. While sharing opportunities arise in GNN tasks (i.e., inference for a few nodes and training), the potential for sharing in full graph end-to-end inference is largely underutilized because traditional efforts fail to fully extract sharing benefits due to overwhelming overheads or excessive memory usage. This paper introduces Deal, a distributed GNN inference system that is dedicated to end-to-end inference for all nodes for graphs with multi-billion edges. First, we unveil and exploit an untapped sharing opportunity during sampling, and maximize the benefits from sharing during subsequent GNN computation. Second, we introduce memory-saving and communication-efficient distributed primitives for lightweight 1-D graph and feature tensor collaborative partitioning-based distributed inference. Third, we introduce partitioned, pipelined communication and fusing feature preparation with the first GNN primitive for end-to-end inference. With Deal, the end-to-end inference time on real-world benchmark datasets is reduced up to 7.70 x and the graph construction time is reduced up to 21.05 x, compared to the state-of-the-art.

Via

Access Paper or Ask Questions

Discrete-Time Hybrid Automata Learning: Legged Locomotion Meets Skateboarding

Mar 03, 2025

Hang Liu, Sangli Teng, Ben Liu, Wei Zhang, Maani Ghaffari

Abstract:This paper introduces Discrete-time Hybrid Automata Learning (DHAL), a framework using on-policy Reinforcement Learning to identify and execute mode-switching without trajectory segmentation or event function learning. Hybrid dynamical systems, which include continuous flow and discrete mode switching, can model robotics tasks like legged robot locomotion. Model-based methods usually depend on predefined gaits, while model-free approaches lack explicit mode-switching knowledge. Current methods identify discrete modes via segmentation before regressing continuous flow, but learning high-dimensional complex rigid body dynamics without trajectory labels or segmentation is a challenging open problem. Our approach incorporates a beta policy distribution and a multi-critic architecture to model contact-guided motions, exemplified by a challenging quadrupedal robot skateboard task. We validate our method through simulations and real-world tests, demonstrating robust performance in hybrid dynamical systems.

Via

Access Paper or Ask Questions

MBC: Multi-Brain Collaborative Control for Quadruped Robots

Sep 24, 2024

Hang Liu, Yi Cheng, Rankun Li, Xiaowen Hu, Linqi Ye, Houde Liu

Abstract:In the field of locomotion task of quadruped robots, Blind Policy and Perceptive Policy each have their own advantages and limitations. The Blind Policy relies on preset sensor information and algorithms, suitable for known and structured environments, but it lacks adaptability in complex or unknown environments. The Perceptive Policy uses visual sensors to obtain detailed environmental information, allowing it to adapt to complex terrains, but its effectiveness is limited under occluded conditions, especially when perception fails. Unlike the Blind Policy, the Perceptive Policy is not as robust under these conditions. To address these challenges, we propose a MBC:Multi-Brain collaborative system that incorporates the concepts of Multi-Agent Reinforcement Learning and introduces collaboration between the Blind Policy and the Perceptive Policy. By applying this multi-policy collaborative model to a quadruped robot, the robot can maintain stable locomotion even when the perceptual system is impaired or observational data is incomplete. Our simulations and real-world experiments demonstrate that this system significantly improves the robot's passability and robustness against perception failures in complex environments, validating the effectiveness of multi-policy collaboration in enhancing robotic motion performance.

* 18 pages, 9 figures, Website and Videos: https://quad-mbc.github.io/

Via

Access Paper or Ask Questions

Structural Optimization of Lightweight Bipedal Robot via SERL

Aug 28, 2024

Yi Cheng, Chenxi Han, Yuheng Min, Linqi Ye, Houde Liu, Hang Liu

Figure 1 for Structural Optimization of Lightweight Bipedal Robot via SERL

Figure 2 for Structural Optimization of Lightweight Bipedal Robot via SERL

Figure 3 for Structural Optimization of Lightweight Bipedal Robot via SERL

Figure 4 for Structural Optimization of Lightweight Bipedal Robot via SERL

Abstract:Designing a bipedal robot is a complex and challenging task, especially when dealing with a multitude of structural parameters. Traditional design methods often rely on human intuition and experience. However, such approaches are time-consuming, labor-intensive, lack theoretical guidance and hard to obtain optimal design results within vast design spaces, thus failing to full exploit the inherent performance potential of robots. In this context, this paper introduces the SERL (Structure Evolution Reinforcement Learning) algorithm, which combines reinforcement learning for locomotion tasks with evolution algorithms. The aim is to identify the optimal parameter combinations within a given multidimensional design space. Through the SERL algorithm, we successfully designed a bipedal robot named Wow Orin, where the optimal leg length are obtained through optimization based on body structure and motor torque. We have experimentally validated the effectiveness of the SERL algorithm, which is capable of optimizing the best structure within specified design space and task conditions. Additionally, to assess the performance gap between our designed robot and the current state-of-the-art robots, we compared Wow Orin with mainstream bipedal robots Cassie and Unitree H1. A series of experimental results demonstrate the Outstanding energy efficiency and performance of Wow Orin, further validating the feasibility of applying the SERL algorithm to practical design.

Via

Access Paper or Ask Questions

Seg-CycleGAN : SAR-to-optical image translation guided by a downstream task

Aug 11, 2024

Hannuo Zhang, Huihui Li, Jiarui Lin, Yujie Zhang, Jianghua Fan, Hang Liu

Figure 1 for Seg-CycleGAN : SAR-to-optical image translation guided by a downstream task

Figure 2 for Seg-CycleGAN : SAR-to-optical image translation guided by a downstream task

Figure 3 for Seg-CycleGAN : SAR-to-optical image translation guided by a downstream task

Figure 4 for Seg-CycleGAN : SAR-to-optical image translation guided by a downstream task

Abstract:Optical remote sensing and Synthetic Aperture Radar(SAR) remote sensing are crucial for earth observation, offering complementary capabilities. While optical sensors provide high-quality images, they are limited by weather and lighting conditions. In contrast, SAR sensors can operate effectively under adverse conditions. This letter proposes a GAN-based SAR-to-optical image translation method named Seg-CycleGAN, designed to enhance the accuracy of ship target translation by leveraging semantic information from a pre-trained semantic segmentation model. Our method utilizes the downstream task of ship target semantic segmentation to guide the training of image translation network, improving the quality of output Optical-styled images. The potential of foundation-model-annotated datasets in SAR-to-optical translation tasks is revealed. This work suggests broader research and applications for downstream-task-guided frameworks. The code will be available at https://github.com/NPULHH/

* 8 pages, 5 figures

Via

Access Paper or Ask Questions