Zhirui Dai

Optimal Scene Graph Planning with Large Language Model Guidance

Sep 17, 2023
Zhirui Dai, Arash Asgharivaskasi, Thai Duong, Shusen Lin, Maria-Elizabeth Tzes, George Pappas, Nikolay Atanasov

Recent advances in metric, semantic, and topological mapping have equipped autonomous robots with semantic concept grounding capabilities to interpret natural language tasks. This work aims to leverage these new capabilities with an efficient task planning algorithm for hierarchical metric-semantic models. We consider a scene graph representation of the environment and utilize a large language model (LLM) to convert a natural language task into a linear temporal logic (LTL) automaton. Our main contribution is to enable optimal hierarchical LTL planning with LLM guidance over scene graphs. To achieve efficiency, we construct a hierarchical planning domain that captures the attributes and connectivity of the scene graph and the task automaton, and provide semantic guidance via an LLM heuristic function. To guarantee optimality, we design an LTL heuristic function that is provably consistent and supplements the potentially inadmissible LLM guidance in multi-heuristic planning. We demonstrate efficient planning of complex natural language tasks in scene graphs of virtualized real environments.
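The idea of planning jointly over a scene graph and a task automaton can be illustrated with a minimal sketch. Everything below is hypothetical and is not the authors' implementation: the three-room graph, the edge costs, the hand-written automaton for "visit the kitchen, then the bedroom", and the function names are assumptions made for illustration. The search is plain A* over (room, automaton state) pairs with a zero heuristic, which is trivially consistent; the LLM heuristic and the multi-heuristic scheme described above are not modeled.

```python
import heapq

# Hypothetical toy scene graph: room -> {neighbor: travel cost}.
GRAPH = {
    "hall": {"kitchen": 2.0, "bedroom": 3.0},
    "kitchen": {"hall": 2.0, "bedroom": 4.0},
    "bedroom": {"hall": 3.0, "kitchen": 4.0},
}

ACCEPTING = 2  # automaton state reached once the task is complete


def automaton_step(q, room):
    """Hand-written automaton for 'visit the kitchen, then the bedroom'.

    State 0: nothing done; state 1: kitchen visited; state 2: accepting.
    """
    if q == 0 and room == "kitchen":
        return 1
    if q == 1 and room == "bedroom":
        return 2
    return q


def plan(start, heuristic=lambda room, q: 0.0):
    """A* over the product of scene-graph nodes and automaton states.

    The default zero heuristic is consistent, so the first accepting
    state popped from the queue has minimum cost.
    """
    q0 = automaton_step(0, start)
    frontier = [(heuristic(start, q0), 0.0, start, q0, [start])]
    best = {}  # cheapest cost at which each (room, q) pair was expanded
    while frontier:
        _, cost, room, q, path = heapq.heappop(frontier)
        if q == ACCEPTING:
            return cost, path
        if best.get((room, q), float("inf")) <= cost:
            continue  # already expanded at an equal or lower cost
        best[(room, q)] = cost
        for nxt, weight in GRAPH[room].items():
            q_next = automaton_step(q, nxt)
            g = cost + weight
            heapq.heappush(
                frontier,
                (g + heuristic(nxt, q_next), g, nxt, q_next, path + [nxt]),
            )
    return None  # no path satisfies the task


print(plan("hall"))  # (6.0, ['hall', 'kitchen', 'bedroom'])
```

Tracking the automaton state alongside the robot's location is what lets a shortest-path search respect the ordering constraint: reaching the bedroom before the kitchen does not advance the task, so the search keeps going.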

BEV-Net: Assessing Social Distancing Compliance by Joint People Localization and Geometric Reasoning

Oct 12, 2021
Zhirui Dai, Yuepeng Jiang, Yi Li, Bo Liu, Antoni B. Chan, Nuno Vasconcelos

Social distancing, an essential public health measure to limit the spread of contagious diseases, has gained significant attention since the outbreak of the COVID-19 pandemic. In this work, the problem of visual social distancing compliance assessment in busy public areas, with wide field-of-view cameras, is considered. A dataset of crowd scenes with people annotations under a bird's eye view (BEV) and ground truth for metric distances is introduced, and several measures for the evaluation of social distance detection systems are proposed. A multi-branch network, BEV-Net, is proposed to localize individuals in world coordinates and identify high-risk regions where social distancing is violated. BEV-Net combines detection of head and feet locations, camera pose estimation, a differentiable homography module that maps the image into BEV coordinates, and geometric reasoning to produce a BEV map of people's locations in the scene. Experiments on complex crowded scenes demonstrate the power of the approach and show superior performance over baselines derived from methods in the literature. Finally, applications of interest to public health decision makers are discussed. Datasets, code, and pretrained models are publicly available on GitHub.
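The geometric step, mapping image points to ground-plane coordinates with a homography and then checking metric distances, can be sketched as follows. This is a simplified illustration under stated assumptions, not BEV-Net itself: the homography here is a fixed, hand-picked matrix rather than one derived from an estimated camera pose, the feet locations are given rather than detected, and the 2.0 m threshold and all function names are hypothetical.

```python
import math
from itertools import combinations


def apply_homography(H, point):
    """Map an image point (u, v) to ground-plane (x, y) using a 3x3 homography."""
    u, v = point
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return (x / w, y / w)  # divide by w to leave homogeneous coordinates


def violations(H, feet_pixels, min_distance_m=2.0):
    """Return index pairs of people closer than min_distance_m on the ground plane.

    min_distance_m is an assumed threshold chosen for illustration.
    """
    world = [apply_homography(H, p) for p in feet_pixels]
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(world), 2)
        if math.dist(a, b) < min_distance_m
    ]


# Hypothetical homography: a pure scaling of 0.01 m per pixel.
H = [[0.01, 0.0, 0.0],
     [0.0, 0.01, 0.0],
     [0.0, 0.0, 1.0]]
feet = [(100, 100), (200, 100), (600, 100)]
print(violations(H, feet))  # [(0, 1)]
```

Feet locations are the natural input because they lie on the ground plane, which is the surface a single homography can map correctly.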

* Published as a conference paper at International Conference on Computer Vision, 2021 