Abstract:This paper proposes a novel Large Vision-Language Model (LVLM) and Model Predictive Control (MPC) integration framework that delivers both task scalability and safety for Autonomous Driving (AD). LVLMs excel at high-level task planning across diverse driving scenarios. However, since these foundation models are not specifically designed for driving and their reasoning is not consistent with the feasibility of low-level motion planning, concerns remain regarding safety and smooth task switching. This paper integrates LVLMs with MPC Builder, which automatically generates MPCs on demand, based on symbolic task commands generated by the LVLM, while ensuring optimality and safety. The generated MPCs can strongly assist the execution or rejection of LVLM-driven task switching by providing feedback on the feasibility of the given tasks and generating task-switching-aware MPCs. Our approach provides a safe, flexible, and adaptable control framework, bridging the gap between cutting-edge foundation models and reliable vehicle operation. We demonstrate the effectiveness of our approach through a simulation experiment, showing that our system can safely and effectively handle highway driving while maintaining the flexibility and adaptability of LVLMs.
Abstract:This paper proposes vehicle motion planning methods with obstacle avoidance in tight spaces by incorporating polygonal approximations of both the vehicle and obstacles into a model predictive control (MPC) framework. Representing these shapes is crucial for navigation in tight spaces to ensure accurate collision detection. However, incorporating polygonal approximations leads to disjunctive OR constraints in the MPC formulation, which require a mixed integer programming and cause significant computational cost. To overcome this, we propose two different collision-avoidance constraints that reformulate the disjunctive OR constraints as tractable conjunctive AND constraints: (1) a Support Vector Machine (SVM)-based formulation that recasts collision avoidance as a SVM optimization problem, and (2) a Minimum Signed Distance to Edges (MSDE) formulation that leverages minimum signed-distance metrics. We validate both methods through extensive simulations, including tight-space parking scenarios and varied-shape obstacle courses, as well as hardware experiments on an RC-car platform. Our results demonstrate that the SVM-based approach achieves superior navigation accuracy in constrained environments; the MSDE approach, by contrast, runs in real time with only a modest reduction in collision-avoidance performance.
Abstract:Four-wheel independent drive and steering vehicle (4WIDS Vehicle, Swerve Drive Robot) has the ability to move in any direction by its eight degrees of freedom (DoF) control inputs. Although the high maneuverability enables efficient navigation in narrow spaces, obtaining the optimal command is challenging due to the high dimension of the solution space. This paper presents a navigation architecture using the Model Predictive Path Integral (MPPI) control algorithm to avoid collisions with obstacles of any shape and reach a goal point. The key idea to make the problem easier is to explore the optimal control input in a reasonably reduced dimension that is adequate for navigation. Through evaluation in simulation, we found that selecting the sampling space of MPPI greatly affects navigation performance. In addition, our proposed controller which switches multiple sampling spaces according to the real-time situation can achieve balanced behavior between efficiency and safety. Source code is available at https://github.com/MizuhoAOKI/mppi_swerve_drive_ros
Abstract:This paper presents a novel Stochastic Optimal Control (SOC) method based on Model Predictive Path Integral control (MPPI), named Stein Variational Guided MPPI (SVG-MPPI), designed to handle rapidly shifting multimodal optimal action distributions. While MPPI can find a Gaussian-approximated optimal action distribution in closed form, i.e., without iterative solution updates, it struggles with multimodality of the optimal distributions, such as those involving non-convex constraints for obstacle avoidance. This is due to the less representative nature of the Gaussian. To overcome this limitation, our method aims to identify a target mode of the optimal distribution and guide the solution to converge to fit it. In the proposed method, the target mode is roughly estimated using a modified Stein Variational Gradient Descent (SVGD) method and embedded into the MPPI algorithm to find a closed-form "mode-seeking" solution that covers only the target mode, thus preserving the fast convergence property of MPPI. Our simulation and real-world experimental results demonstrate that SVG-MPPI outperforms both the original MPPI and other state-of-the-art sampling-based SOC algorithms in terms of path-tracking and obstacle-avoidance capabilities. Source code: https://github.com/kohonda/proj-svg_mppi
Abstract:Omni-directional images have been used in wide range of applications. For the applications, it would be useful to estimate saliency maps representing probability distributions of gazing points with a head-mounted display, to detect important regions in the omni-directional images. This paper proposes a novel saliency-map estimation model for the omni-directional images by extracting overlapping 2-dimensional (2D) plane images from omni-directional images at various directions and angles of view. While 2D saliency maps tend to have high probability at the center of images (center bias), the high-probability region appears at horizontal directions in omni-directional saliency maps when a head-mounted display is used (equator bias). Therefore, the 2D saliency model with a center-bias layer was fine-tuned with an omni-directional dataset by replacing the center-bias layer to an equator-bias layer conditioned on the elevation angle for the extraction of the 2D plane image. The limited availability of omni-directional images in saliency datasets can be compensated by using the well-established 2D saliency model pretrained by a large number of training images with the ground truth of 2D saliency maps. In addition, this paper proposes a multi-scale estimation method by extracting 2D images in multiple angles of view to detect objects of various sizes with variable receptive fields. The saliency maps estimated from the multiple angles of view were integrated by using pixel-wise attention weights calculated in an integration layer for weighting the optimal scale to each object. The proposed method was evaluated using a publicly available dataset with evaluation metrics for omni-directional saliency maps. It was confirmed that the accuracy of the saliency maps was improved by the proposed method.
Abstract:This study presents a new framework for vehicle motion planning and control based on the automatic generation of model predictive controllers (MPC) named MPC Builder. In this framework, several components necessary for MPC, such as models, constraints, and cost functions, are prepared in advance. The MPC Builder then online generates various MPCs according to traffic situations in a unified manner. This scheme enabled us to represent various driving tasks with minimal design effort. The proposed framework was implemented considering the continuation/generalized minimum residual (C/GMRES) method optimization solver, which can reduce computational costs. Finally, numerical experiments on multiple driving scenarios were presented.
Abstract:In recent years, the deep learning techniques have been applied to the estimation of saliency maps, which represent probability density functions of fixations when people look at the images. Although the methods of saliency-map estimation have been actively studied for 2-dimensional planer images, the methods for omni-directional images to be utilized in virtual environments had not been studied, until a competition of saliency-map estimation for the omni-directional images was held in ICME2017. In this paper, novel methods for estimating saliency maps for the omni-directional images are proposed considering the properties of prior distributions for fixations in the planar images and the omni-directional images.