Shiqi Zhang

Unrestricted Global Phase Bias-Aware Single-channel Speech Enhancement with Conformer-based Metric GAN

Feb 13, 2024

Shiqi Zhang, Zheng Qiu, Daiki Takeuchi, Noboru Harada, Shoji Makino

Abstract: With the rapid development of neural networks in recent years, networks have become remarkably effective at enhancing the magnitude spectrum of noisy speech in single-channel speech enhancement. However, enhancing the phase spectrum using neural networks is often ineffective and remains a challenging problem. In this paper, we find that the human ear cannot sensitively perceive the difference between a precise phase spectrum and a biased phase (BP) spectrum. Therefore, we propose an optimization method for phase reconstruction that allows freedom in the global-phase bias instead of reconstructing the precise phase spectrum. We apply it to a Conformer-based Metric Generative Adversarial Network (CMGAN) baseline model, which relaxes the existing constraint of precise phase and gives the neural network a broader learning space. Results show that this method achieves new state-of-the-art performance without incurring additional computational overhead.

* Accepted by ICASSP 2024
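
The idea of leaving the global phase free can be illustrated with a small least-squares sketch. The abstract does not give the paper's actual loss, so the following is only an assumed, generic formulation: the error between an estimated and a reference complex spectrum is minimized over a single phase offset shared by all bins. The function name and the flat-list representation of spectrum bins are hypothetical.

```python
import cmath


def global_phase_invariant_error(est, ref):
    """Squared error between two complex spectra, minimized over one global phase.

    est, ref: equal-length sequences of complex STFT bins (flattened).
    Returns (error, theta), where theta is the best-fitting global phase in radians.
    NOTE: an illustrative assumption, not the loss used in the paper.
    """
    # The theta minimizing sum |est - exp(j*theta) * ref|^2 is the phase of
    # the inner product sum(conj(ref) * est).
    inner = sum(r.conjugate() * e for e, r in zip(est, ref))
    theta = cmath.phase(inner)
    rotation = cmath.exp(1j * theta)
    error = sum(abs(e - rotation * r) ** 2 for e, r in zip(est, ref))
    return error, theta
```

Under this formulation, an estimate that differs from the reference only by a constant phase rotation incurs essentially zero error, whereas a plain complex-spectrum error would penalize it.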

ORLA*: Mobile Manipulator-Based Object Rearrangement with Lazy A*

Sep 24, 2023

Kai Gao, Yan Ding, Shiqi Zhang, Jingjin Yu

Abstract: Effectively performing object rearrangement is an essential skill for mobile manipulators, e.g., setting up a dinner table or organizing a desk. A key challenge in such problems is deciding an appropriate manipulation order for objects to effectively untangle dependencies between objects while considering the necessary motions for realizing the manipulations (e.g., pick and place). To our knowledge, computing time-optimal multi-object rearrangement solutions for mobile manipulators remains a largely untapped research direction. In this research, we propose ORLA*, which leverages delayed (lazy) evaluation in searching for a high-quality object pick and place sequence that considers both end-effector and mobile robot base travel. ORLA* also supports multi-layered rearrangement tasks considering pile stability using machine learning. Employing an optimal solver for finding temporary locations for displacing objects, ORLA* can achieve global optimality. Through extensive simulation and ablation study, we confirm the effectiveness of ORLA* delivering quality solutions for challenging rearrangement instances. Supplementary materials are available at: https://gaokai15.github.io/ORLA-Star/

* Submitted to ICRA 2024
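
To make "delayed (lazy) evaluation" concrete, here is a minimal sketch of a generic A* search with lazy edge evaluation. It is not ORLA* itself: the function and its parameters (`successors`, `cheap_cost`, `exact_cost`, `heuristic`) are hypothetical names introduced for illustration. The sketch assumes `cheap_cost` never overestimates `exact_cost` and that the heuristic is consistent; under those assumptions the returned path is optimal.

```python
import heapq
from itertools import count


def lazy_a_star(start, goal, successors, cheap_cost, exact_cost, heuristic):
    """Return (path, cost) from start to goal, or None if the goal is unreachable."""
    tie = count()  # tie-breaker so the heap never compares nodes directly
    # Heap entries: (f, tie, g, node, parent, is_exact)
    open_heap = [(heuristic(start), next(tie), 0.0, start, None, True)]
    parent_of = {}  # closed set: node -> parent on its chosen path
    g_of = {}       # exact cost-to-come of closed nodes

    while open_heap:
        _, _, g, node, parent, is_exact = heapq.heappop(open_heap)
        if node in parent_of:
            continue  # node already closed; skip the stale entry
        if not is_exact:
            # Pay for the expensive edge evaluation only now, then requeue.
            g_true = g_of[parent] + exact_cost(parent, node)
            heapq.heappush(
                open_heap,
                (g_true + heuristic(node), next(tie), g_true, node, parent, True),
            )
            continue
        parent_of[node], g_of[node] = parent, g
        if node == goal:
            path = [node]
            while parent_of[path[-1]] is not None:
                path.append(parent_of[path[-1]])
            return path[::-1], g
        for nxt in successors(node):
            if nxt not in parent_of:
                # Queue the successor using only the cheap estimate.
                g_est = g + cheap_cost(node, nxt)
                heapq.heappush(
                    open_heap,
                    (g_est + heuristic(nxt), next(tie), g_est, nxt, node, False),
                )
    return None
```

The design choice is that an edge's expensive cost (for a mobile manipulator, this could be a motion-planning query) is computed only when its cheap estimate reaches the front of the queue, so edges that never look promising are never evaluated exactly.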

Seeing-Eye Quadruped Navigation with Force Responsive Locomotion Control

Sep 08, 2023

David DeFazio, Eisuke Hirota, Shiqi Zhang

Abstract: Seeing-eye robots are very useful tools for guiding visually impaired people, potentially producing a huge societal impact given the low availability and high cost of real guide dogs. Although a few seeing-eye robot systems have already been demonstrated, none considered external tugs from humans, which frequently occur in a real guide dog setting. In this paper, we simultaneously train a locomotion controller that is robust to external tugging forces via Reinforcement Learning (RL), and an external force estimator via supervised learning. The controller ensures stable walking, and the force estimator enables the robot to respond to the external forces from the human. These forces are used to guide the robot to the global goal, which is unknown to the robot, while the robot guides the human around nearby obstacles via a local planner. Experimental results in simulation and on hardware show that our controller is robust to external forces, and our seeing-eye system can accurately detect force direction. We demonstrate our full seeing-eye robot system on a real quadruped robot with a blindfolded human. The video can be seen at our project page: https://bu-air-lab.github.io/guide_dog/

* Accepted to CoRL 2023

Symbolic State Space Optimization for Long Horizon Mobile Manipulation Planning

Jul 21, 2023

Xiaohan Zhang, Yifeng Zhu, Yan Ding, Yuqian Jiang, Yuke Zhu, Peter Stone, Shiqi Zhang

Abstract: In existing task and motion planning (TAMP) research, it is a common assumption that experts manually specify the state space for task-level planning. A well-developed state space enables the desirable distribution of limited computational resources between task planning and motion planning. However, developing such task-level state spaces can be non-trivial in practice. In this paper, we consider a long horizon mobile manipulation domain including repeated navigation and manipulation. We propose Symbolic State Space Optimization (S3O) for computing a set of abstracted locations and their 2D geometric groundings for generating task-motion plans in such domains. Our approach has been extensively evaluated in simulation and demonstrated on a real mobile manipulator working on clearing up dining tables. Results show the superiority of the proposed method over TAMP baselines in task completion rate and execution time.

* To be published in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023

Integrating Action Knowledge and LLMs for Task Planning and Situation Handling in Open Worlds

May 27, 2023

Yan Ding, Xiaohan Zhang, Saeid Amiri, Nieqing Cao, Hao Yang, Andy Kaminski, Chad Esselink, Shiqi Zhang

Abstract: Task planning systems have been developed to help robots use human knowledge (about actions) to complete long-horizon tasks. Most of them have been developed for "closed worlds" while assuming the robot is provided with complete world knowledge. However, the real world is generally open, and robots frequently encounter unforeseen situations that can potentially break the planner's completeness. Could we leverage the recent advances in pre-trained Large Language Models (LLMs) to enable classical planning systems to deal with novel situations? This paper introduces a novel framework, called COWP, for open-world task planning and situation handling. COWP dynamically augments the robot's action knowledge, including the preconditions and effects of actions, with task-oriented commonsense knowledge. COWP embraces the openness from LLMs, and is grounded to specific domains via action knowledge. For systematic evaluations, we collected a dataset that includes 1,085 execution-time situations. Each situation corresponds to a state instance wherein a robot is potentially unable to complete a task using a solution that normally works. Experimental results show that our approach outperforms competitive baselines from the literature in the success rate of service tasks. Additionally, we have demonstrated COWP using a mobile manipulator. Supplementary materials are available at: https://cowplanning.github.io/

* arXiv admin note: substantial text overlap with arXiv:2210.01287

ARDIE: AR, Dialogue, and Eye Gaze Policies for Human-Robot Collaboration

May 08, 2023

Chelsea Zou, Kishan Chandan, Yan Ding, Shiqi Zhang

Abstract: Human-robot collaboration (HRC) has become increasingly relevant in industrial, household, and commercial settings. However, the effectiveness of such collaborations is highly dependent on the humans' and robots' situational awareness of the environment. Improving this awareness includes not only aligning perceptions in a shared workspace, but also bidirectionally communicating intent and visualizing different states of the environment to enhance scene understanding. In this paper, we propose ARDIE (Augmented Reality with Dialogue and Eye Gaze), a novel intelligent agent that leverages multi-modal feedback cues to enhance HRC. Our system utilizes a decision-theoretic framework to formulate a joint policy that incorporates interactive augmented reality (AR), natural language, and eye gaze to portray current and future states of the environment. Through object-specific AR renders, the human can visualize future object interactions to make adjustments as needed, ultimately providing an interactive and efficient collaboration between humans and robots.

LLM+P: Empowering Large Language Models with Optimal Planning Proficiency

May 05, 2023

Bo Liu, Yuqian Jiang, Xiaohan Zhang, Qiang Liu, Shiqi Zhang, Joydeep Biswas, Peter Stone

Abstract: Large language models (LLMs) have demonstrated remarkable zero-shot generalization abilities: state-of-the-art chatbots can provide plausible answers to many common questions that arise in daily life. However, so far, LLMs cannot reliably solve long-horizon planning problems. By contrast, classical planners, once a problem is given in a formatted way, can use efficient search algorithms to quickly identify correct, or even optimal, plans. In an effort to get the best of both worlds, this paper introduces LLM+P, the first framework that incorporates the strengths of classical planners into LLMs. LLM+P takes in a natural language description of a planning problem, then returns a correct (or optimal) plan for solving that problem in natural language. LLM+P does so by first converting the language description into a file written in the planning domain definition language (PDDL), then leveraging classical planners to quickly find a solution, and then translating the found solution back into natural language. Along with LLM+P, we define a diverse set of benchmark problems taken from common planning scenarios. Via a comprehensive set of experiments on these benchmark problems, we find that LLM+P is able to provide optimal solutions for most problems, while LLMs fail to provide even feasible plans for most problems. The code and results are publicly available at https://github.com/Cranial-XIX/llm-pddl.git.
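
The three-stage flow described in the abstract can be sketched as a simple function composition. This is only an illustration under stated assumptions, not the authors' implementation (see the repository linked above for that): the function name and the three callables `to_pddl`, `run_planner`, and `to_text` are hypothetical stand-ins for the language-to-PDDL step, the classical planner, and the plan-to-language step.

```python
from typing import Callable, List, Optional


def llm_plus_p(
    problem_text: str,
    to_pddl: Callable[[str], str],
    run_planner: Callable[[str], Optional[List[str]]],
    to_text: Callable[[List[str]], str],
) -> Optional[str]:
    """Chain the three stages; return None when the planner finds no plan."""
    # Stage 1: natural language -> PDDL problem (assumed to be an LLM call).
    problem_pddl = to_pddl(problem_text)
    # Stage 2: a classical planner searches for a plan over the PDDL problem.
    plan = run_planner(problem_pddl)
    if plan is None:
        return None
    # Stage 3: plan -> natural language (assumed to be an LLM call).
    return to_text(plan)
```

Keeping the stages as injected callables means each one can be swapped or stubbed independently, for example to test the pipeline without calling a model or a planner.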

Grounding Classical Task Planners via Vision-Language Models

Apr 17, 2023

Xiaohan Zhang, Yan Ding, Saeid Amiri, Hao Yang, Andy Kaminski, Chad Esselink, Shiqi Zhang

Abstract: Classical planning systems have shown great advances in utilizing rule-based human knowledge to compute accurate plans for service robots, but they face challenges due to the strong assumptions of perfect perception and action executions. To tackle these challenges, one solution is to connect the symbolic states and actions generated by classical planners to the robot's sensory observations, thus closing the perception-action loop. This research proposes a visually-grounded planning framework, named TPVQA, which leverages Vision-Language Models (VLMs) to detect action failures and verify action affordances towards enabling successful plan execution. Results from quantitative experiments show that TPVQA surpasses competitive baselines from previous studies in task completion rate.

Task and Motion Planning with Large Language Models for Object Rearrangement

Mar 14, 2023

Yan Ding, Xiaohan Zhang, Chris Paxton, Shiqi Zhang

Abstract: Multi-object rearrangement is a crucial skill for service robots, and commonsense reasoning is frequently needed in this process. However, achieving commonsense arrangements requires knowledge about objects, which is hard to transfer to robots. Large language models (LLMs) are one potential source of this knowledge, but they do not naively capture information about plausible physical arrangements of the world. We propose LLM-GROP, which uses prompting to extract commonsense knowledge about semantically valid object configurations from an LLM and instantiates them with a task and motion planner in order to generalize to varying scene geometry. LLM-GROP allows us to go from natural-language commands to human-aligned object rearrangement in varied environments. Based on human evaluations, our approach achieves the highest rating and outperforms competitive baselines in success rate while maintaining comparable cumulative action costs. Finally, we demonstrate a practical implementation of LLM-GROP on a mobile manipulator in real-world scenarios. Supplementary materials are available at: https://sites.google.com/view/llm-grop

Learning Visualization Policies of Augmented Reality for Human-Robot Collaboration

Nov 13, 2022

Kishan Chandan, Jack Albertson, Shiqi Zhang

Abstract: In human-robot collaboration domains, augmented reality (AR) technologies have enabled people to visualize the state of robots. Current AR-based visualization policies are designed manually, which requires considerable human effort and domain knowledge. When too little information is visualized, human users find the AR interface not useful; when too much information is visualized, they find it difficult to process the visualized information. In this paper, we develop a framework, called VARIL, that enables AR agents to learn visualization policies (what to visualize, when, and how) from demonstrations. We created a Unity-based platform for simulating warehouse environments where human-robot teammates collaborate on delivery tasks. We have collected a dataset that includes demonstrations of visualizing robots' current and planned behaviors. Results from experiments with real human participants show that, compared with competitive baselines from the literature, our learned visualization strategies significantly increase the efficiency of human-robot teams, while reducing the distraction level of human users. VARIL has been demonstrated in a built-in-lab mock warehouse.

* Accepted to the Conference on Robot Learning (CoRL), 2022
