Jun Wang

IBM T. J. Watson Research Center

Language and Sketching: An LLM-driven Interactive Multimodal Multitask Robot Navigation Framework

Nov 14, 2023

Weiqin Zu, Wenbin Song, Ruiqing Chen, Ze Guo, Fanglei Sun, Zheng Tian, Wei Pan, Jun Wang

Abstract: Socially aware navigation systems have evolved to avoid various obstacles while performing multiple tasks, such as point-to-point navigation, human following, and human guiding. However, a prominent gap persists: in Human-Robot Interaction (HRI), communicating commands to robots demands intricate mathematical formulations. Furthermore, the transition between tasks lacks the intuitive control and user-centric interactivity that one would desire. In this work, we propose an LLM-driven interactive multimodal multitask robot navigation framework, termed LIM2N, to address this challenge in the navigation field. We achieve this by first introducing a multimodal interaction framework where language and hand-drawn inputs can serve as navigation constraints and control objectives. Next, a reinforcement learning agent is built to handle multiple tasks with the received information. Crucially, LIM2N creates smooth cooperation among the reasoning of multimodal input, multitask planning, and adaptation and processing of the intelligent sensing modules in the complicated system. Extensive experiments conducted in both simulation and the real world demonstrate that LIM2N understands user needs better and offers an enhanced interactive experience.

Multi-User Multi-IoT-Device Symbiotic Radio: A Novel Massive Access Scheme for Cellular IoT

Nov 06, 2023

Jun Wang, Ying-Chang Liang, Sumei Sun

Abstract: Symbiotic radio (SR) is a promising technique to support cellular Internet-of-Things (IoT) by forming a mutualistic relationship between IoT and cellular transmissions. In this paper, we propose a novel multi-user multi-IoT-device SR system to enable massive access in cellular IoT. In the considered system, the base station (BS) transmits information to multiple cellular users, and a number of IoT devices simultaneously backscatter their information to these users via the cellular signal. The cellular users jointly decode the information from the BS and IoT devices. Noting that the reflective links from the IoT devices can be regarded as the channel uncertainty of the direct links, we apply the robust design method to design the beamforming vectors at the BS. Specifically, the transmit power is minimized under the cellular transmission outage probability constraints and IoT transmission sum rate constraints. An algorithm based on semi-definite programming and difference-of-convex programming is proposed to solve the power minimization problem. Moreover, we consider a special case where each cellular user is associated with several adjacent IoT devices and propose a direction of arrival (DoA)-based transmit beamforming design approach. The DoA-based approach requires only the DoA and angular spread (AS) of the direct links instead of the instantaneous channel state information (CSI) of the reflective link channels, leading to a significant reduction in the channel feedback overhead. Simulation results substantiate the multi-user multi-IoT-device SR system and the effectiveness of the proposed beamforming approaches. It is shown that the DoA-based beamforming approach achieves performance comparable to the CSI-based approach in the special case when the ASs are small.

* 13 pages, 12 figures, Conference J. Wang and Y.-C. Liang, Transmit beamforming design for multiuser multi-IoT-device symbiotic radios, in Proc. IEEE ICC, Rome, Italy, May 2023, pp. 1-6

Structure design and coordinated motion analysis of bionic crocodile robot

Nov 03, 2023

Jun Wang, Jingya Zheng, Yuhang Zhao, Kai Yang

Abstract: Crocodiles, known as one of the oldest and most resilient species on Earth, have demonstrated remarkable locomotor abilities both on land and in water, evolving over millennia to adapt to diverse environments. In this paper, we draw inspiration from crocodiles and introduce a highly biomimetic crocodile robot equipped with multiple degrees of freedom and articulated trunk joints. This design is based on a comprehensive analysis of the structural and motion characteristics observed in real crocodiles. The bionic crocodile robot suffers from limb-torso incoordination during movement. To solve this problem, we apply the D-H method for both forward and inverse kinematics analysis of the robot's legs and spine. Through a series of simulation experiments, we investigate the robot's stability of motion, fault tolerance, and adaptability to the environment in two motor patterns: with and without the involvement of the spine and tail in its movements. Experimental results demonstrate that the bionic crocodile robot exhibits superior motion performance when the spine and tail cooperate with the extremities. This research not only showcases the potential of biomimicry in robotics but also underscores the significance of understanding how nature's designs can inform and enhance our technological innovations.
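
As a rough illustration of the D-H (Denavit-Hartenberg) forward kinematics mentioned in the abstract, the sketch below chains standard D-H transforms for a hypothetical two-link planar leg. The link lengths, joint angles, and function names are assumptions made for this example only; they are not taken from the paper, whose actual leg and spine parameters are not given in this listing.

```python
import math

def dh_transform(theta, d, a, alpha):
    """Homogeneous transform for one joint in the standard D-H convention."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -st * ca, st * sa, a * ct],
        [st, ct * ca, -ct * sa, a * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ]

def matmul4(A, B):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def forward_kinematics(dh_rows):
    """Chain the per-joint transforms from base to end effector."""
    T = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for theta, d, a, alpha in dh_rows:
        T = matmul4(T, dh_transform(theta, d, a, alpha))
    return T

# Hypothetical leg: two revolute joints, link lengths 0.10 m and 0.08 m.
# Each row is (theta, d, a, alpha); all values are illustrative assumptions.
leg = [
    (math.pi / 2, 0.0, 0.10, 0.0),
    (-math.pi / 2, 0.0, 0.08, 0.0),
]

T = forward_kinematics(leg)
foot = (T[0][3], T[1][3], T[2][3])  # foot position in the base frame
print(foot)
```

With these assumed values the first link points along the base y-axis and the second is rotated back to point along x, so the foot lands at roughly (0.08, 0.10, 0.0) in the base frame. Inverse kinematics would solve for the joint angles given such a foot position.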

Why Can Large Language Models Generate Correct Chain-of-Thoughts?

Oct 30, 2023

Rasul Tutunov, Antoine Grosnit, Juliusz Ziomek, Jun Wang, Haitham Bou-Ammar

Abstract: This paper delves into the capabilities of large language models (LLMs), specifically focusing on advancing the theoretical comprehension of chain-of-thought prompting. We investigate how LLMs can be effectively induced to generate a coherent chain of thoughts. To achieve this, we introduce a two-level hierarchical graphical model tailored for natural language generation. Within this framework, we establish a compelling geometrical convergence rate that gauges the likelihood of an LLM-generated chain of thoughts compared to those originating from the true language. Our findings provide a theoretical justification for the ability of LLMs to produce the correct sequence of thoughts, potentially explaining performance gains in tasks demanding reasoning skills.

Ask more, know better: Reinforce-Learned Prompt Questions for Decision Making with Large Language Models

Oct 27, 2023

Xue Yan, Yan Song, Xinyu Cui, Filippos Christianos, Haifeng Zhang, David Henry Mguni, Jun Wang

Abstract: Large language models (LLMs) demonstrate their promise in tackling complicated practical challenges by combining action-based policies with chain of thought (CoT) reasoning. Having high-quality prompts on hand, however, is vital to the framework's effectiveness. Currently, these prompts are handcrafted with extensive human labor, resulting in CoT policies that frequently fail to generalize. Human intervention is also required to develop grounding functions that ensure low-level controllers appropriately process CoT reasoning. In this paper, we take the first step towards a fully integrated end-to-end framework for task-solving in real settings employing complicated reasoning. To that end, we offer a new leader-follower bilevel framework capable of learning to ask relevant questions (prompts) and subsequently undertaking reasoning to guide the learning of actions to be performed in an environment. A good prompt should make introspective revisions based on historical findings, leading the CoT to consider the anticipated goals. A prompt-generator policy has its own aim in our system, allowing it to adapt to the action policy and automatically root the CoT process towards outputs that lead to decisive, high-performing actions. Meanwhile, the action policy is learning how to use the CoT outputs to take specific actions. Our empirical data reveal that our system outperforms leading methods in agent learning benchmarks such as Overcooked and FourRoom.

Specify Robust Causal Representation from Mixed Observations

Oct 21, 2023

Mengyue Yang, Xinyu Cai, Furui Liu, Weinan Zhang, Jun Wang

Abstract: Learning representations purely from observations concerns the problem of learning a low-dimensional, compact representation which is beneficial to prediction models. Under the hypothesis that the intrinsic latent factors follow some causal generative models, we argue that by learning a causal representation, which is the minimal sufficient causes of the whole system, we can improve the robustness and generalization performance of machine learning models. In this paper, we develop a learning method to learn such a representation from observational data by regularizing the learning procedure with mutual information measures, according to the hypothetical factored causal graph. We theoretically and empirically show that the models trained with the learned causal representations are more robust under adversarial attacks and distribution shifts compared with baselines. The supplementary materials are available at https://github.com/ymy4323460/CaRI/.

* arXiv admin note: substantial text overlap with arXiv:2202.08388

Well Begun is Half Done: Generator-agnostic Knowledge Pre-Selection for Knowledge-Grounded Dialogue

Oct 20, 2023

Lang Qin, Yao Zhang, Hongru Liang, Jun Wang, Zhenglu Yang

Abstract: Accurate knowledge selection is critical in knowledge-grounded dialogue systems. To take a closer look at it, we offer a novel perspective for organizing existing literature, i.e., knowledge selection coupled with, after, and before generation. We focus on the third, under-explored category of study, which can not only select knowledge accurately in advance but also reduce the learning, adjustment, and interpretation burden of subsequent response generation models, especially LLMs. We propose GATE, a generator-agnostic knowledge selection method, to prepare knowledge for subsequent response generation models by selecting context-related knowledge among different knowledge structures and variable knowledge requirements. Experimental results demonstrate the superiority of GATE, and indicate that knowledge selection before generation is a lightweight yet effective way to help LLMs (e.g., ChatGPT) generate more informative responses.

* Accepted by EMNLP2023 main conference

FusionU-Net: U-Net with Enhanced Skip Connection for Pathology Image Segmentation

Oct 17, 2023

Zongyi Li, Hongbing Lyu, Jun Wang

Abstract: In recent years, U-Net and its variants have been widely used in pathology image segmentation tasks. One of the key designs of U-Net is the use of skip connections between the encoder and decoder, which helps to recover detailed information after upsampling. While most variations of U-Net adopt the original skip connection design, there is a semantic gap between the encoder and decoder that can negatively impact model performance. Therefore, it is important to reduce this semantic gap before applying the skip connections. To address this issue, we propose a new segmentation network called FusionU-Net, which is based on the U-Net structure and incorporates a fusion module to exchange information between different skip connections to reduce semantic gaps. Unlike the other fusion modules in existing networks, ours is based on a two-round fusion design that fully considers the local relevance between adjacent encoder layer outputs and the need for bi-directional information exchange across multiple layers. We conducted extensive experiments on multiple pathology image datasets to evaluate our model and found that FusionU-Net achieves better performance than other competing methods. We argue that our fusion module is more effective than the designs of existing networks, and that it could be easily embedded into other networks to further enhance model performance.

* 9 pages, 4 figures and 4 tables

How Do Large Language Models Capture the Ever-changing World Knowledge? A Review of Recent Advances

Oct 11, 2023

Zihan Zhang, Meng Fang, Ling Chen, Mohammad-Reza Namazi-Rad, Jun Wang

Abstract: Although large language models (LLMs) are impressive in solving various tasks, they can quickly become outdated after deployment. Keeping them up to date is a pressing concern in the current era. This paper provides a comprehensive review of recent advances in aligning LLMs with the ever-changing world knowledge without re-training from scratch. We categorize research works systematically and provide in-depth comparisons and discussion. We also discuss existing challenges and highlight future directions to facilitate research in this field. We release the paper list at https://github.com/hyintell/awesome-refreshing-llms

* EMNLP 2023 main conference, paper link at https://github.com/hyintell/awesome-refreshing-llms

GEAR: A GPU-Centric Experience Replay System for Large Reinforcement Learning Models

Oct 08, 2023

Hanjing Wang, Man-Kit Sit, Congjie He, Ying Wen, Weinan Zhang, Jun Wang, Yaodong Yang, Luo Mai

Abstract: This paper introduces a distributed, GPU-centric experience replay system, GEAR, designed to perform scalable reinforcement learning (RL) with large sequence models (such as transformers). With such models, existing systems such as Reverb face considerable bottlenecks in memory, computation, and communication. GEAR, however, optimizes memory efficiency by enabling the memory resources on GPU servers (including host memory and device memory) to manage trajectory data. Furthermore, it enables decentralized GPU devices to expedite various trajectory selection strategies, circumventing computational bottlenecks. GEAR is equipped with GPU kernels capable of collecting trajectories using zero-copy access to host memory, along with remote direct memory access (RDMA) over InfiniBand, improving communication efficiency. Cluster experiments have shown that GEAR can achieve performance levels up to 6x greater than Reverb when training state-of-the-art large RL models. GEAR is open-sourced at https://github.com/bigrl-team/gear.

* ICML2023
