Yue Hu

Artificial Intelligence Lab, Department of Computer Systems Engineering, University of Engineering and Applied Sciences

Towards Collaborative Autonomous Driving: Simulation Platform and End-to-End System

Apr 15, 2024

Genjia Liu, Yue Hu, Chenxin Xu, Weibo Mao, Junhao Ge, Zhengxiang Huang, Yifan Lu, Yinda Xu, Junkai Xia, Yafei Wang (+1 more)

Abstract:Vehicle-to-everything-aided autonomous driving (V2X-AD) has huge potential to provide a safer driving solution. Despite extensive research in transportation and communication to support V2X-AD, the actual utilization of these infrastructures and communication resources in enhancing driving performance remains largely unexplored. This highlights the necessity of collaborative autonomous driving: a machine learning approach that optimizes the information sharing strategy to improve the driving performance of each vehicle. This effort necessitates two key foundations: a platform capable of generating data to facilitate the training and testing of V2X-AD, and a comprehensive system that integrates full driving-related functionalities with mechanisms for information sharing. From the platform perspective, we present V2Xverse, a comprehensive simulation platform for collaborative autonomous driving. This platform provides a complete pipeline for collaborative driving. From the system perspective, we introduce CoDriving, a novel end-to-end collaborative driving system that properly integrates V2X communication over the entire autonomous pipeline, promoting driving with shared perceptual information. The core idea is a novel driving-oriented communication strategy. Leveraging this strategy, CoDriving improves driving performance while optimizing communication efficiency. We conduct comprehensive benchmarks with V2Xverse, analyzing both modular performance and closed-loop driving performance. Experimental results show that CoDriving: i) significantly improves the driving score by 62.49% and drastically reduces the pedestrian collision rate by 53.50% compared to the SOTA end-to-end driving method, and ii) sustains its driving performance advantage under dynamic communication constraints.

GCAM: Gaussian and causal-attention model of food fine-grained recognition

Mar 18, 2024

Guohang Zhuang, Yue Hu, Tianxing Yan, JiaZhan Gao

Abstract:Currently, most food recognition relies on deep learning for category classification. However, these approaches struggle to effectively distinguish between visually similar food samples, highlighting the pressing need to address fine-grained issues in food recognition. To mitigate these challenges, we propose the adoption of a Gaussian and causal-attention model for fine-grained object recognition. In particular, we train the model to obtain Gaussian features over target regions, followed by the extraction of fine-grained features from the objects, thereby enhancing the feature mapping capabilities of the target regions. To counteract data drift resulting from uneven data distributions, we employ a counterfactual reasoning approach. By using counterfactual interventions, we analyze the impact of the learned image attention mechanism on network predictions, enabling the network to acquire more useful attention weights for fine-grained image recognition. Finally, we design a learnable loss strategy to balance training stability across various modules, ultimately improving the accuracy of the final target recognition. We validate our approach on four relevant datasets, demonstrating excellent performance on all of them. We experimentally show that GCAM surpasses state-of-the-art methods on the ETH-FOOD101, UECFOOD256, and Vireo-FOOD172 datasets. Furthermore, our approach also achieves state-of-the-art performance on the CUB-200 dataset.

* 23 pages, 11 figures

An Extensible Framework for Open Heterogeneous Collaborative Perception

Jan 25, 2024

Yifan Lu, Yue Hu, Yiqi Zhong, Dequan Wang, Siheng Chen, Yanfeng Wang

Abstract:Collaborative perception aims to mitigate the limitations of single-agent perception, such as occlusions, by facilitating data exchange among multiple agents. However, most current works consider a homogeneous scenario where all agents use identical sensors and perception models. In reality, heterogeneous agent types may continually emerge and inevitably face a domain gap when collaborating with existing agents. In this paper, we introduce a new open heterogeneous problem: how to accommodate continually emerging new heterogeneous agent types into collaborative perception, while ensuring high perception performance and low integration cost? To address this problem, we propose HEterogeneous ALliance (HEAL), a novel extensible collaborative perception framework. HEAL first establishes a unified feature space with initial agents via a novel multi-scale foreground-aware Pyramid Fusion network. When heterogeneous new agents emerge with previously unseen modalities or models, we align them to the established unified space with an innovative backward alignment. This step only involves individual training on the new agent type, thus presenting extremely low training costs and high extensibility. It also protects new agents' model details from disclosure since the training can be conducted by the agent owner locally. To enrich agents' data heterogeneity, we introduce OPV2V-H, a new large-scale dataset with more diverse sensor types. Extensive experiments on OPV2V-H and DAIR-V2X datasets show that HEAL surpasses SOTA methods in performance while reducing the training parameters by 91.5% when integrating 3 new agent types. Code and data are available at: https://github.com/yifanlu0227/HEAL.

* Accepted by ICLR 2024. The code and data are open-sourced at https://github.com/yifanlu0227/HEAL

Pragmatic Communication in Multi-Agent Collaborative Perception

Jan 23, 2024

Yue Hu, Xianghe Pang, Xiaoqi Qin, Yonina C. Eldar, Siheng Chen, Ping Zhang, Wenjun Zhang

Abstract:Collaborative perception allows each agent to enhance its perceptual abilities by exchanging messages with others. It inherently results in a trade-off between perception ability and communication costs. Previous works transmit complete full-frame high-dimensional feature maps among agents, resulting in substantial communication costs. To promote communication efficiency, we propose only transmitting the information needed for the collaborator's downstream task. This pragmatic communication strategy focuses on three key aspects: i) pragmatic message selection, which selects task-critical parts from the complete data, resulting in spatially and temporally sparse feature vectors; ii) pragmatic message representation, which achieves pragmatic approximation of high-dimensional feature vectors with a task-adaptive dictionary, enabling communication with integer indices; iii) pragmatic collaborator selection, which identifies beneficial collaborators, pruning unnecessary communication links. Following this strategy, we first formulate a mathematical optimization framework for the perception-communication trade-off and then propose PragComm, a multi-agent collaborative perception system with two key components: i) single-agent detection and tracking and ii) pragmatic collaboration. The proposed PragComm promotes pragmatic communication and adapts to a wide range of communication conditions. We evaluate PragComm on collaborative 3D object detection and tracking tasks using both a real-world dataset (V2V4Real) and simulation datasets (OPV2V and V2X-SIM2.0). PragComm consistently outperforms previous methods with more than 32.7K times lower communication volume on OPV2V. Code is available at github.com/PhyllisH/PragComm.

* 18 pages
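
The idea of representing feature vectors by integer indices into a shared dictionary can be illustrated with plain nearest-codeword quantization. This is a minimal sketch and not PragComm's implementation: the abstract describes a task-adaptive dictionary, whereas the `codebook` here is fixed and hand-written, and the names `nearest_index`, `encode`, and `decode` are hypothetical.

```python
import math


def nearest_index(vector, codebook):
    """Return the index of the codeword closest to `vector` (squared Euclidean distance)."""
    best_index, best_dist = 0, math.inf
    for index, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def encode(features, codebook):
    """Sender side: replace each feature vector with one integer index."""
    return [nearest_index(f, codebook) for f in features]


def decode(indices, codebook):
    """Receiver side: look the indices up in the same shared codebook."""
    return [list(codebook[i]) for i in indices]


# Assumption: a tiny hand-written codebook shared by both agents.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Assumption: three made-up 2-D feature vectors selected for transmission.
features = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.1]]

indices = encode(features, codebook)        # what would be sent over the link
reconstructed = decode(indices, codebook)   # approximation seen by the receiver
```

Each transmitted message element is a single integer rather than a full vector, at the cost of the approximation error between `features` and `reconstructed`.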

Helping Language Models Learn More: Multi-dimensional Task Prompt for Few-shot Tuning

Dec 13, 2023

Jinta Weng, Jiarui Zhang, Yue Hu, Daidong Fa, Xiaofeng Xu, Heyan Huang

Abstract:Large language models (LLMs) can be used as accessible and intelligent chatbots by constructing natural language queries and directly inputting the prompt into the large language model. However, different prompt constructions often lead to uncertainty in the answers and thus make it hard to utilize the specific knowledge of LLMs (like ChatGPT). To alleviate this, we use an interpretable structure to explain the prompt learning principle in LLMs, which verifies that the effectiveness of language models is determined by position changes of the task's related tokens. Therefore, we propose MTPrompt, a multi-dimensional task prompt learning method based on task-related object, summary, and task description information. By automatically building and searching for appropriate prompts, our proposed MTPrompt achieves the best results in few-shot settings on five different datasets. In addition, we demonstrate the effectiveness and stability of our method in different experimental settings and ablation experiments. In interaction with large language models, embedding more task-related information into prompts will make it easier to stimulate knowledge embedded in large language models.

* arXiv admin note: text overlap with arXiv:2210.16489

On the Feasibility of Fingerprinting Collaborative Robot Traffic

Dec 11, 2023

Cheng Tang, Diogo Barradas, Urs Hengartner, Yue Hu

Abstract:This study examines privacy risks in collaborative robotics, focusing on the potential for traffic analysis in encrypted robot communications. While previous research has explored low-level command recovery, our work investigates high-level motion recovery from command message sequences. We evaluate the efficacy of traditional website fingerprinting techniques (k-FP, KNN, and CUMUL) and their limitations in accurately identifying robotic actions due to their inability to capture detailed temporal relationships. To address this, we introduce a traffic classification approach using signal processing techniques, demonstrating high accuracy in action identification and highlighting the vulnerability of encrypted communications to privacy breaches. Additionally, we explore defenses such as packet padding and timing manipulation, revealing the challenges in balancing traffic analysis resistance with network efficiency. Our findings emphasize the need for continued development of practical defenses in robotic privacy and security.

* 12 pages
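
The trade-off between traffic analysis resistance and network efficiency mentioned for packet padding can be made concrete with a small calculation. This is a minimal sketch under stated assumptions, not the defense evaluated in the paper: the packet sizes in `trace`, the 256-byte `bucket`, and the helper names `pad_to_bucket` and `padding_overhead` are all hypothetical.

```python
def pad_to_bucket(size, bucket):
    """Round a packet size up to the next multiple of `bucket` bytes."""
    return ((size + bucket - 1) // bucket) * bucket


def padding_overhead(sizes, bucket):
    """Return the padded sizes and the fraction of extra bytes sent."""
    padded = [pad_to_bucket(s, bucket) for s in sizes]
    extra = sum(padded) - sum(sizes)
    return padded, extra / sum(sizes)


# Assumption: a made-up trace of encrypted packet sizes in bytes.
trace = [120, 340, 90, 512]
padded_trace, overhead = padding_overhead(trace, bucket=256)
```

After padding, the first and third packets become indistinguishable by size, as do the second and fourth, but roughly 45% more bytes are sent for this made-up trace. Larger buckets hide more size information and cost more bandwidth.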

The Impact of Robots' Facial Emotional Expressions on Light Physical Exercises

Dec 04, 2023

Nourhan Abdulazeem, Yue Hu

Abstract:To address the global challenge of population aging, our goal is to enhance successful aging through the introduction of robots capable of assisting in daily physical activities and promoting light exercises, which would enhance the cognitive and physical well-being of older adults. Previous studies have shown that facial expressions can increase engagement when interacting with robots. This study aims to investigate how older adults perceive and interact with a robot capable of displaying facial emotions while performing a physical exercise task together. We employed a collaborative robotic arm with a flat panel screen to encourage physical exercise across three different facial emotion conditions. We ran the experiment with older adults aged between 66 and 88. Our findings suggest that individuals perceive robots exhibiting facial expressions as less competent than those without such expressions. Additionally, the presence of facial expressions does not appear to significantly impact participants' levels of engagement, unlike other state-of-the-art studies. This observation is likely linked to our study's emphasis on collaborative physical human-robot interaction (pHRI) applications, as opposed to socially oriented pHRI applications. Additionally, we foresee a requirement for more suitable non-verbal social behavior to effectively enhance participants' engagement levels.

Augmented Kinesthetic Teaching: Enhancing Task Execution Efficiency through Intuitive Human Instructions

Dec 01, 2023

Cheng Tang, Jiaming Zhong, Yue Hu

Abstract:In this paper, we present a complete and efficient implementation of a knowledge-sharing augmented kinesthetic teaching approach for efficient task execution in robotics. Our augmented kinesthetic teaching method integrates intuitive human feedback, including verbal, gesture, gaze, and physical guidance, to facilitate the extraction of multiple layers of task information including control type, attention direction, input and output type, action state change trigger, etc., enhancing the adaptability and autonomy of robots during task execution. We propose an efficient Programming by Demonstration (PbD) framework for users with limited technical experience to teach the robot in an intuitive manner. The proposed framework provides an interface for such users to teach customized tasks using high-level commands, with the goal of achieving a smoother teaching experience and task execution. This is demonstrated with the sample task of pouring water.

RoboSync: OS for Social Robots with Customizable Behaviour

Dec 01, 2023

Cheng Tang, Yijing Feng, Yue Hu

Abstract:Traditional robotic systems require complex implementations that are not always accessible or easy to use for Human-Robot Interaction (HRI) application developers. With the aim of simplifying the implementation of HRI applications, this paper introduces RoboSync, a novel real-time operating system (RTOS) designed for customizable HRI. By creating multi-level abstraction layers, the system enables users to define complex emotional and behavioral models without needing deep technical expertise. The system's modular architecture comprises a behavior modeling layer, a machine learning plugin configuration layer, a sensor checks customization layer, a scheduler that fits the needs of HRI, and a communication and synchronization layer. This approach not only promotes ease of use without highly specialized skills but also ensures real-time responsiveness and adaptability. The primary functionality of the RTOS has been implemented for proof of concept and was tested on a Cortex-M4 microcontroller, demonstrating its potential for a wide range of lightweight simple-to-implement social robotics applications.

DAP: Domain-aware Prompt Learning for Vision-and-Language Navigation

Nov 30, 2023

Ting Liu, Yue Hu, Wansen Wu, Youkai Wang, Kai Xu, Quanjun Yin

Abstract:Following language instructions to navigate in unseen environments is a challenging task for autonomous embodied agents. With strong representation capabilities, pretrained vision-and-language models are widely used in VLN. However, most of them are trained on web-crawled general-purpose datasets, which incurs a considerable domain gap when used for VLN tasks. To address the problem, we propose a novel and model-agnostic domain-aware prompt learning (DAP) framework. For equipping the pretrained models with specific object-level and scene-level cross-modal alignment in VLN tasks, DAP applies a low-cost prompt tuning paradigm to learn soft visual prompts for extracting in-domain image semantics. Specifically, we first generate a set of in-domain image-text pairs with the help of the CLIP model. Then we introduce soft visual prompts in the input space of the visual encoder in a pretrained model. DAP injects in-domain visual knowledge into the visual encoder of the pretrained model in an efficient way. Experimental results on both R2R and REVERIE show the superiority of DAP compared to existing state-of-the-art methods.

* 4 pages. arXiv admin note: substantial text overlap with arXiv:2309.03661
