Yang Hu

HE-Drive: Human-Like End-to-End Driving with Vision Language Models

Oct 07, 2024

Junming Wang, Xingyu Zhang, Zebin Xing, Songen Gu, Xiaoyang Guo, Yang Hu, Ziying Song, Qian Zhang, Xiaoxiao Long, Wei Yin

Abstract: In this paper, we propose HE-Drive: the first end-to-end autonomous driving system centered on human-like driving, generating trajectories that are both temporally consistent and comfortable. Recent studies have shown that imitation learning-based planners and learning-based trajectory scorers can effectively generate and select accurate trajectories that closely mimic expert demonstrations. However, such planners and scorers still tend to produce temporally inconsistent and uncomfortable trajectories. To address these problems, HE-Drive first extracts key 3D spatial representations through sparse perception, which then serve as conditional inputs to a motion planner based on conditional Denoising Diffusion Probabilistic Models (DDPMs) that generates temporally consistent multi-modal trajectories. A trajectory scorer guided by Vision-Language Models (VLMs) then selects the most comfortable of these candidates to control the vehicle, ensuring human-like end-to-end driving. Experiments show that HE-Drive not only achieves state-of-the-art performance (reducing the average collision rate by 71% compared with VAD) and efficiency (1.9x faster than SparseDrive) on the challenging nuScenes and OpenScene datasets, but also provides the most comfortable driving experience on real-world data. For more information, visit the project website: https://jmwang0117.github.io/HE-Drive/.
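HE-Drive's planner is built on conditional DDPMs. As background only, the sketch below shows the standard DDPM forward (noising) process, $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\bar\alpha_t = \prod_{s \le t}(1-\beta_s)$, applied to a flattened trajectory. The linear $\beta$ schedule, the 1000 steps, and the toy waypoints are assumptions from the generic DDPM setup, not from HE-Drive; the paper's conditioning on sparse-perception features and its denoising network are not shown.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (generic DDPM choice, assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, abar, rng):
    """Draw x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I) for a flat trajectory vector."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]

# Hypothetical ego trajectory: 6 future (x, y) waypoints flattened into one vector.
traj = [float(v) for i in range(6) for v in (2.0 * i, 0.1 * i)]
betas = linear_beta_schedule(1000)
abar = alpha_bars(betas)
rng = random.Random(0)
noisy_early = q_sample(traj, 10, abar, rng)   # still close to the clean trajectory
noisy_late = q_sample(traj, 999, abar, rng)   # almost pure Gaussian noise
```

In a DDPM-based planner, a network is trained to reverse this process step by step; drawing several noise samples and denoising each one is what yields multiple candidate trajectories for a downstream scorer.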

Aerial Grasping with Soft Aerial Vehicle Using Disturbance Observer-Based Model Predictive Control

Sep 21, 2024

Hiu Ching Cheung, Bailun Jiang, Yang Hu, Henry K. Chu, Chih-Yung Wen, Ching-Wei Chang

Abstract: Aerial grasping, particularly soft aerial grasping, holds significant promise for drone delivery and harvesting tasks. However, controlling UAV dynamics during aerial grasping is challenging: the added mass of a grasped payload degrades thrust prediction, while unpredictable environmental disturbances further complicate control. In this study, we aim to improve control of the Soft Aerial Vehicle (SAV) during aerial grasping by incorporating a disturbance observer into a Nonlinear Model Predictive Control (NMPC) SAV controller. The disturbance observer compensates for idealizations in the dynamic model and for uncertainties arising from additional payloads and unpredictable disturbances. The resulting disturbance observer-based NMPC effectively minimizes tracking errors and enables precise aerial grasping along all three axes. The proposed SAV equipped with Disturbance Observer-based Nonlinear Model Predictive Control (DOMPC) handles both static and non-static payloads, successfully grasping a variety of objects. Notably, our SAV achieves a payload-to-weight ratio that surpasses previous work on soft grasping: the proposed vehicle weighs 1.002 kg and grasps a maximum payload of 337 g.

* 8 pages, 10 figures, submitted to IEEE Robotics and Automation Letters
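To illustrate the disturbance-observer idea, here is a minimal sketch of a generic first-order observer on a one-dimensional vertical force balance $m\ddot z = u - mg + d$: the residual between the measured and nominal force balance is low-pass filtered into an estimate $\hat d$. The lumped disturbance, the observer gain, and the time step are illustrative assumptions, and the payload is modeled as a constant unknown downward force on the 1.002 kg nominal model (ignoring its added inertia). The paper's DOMPC feeds the estimate into a full nonlinear model inside NMPC, which this sketch does not reproduce.

```python
def simulate_dob(mass, thrust, true_disturbance, gain, dt, steps, g=9.81):
    """Generic first-order disturbance observer (an assumption, not the paper's design).

    Returns the history of disturbance estimates d_hat.
    """
    d_hat = 0.0
    history = []
    for _ in range(steps):
        # "Measured" vertical acceleration produced by the true dynamics.
        accel = (thrust - mass * g + true_disturbance) / mass
        # Mismatch between measured and nominal force balance = evidence of d.
        residual = mass * accel - (thrust - mass * g)
        # Low-pass filter the residual: d_hat converges to d at rate `gain`.
        d_hat += gain * dt * (residual - d_hat)
        history.append(d_hat)
    return history

MASS = 1.002     # nominal SAV mass in kg (from the abstract)
PAYLOAD = 0.337  # grasped payload in kg (from the abstract)
G = 9.81
true_d = -PAYLOAD * G  # assumption: payload acts as an unknown downward force
estimates = simulate_dob(MASS, thrust=MASS * G, true_disturbance=true_d,
                         gain=20.0, dt=0.01, steps=500)
# Hover thrust that cancels the estimated disturbance.
hover_thrust = MASS * G - estimates[-1]
```

With these numbers the payload-to-weight ratio is $0.337 / 1.002 \approx 0.34$, and the compensated hover thrust approaches the weight of vehicle plus payload.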

Continual Learning for Remote Physiological Measurement: Minimize Forgetting and Simplify Inference

Jul 19, 2024

Qian Liang, Yan Chen, Yang Hu

Abstract: Remote photoplethysmography (rPPG) has gained significant attention in recent years for its ability to extract physiological signals from facial videos. While existing rPPG measurement methods perform satisfactorily in intra-dataset and cross-dataset scenarios, they often overlook the incremental learning scenario, in which training data arrive sequentially and catastrophic forgetting results. Meanwhile, most existing class-incremental learning approaches are unsuitable for rPPG measurement. In this paper, we present a novel method named ADDP to tackle continual learning for rPPG measurement. We first employ adapters to efficiently fine-tune the model on new tasks. We then design domain prototypes, which are better suited to rPPG signal regression than the commonly used class prototypes. Based on these prototypes, we propose a feature augmentation strategy to consolidate past knowledge and an inference simplification strategy that converts potentially forgotten tasks into ones familiar to the model. To evaluate ADDP and enable fair comparisons, we create the first continual learning protocol for rPPG measurement. Comprehensive experiments demonstrate the effectiveness of our method for rPPG continual learning. Source code is available at https://github.com/MayYoY/rPPGDIL

* ECCV 2024
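The abstract does not spell out how ADDP builds its domain prototypes. As a minimal sketch under stated assumptions, the code below takes a domain prototype to be the mean feature vector of that domain's training samples and routes a test feature to the domain with the nearest prototype, which is one plausible reading of "converting potentially forgotten tasks into familiar ones". The domain names, feature values, and the use of Euclidean distance are hypothetical choices.

```python
import math

def domain_prototypes(features_by_domain):
    """Mean feature vector per domain (hypothetical stand-in for ADDP's prototypes)."""
    protos = {}
    for domain, feats in features_by_domain.items():
        dim = len(feats[0])
        protos[domain] = [sum(f[i] for f in feats) / len(feats) for i in range(dim)]
    return protos

def nearest_domain(feature, protos):
    """Assign a test feature to the domain whose prototype is closest (Euclidean)."""
    return min(protos, key=lambda d: math.dist(feature, protos[d]))

# Hypothetical 2-D features from two previously learned domains.
protos = domain_prototypes({
    "domain_1": [[0.0, 0.0], [2.0, 0.0]],
    "domain_2": [[10.0, 10.0], [12.0, 10.0]],
})
assigned = nearest_domain([9.0, 9.0], protos)  # routed to "domain_2"
```

Regressing in terms of domains rather than classes fits rPPG, where the target is a continuous signal and there are no class labels to build class prototypes from.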

In-depth analysis of recall initiators of medical devices with a Machine Learning-Natural language Processing workflow

Jun 14, 2024

Yang Hu

Abstract: Identifying and assessing recall initiators are the preliminary steps in preventing medical device recalls. As overall data volume and the share of textual data grow, conventional analysis tools cannot process massive, multi-format data comprehensively enough to meet the higher expectations of fine-grained management. This study presents a machine learning-natural language processing (ML-NLP) work tool based on big-data analytics to address the limited processing efficiency and versatility of conventional tools when data volumes are large and formats are mixed. Using this ML-NLP tool, the study identified, assessed, and analysed medical device recall initiators from the public medical device recall database covering 2018 to 2024. The results suggest that the unsupervised Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm can characterize each individual recall initiator specifically, helping practitioners identify recall reasons comprehensively within a short time frame. Text similarity-based classification then helps practitioners control the group size of recall initiators and provides managerial insights at the operational, tactical, and strategic levels. The ML-NLP work tool not only captures specific details of each recall initiator but also interprets the connections between existing initiators, and it can be applied to risk identification and assessment in the forward supply chain (SC). Finally, the paper offers concluding remarks and outlines future work. More proactive practices and control solutions for medical device recalls are expected in the future.

* The Second version of the manuscript
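The study relies on DBSCAN. For readers unfamiliar with it, the sketch below is a compact pure-Python DBSCAN: a point with at least `min_pts` neighbors within radius `eps` (itself included) is a core point, clusters grow through chains of core points, non-core points reachable from a cluster become border points, and everything else is noise. The toy 2-D points stand in for vectorized recall texts; how the paper vectorizes text and which `eps`/`min_pts` it uses are not stated in the abstract. For real workloads, scikit-learn's `sklearn.cluster.DBSCAN` provides an optimized implementation.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force range query; includes the point itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is core: keep expanding through it
                queue.extend(j_neighbors)
    return labels

# Two tight groups and one outlier.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(pts, eps=1.5, min_pts=3)
```

Because DBSCAN needs no preset number of clusters and leaves outliers as noise, it suits exploratory grouping of free-text recall reasons whose categories are not known in advance.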

IFNet: Deep Imaging and Focusing for Handheld SAR with Millimeter-wave Signals

May 06, 2024

Yadong Li, Dongheng Zhang, Ruixu Geng, Jincheng Wu, Yang Hu, Qibin Sun, Yan Chen

Abstract: Recent advancements have showcased the potential of handheld millimeter-wave (mmWave) imaging, which applies synthetic aperture radar (SAR) principles in portable settings. However, existing studies addressing handheld motion errors either rely on costly tracking devices or employ simplified imaging models, leading to impractical deployment or limited performance. In this paper, we present IFNet, a novel deep unfolding network that combines the strengths of signal processing models and deep neural networks to achieve robust imaging and focusing for handheld mmWave systems. We first formulate the handheld imaging model by integrating multiple priors about mmWave images and handheld phase errors. Furthermore, we transform the optimization processes into an iterative network structure for improved and efficient imaging performance. Extensive experiments demonstrate that IFNet effectively compensates for handheld phase errors and recovers high-fidelity images from severely distorted signals. In comparison with existing methods, IFNet can achieve at least 11.89 dB improvement in average peak signal-to-noise ratio (PSNR) and 64.91% improvement in average structural similarity index measure (SSIM) on a real-world dataset.
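IFNet's gains are reported in PSNR. For reference, the sketch below computes $\mathrm{PSNR} = 10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$ for two flattened images, assuming intensities normalized to $[0, 1]$ so that $\mathrm{MAX} = 1$, and converts a PSNR gain into the corresponding MSE ratio. The helper names are hypothetical and SSIM is not sketched.

```python
import math

def psnr(reference, estimate, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized flat images."""
    if len(reference) != len(estimate):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which MSE must shrink (at fixed max_val) to gain `gain_db` dB of PSNR."""
    return 10.0 ** (gain_db / 10.0)

ref = [0.0, 0.0, 0.0, 0.0]
est = [0.1, 0.1, 0.1, 0.1]
score = psnr(ref, est)  # MSE = 0.01, so about 20 dB
```

For a single image at fixed MAX, an 11.89 dB gain corresponds to an MSE that is $10^{1.189} \approx 15.5$ times smaller; since the reported figure is an average over a dataset, this reading is exact only per image.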

Shifting Spotlight for Co-supervision: A Simple yet Efficient Single-branch Network to See Through Camouflage

Apr 13, 2024

Yang Hu, Jinxia Zhang, Kaihua Zhang, Yin Yuan

Abstract: Efficient and accurate camouflaged object detection (COD) remains a challenge in computer vision. Recent approaches have exploited edge information for network co-supervision, achieving notable advances. However, these approaches introduce an extra branch for complex edge extraction, which complicates the model architecture and increases computational demands. To address this issue, our work replicates the observation that an animal's camouflage is easily revealed under a shifting spotlight, and leverages it for network co-supervision to form a compact yet efficient single-branch network, the Co-Supervised Spotlight Shifting Network (CS$^3$Net). The spotlight shifting strategy allows CS$^3$Net to learn an additional prior within a single-branch framework, obviating the need for a resource-demanding multi-branch design. To exploit the prior from spotlight shifting co-supervision, we propose a Shadow Refinement Module (SRM) and Projection Aware Attention (PAA) for feature refinement and enhancement. To ensure the continuity of multi-scale feature aggregation, we use the Extended Neighbor Connection Decoder (ENCD) to generate the final predictions. Empirical evaluations on public datasets confirm that CS$^3$Net offers an optimal balance between efficiency and performance: it achieves a 32.13% reduction in multiply-accumulate operations (MACs) compared with leading efficient COD models while also delivering superior performance.
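CS$^3$Net's efficiency claim is stated in multiply-accumulate operations (MACs). As a rough illustration of how MACs are usually counted, the sketch below computes them for a single 2-D convolution layer (bias ignored) and converts two totals into a relative reduction. The helper names and layer sizes are hypothetical, and profiling tools may count slightly differently (for example, by including bias or normalization layers).

```python
def conv2d_macs(h_out, w_out, c_in, c_out, kernel, groups=1):
    """MACs of one 2-D convolution: each output element needs (c_in/groups)*k*k multiply-adds."""
    if c_in % groups != 0:
        raise ValueError("c_in must be divisible by groups")
    return h_out * w_out * c_out * (c_in // groups) * kernel * kernel

def relative_reduction(baseline, ours):
    """Fractional reduction, e.g. 0.3213 for a 32.13% saving."""
    return (baseline - ours) / baseline

standard = conv2d_macs(10, 10, 3, 8, 3)             # ordinary 3x3 conv
grouped = conv2d_macs(10, 10, 8, 8, 3, groups=8)    # depthwise 3x3 conv
```

Totals for a whole network are the sum over its layers; a 32.13% reduction means the model needs about 67.87% of the baseline's MACs.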

Efficient Duple Perturbation Robustness in Low-rank MDPs

Apr 11, 2024

Yang Hu, Haitong Ma, Bo Dai, Na Li

Abstract: The pursuit of robustness has recently been a popular topic in reinforcement learning (RL) research, yet existing methods generally suffer from efficiency issues that obstruct their real-world implementation. In this paper, we introduce duple perturbation robustness, i.e., perturbation on both the feature and factor vectors of low-rank Markov decision processes (MDPs), via a novel characterization of $(\xi,\eta)$-ambiguity sets. The new robust MDP formulation is compatible with the function representation view and is therefore naturally applicable to practical RL problems with large or even continuous state-action spaces. It also gives rise to a provably efficient and practical algorithm with a theoretical convergence-rate guarantee. Examples are designed to justify the new robustness concept, and algorithmic efficiency is supported by both theoretical bounds and numerical simulations.

* 25 pages, 8 figures, in submission to ICML'24
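In a low-rank MDP the transition kernel factorizes as $P(s' \mid s,a) = \phi(s,a)^\top \mu(s')$, where $\phi$ are the feature vectors and $\mu$ the factor vectors that duple perturbation robustness perturbs. The sketch below builds a small tabular instance using the latent-variable special case (each $\phi(s,a)$ is a distribution over $d$ latent factors and each factor is a distribution over next states), which guarantees a valid transition kernel. It shows only the nominal model; the $(\xi,\eta)$-ambiguity sets and the robust algorithm are not reproduced, and the sizes are arbitrary.

```python
import random

def random_simplex(n, rng):
    """Random probability vector of length n (normalized positive weights)."""
    weights = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def low_rank_transition(phi, mu):
    """P[(s, a)][s'] = sum_k phi[(s, a)][k] * mu[k][s'], a kernel of rank at most d = len(mu)."""
    d, n_next = len(mu), len(mu[0])
    return {
        sa: [sum(f[k] * mu[k][s_next] for k in range(d)) for s_next in range(n_next)]
        for sa, f in phi.items()
    }

rng = random.Random(0)
n_states, n_actions, d = 5, 2, 3
# Feature vectors phi(s, a): distributions over d latent factors (hypothetical sizes).
phi = {(s, a): random_simplex(d, rng) for s in range(n_states) for a in range(n_actions)}
# Factor vectors: mu[k] is a distribution over next states for latent factor k.
mu = [random_simplex(n_states, rng) for _ in range(d)]
P = low_rank_transition(phi, mu)
```

Because every row of $P$ is a convex combination of the $d$ factor distributions, a learner only needs to represent $d$-dimensional features, which is what makes the formulation compatible with large or continuous state-action spaces.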

Leveraging Intelligent Recommender system as a first step resilience measure -- A data-driven supply chain disruption response framework

Mar 30, 2024

Yang Hu

Abstract: Interest in the potential of digital technologies to increase supply chain resilience (SCRes) is growing in light of Industry 4.0 and the global pandemic. The use of recommender systems (RS) as a supply chain (SC) resilience measure has been neglected, although RS is a capable tool for enhancing SC resilience from a reactive standpoint. To address this gap, this research proposes a novel data-driven supply chain disruption response framework based on intelligent recommender system techniques and validates the conceptual model through a practical use case. Results show that the framework can serve as an effective SC disruption mitigation measure in the very first response phase and help SC participants react better after an SC disruption.
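The abstract does not detail which recommender technique the framework uses. Purely as an illustration of how a recommender might serve as a first-response measure, the sketch below ranks alternative suppliers by cosine similarity to a disrupted supplier's attribute profile (content-based filtering). The supplier names, the three attributes, and their values are hypothetical and are not taken from the paper's framework or use case.

```python
import math

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend_alternatives(disrupted, suppliers, k=2):
    """Rank the other suppliers by similarity to the disrupted supplier's profile."""
    target = suppliers[disrupted]
    candidates = [(name, cosine(target, vec))
                  for name, vec in suppliers.items() if name != disrupted]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in candidates[:k]]

# Hypothetical normalized attributes: [product overlap, lead-time score, region match].
suppliers = {
    "S1": [1.0, 0.0, 1.0],
    "S2": [1.0, 0.0, 0.9],
    "S3": [0.0, 1.0, 0.0],
    "S4": [0.9, 0.1, 1.0],
}
backup = recommend_alternatives("S1", suppliers, k=2)  # ["S2", "S4"]
```

A content-based ranking like this needs no interaction history, so it can produce suggestions immediately after a disruption, which matches the "very first response phase" the abstract targets.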

Emulating Complex Synapses Using Interlinked Proton Conductors

Jan 26, 2024

Lifu Zhang, Ji-An Li, Yang Hu, Jie Jiang, Rongjie Lai, Marcus K. Benna, Jian Shi

Abstract: In terms of energy efficiency and computational speed, neuromorphic electronics based on non-volatile memory devices are expected to be among the most promising hardware candidates for future artificial intelligence (AI). However, catastrophic forgetting, in which networks rapidly overwrite previously learned weights when learning new tasks, remains a pivotal hurdle for both digital and analog AI chips in unleashing the true power of brain-like computing. To address catastrophic forgetting in the context of online memory storage, a complex synapse model (the Benna-Fusi model) was recently proposed [1], whose synaptic weight and internal variables evolve according to diffusion dynamics. In this work, by designing a proton transistor with a series of charge-diffusion-controlled storage components, we experimentally realize the Benna-Fusi artificial complex synapse. Memory consolidation arising from the coupled storage components is revealed by both numerical simulations and experimental observations. Different memory timescales of the complex synapse are engineered through the diffusion length of the charge carriers and the capacity and number of coupled storage components. The advantage of the demonstrated complex synapse in both memory capacity and memory consolidation is revealed by neural network simulations of face familiarity detection. Our experimental realization of the complex synapse suggests a promising approach to enhancing memory capacity and enabling continual learning.

* 6 figures
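The Benna-Fusi model describes a synapse as a chain of coupled variables $u_1, \dots, u_m$, where $u_1$ is the synaptic weight and each deeper variable exchanges "charge" with its neighbors by diffusion: $C_k \dot u_k = g_{k-1,k}(u_{k-1} - u_k) - g_{k,k+1}(u_k - u_{k+1})$, with the last variable leaking to $u_{m+1} = 0$. The sketch below integrates this chain with explicit Euler steps after a single plasticity event on $u_1$ (ongoing plasticity input is omitted). The capacities (doubling) and conductances (quartering) are illustrative assumptions chosen to give progressively slower variables, not the values from [1] or from the proton-transistor device.

```python
def simulate_chain(capacities, conductances, u0, dt, steps):
    """Euler-integrate C_k du_k/dt = g_{k-1,k}(u_{k-1}-u_k) - g_{k,k+1}(u_k-u_{k+1}).

    conductances[k] couples variable k to k+1; the last entry couples u_m to a
    fixed u_{m+1} = 0 (a leak), so len(conductances) == len(capacities).
    """
    u = list(u0)
    m = len(u)
    for _ in range(steps):
        flows = []
        for k in range(m):
            nxt = u[k + 1] if k + 1 < m else 0.0
            flows.append(conductances[k] * (u[k] - nxt))  # flow from k to k+1
        new_u = []
        for k in range(m):
            inflow = flows[k - 1] if k > 0 else 0.0
            new_u.append(u[k] + dt * (inflow - flows[k]) / capacities[k])
        u = new_u
    return u

# Illustrative parameters (assumptions): larger, more weakly coupled deep variables.
CAPS = [1.0, 2.0, 4.0, 8.0]
CONDS = [0.25, 0.0625, 0.015625, 0.00390625]
u_final = simulate_chain(CAPS, CONDS, [1.0, 0.0, 0.0, 0.0], dt=0.1, steps=1000)
```

In this toy setting the weight $u_1$ decays while part of its charge moves into the slower, higher-capacity variables, which later feed it back and stretch the memory trace; that transfer is the consolidation mechanism the coupled storage components emulate.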

Passive Non-Line-of-Sight Imaging with Light Transport Modulation

Dec 26, 2023

Jiarui Zhang, Ruixu Geng, Xiaolong Du, Yan Chen, Houqiang Li, Yang Hu

Abstract: Passive non-line-of-sight (NLOS) imaging has developed rapidly in recent years owing to its ability to image objects that are out of sight. The light transport condition plays an important role in this task, since changing it leads to a different imaging model. Existing learning-based NLOS methods usually train independent models for different light transport conditions, which is computationally inefficient and impairs the practicality of the models. In this work, we propose NLOS-LTM, a novel passive NLOS imaging method that effectively handles multiple light transport conditions with a single network. We achieve this by inferring a latent light transport representation from the projection image and using this representation to modulate the network that reconstructs the hidden image from the projection image. We train a light transport encoder together with a vector quantizer to obtain the light transport representation. To further regulate this representation, we jointly learn both the reconstruction network and the reprojection network during training. A set of light transport modulation blocks is used to modulate the two jointly trained networks in a multi-scale way. Extensive experiments on a large-scale passive NLOS dataset demonstrate the superiority of the proposed method. The code is available at https://github.com/JerryOctopus/NLOS-LTM.
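NLOS-LTM obtains its light transport representation with a vector quantizer. The core of a standard VQ-VAE-style quantizer is a nearest-neighbor lookup into a learned codebook, sketched below. The codebook values are made up, and the paper's encoder, codebook size, and training losses (codebook and commitment terms, straight-through gradients) are not shown.

```python
import math

def quantize(z, codebook):
    """Return (index, code) of the codebook vector nearest to z in Euclidean distance."""
    idx = min(range(len(codebook)), key=lambda i: math.dist(z, codebook[i]))
    return idx, codebook[idx]

def quantize_all(latents, codebook):
    """Map each continuous latent to the index of its nearest code."""
    return [quantize(z, codebook)[0] for z in latents]

# Hypothetical 2-D codebook with three entries.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
indices = quantize_all([[0.9, 0.1], [0.1, 0.8], [0.0, 0.1]], codebook)  # [1, 2, 0]
```

Snapping the encoder output to a small discrete set of codes is one way to keep a latent condition representation compact and regular, so that similar light transport conditions map to the same modulation signal.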
