Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kai Huang

A Real-Time System for Scheduling and Managing UAV Delivery in Urban

Dec 16, 2024

Han Liu, Tian Liu, Kai Huang

Figure 1 for A Real-Time System for Scheduling and Managing UAV Delivery in Urban

Figure 2 for A Real-Time System for Scheduling and Managing UAV Delivery in Urban

Figure 3 for A Real-Time System for Scheduling and Managing UAV Delivery in Urban

Figure 4 for A Real-Time System for Scheduling and Managing UAV Delivery in Urban

Abstract:As urban logistics demand continues to grow, UAV delivery has become a key solution to improve delivery efficiency, reduce traffic congestion, and lower logistics costs. However, to fully leverage the potential of UAV delivery networks, efficient swarm scheduling and management are crucial. In this paper, we propose a real-time scheduling and management system based on the ``Airport-Unloading Station" model, aiming to bridge the gap between high-level scheduling algorithms and low-level execution systems. This system, acting as middleware, accurately translates the requirements from the scheduling layer into specific execution instructions, ensuring that the scheduling algorithms perform effectively in real-world environments. Additionally, we implement three collaborative scheduling schemes involving autonomous ground vehicles (AGVs), unmanned aerial vehicles (UAVs), and ground staff to further optimize overall delivery efficiency. Through extensive experiments, this study demonstrates the rationality and feasibility of the proposed management system, providing practical solution for the commercial application of UAVs delivery in urban. Code: https://github.com/chengji253/UAVDeliverySystem

Via

Access Paper or Ask Questions

Extrapolating Prospective Glaucoma Fundus Images through Diffusion Model in Irregular Longitudinal Sequences

Oct 28, 2024

Zhihao Zhao, Junjie Yang, Shahrooz Faghihroohi, Yinzheng Zhao, Daniel Zapp, Kai Huang, Nassir Navab, M. Ali Nasseri

Figure 1 for Extrapolating Prospective Glaucoma Fundus Images through Diffusion Model in Irregular Longitudinal Sequences

Figure 2 for Extrapolating Prospective Glaucoma Fundus Images through Diffusion Model in Irregular Longitudinal Sequences

Figure 3 for Extrapolating Prospective Glaucoma Fundus Images through Diffusion Model in Irregular Longitudinal Sequences

Figure 4 for Extrapolating Prospective Glaucoma Fundus Images through Diffusion Model in Irregular Longitudinal Sequences

Abstract:The utilization of longitudinal datasets for glaucoma progression prediction offers a compelling approach to support early therapeutic interventions. Predominant methodologies in this domain have primarily focused on the direct prediction of glaucoma stage labels from longitudinal datasets. However, such methods may not adequately encapsulate the nuanced developmental trajectory of the disease. To enhance the diagnostic acumen of medical practitioners, we propose a novel diffusion-based model to predict prospective images by extrapolating from existing longitudinal fundus images of patients. The methodology delineated in this study distinctively leverages sequences of images as inputs. Subsequently, a time-aligned mask is employed to select a specific year for image generation. During the training phase, the time-aligned mask resolves the issue of irregular temporal intervals in longitudinal image sequence sampling. Additionally, we utilize a strategy of randomly masking a frame in the sequence to establish the ground truth. This methodology aids the network in continuously acquiring knowledge regarding the internal relationships among the sequences throughout the learning phase. Moreover, the introduction of textual labels is instrumental in categorizing images generated within the sequence. The empirical findings from the conducted experiments indicate that our proposed model not only effectively generates longitudinal data but also significantly improves the precision of downstream classification tasks.

* Accepted at BIBM 2024

Via

Access Paper or Ask Questions

KaLDeX: Kalman Filter based Linear Deformable Cross Attention for Retina Vessel Segmentation

Oct 28, 2024

Zhihao Zhao, Shahrooz Faghihroohi, Yinzheng Zhao, Junjie Yang, Shipeng Zhong, Kai Huang, Nassir Navab, Boyang Li, M. Ali Nasseri

Abstract:Background and Objective: In the realm of ophthalmic imaging, accurate vascular segmentation is paramount for diagnosing and managing various eye diseases. Contemporary deep learning-based vascular segmentation models rival human accuracy but still face substantial challenges in accurately segmenting minuscule blood vessels in neural network applications. Due to the necessity of multiple downsampling operations in the CNN models, fine details from high-resolution images are inevitably lost. The objective of this study is to design a structure to capture the delicate and small blood vessels. Methods: To address these issues, we propose a novel network (KaLDeX) for vascular segmentation leveraging a Kalman filter based linear deformable cross attention (LDCA) module, integrated within a UNet++ framework. Our approach is based on two key components: Kalman filter (KF) based linear deformable convolution (LD) and cross-attention (CA) modules. The LD module is designed to adaptively adjust the focus on thin vessels that might be overlooked in standard convolution. The CA module improves the global understanding of vascular structures by aggregating the detailed features from the LD module with the high level features from the UNet++ architecture. Finally, we adopt a topological loss function based on persistent homology to constrain the topological continuity of the segmentation. Results: The proposed method is evaluated on retinal fundus image datasets (DRIVE, CHASE_BD1, and STARE) as well as the 3mm and 6mm of the OCTA-500 dataset, achieving an average accuracy (ACC) of 97.25%, 97.77%, 97.85%, 98.89%, and 98.21%, respectively. Conclusions: Empirical evidence shows that our method outperforms the current best models on different vessel segmentation datasets. Our source code is available at: https://github.com/AIEyeSystem/KalDeX.

Via

Access Paper or Ask Questions

Flow-Inspired Lightweight Multi-Robot Real-Time Scheduling Planner

Sep 11, 2024

Han Liu, Yu Jin, Tianjiang Hu, Kai Huang

Abstract:Collision avoidance and trajectory planning are crucial in multi-robot systems, particularly in environments with numerous obstacles. Although extensive research has been conducted in this field, the challenge of rapid traversal through such environments has not been fully addressed. This paper addresses this problem by proposing a novel real-time scheduling scheme designed to optimize the passage of multi-robot systems through complex, obstacle-rich maps. Inspired from network flow optimization, our scheme decomposes the environment into a network structure, enabling the efficient allocation of robots to paths based on real-time congestion data. The proposed scheduling planner operates on top of existing collision avoidance algorithms, focusing on minimizing traversal time by balancing robot detours and waiting times. Our simulation results demonstrate the efficiency of the proposed scheme. Additionally, we validated its effectiveness through real world flight tests using ten quadrotors. This work contributes a lightweight, effective scheduling planner capable of meeting the real-time demands of multi-robot systems in obstacle-rich environments.

Via

Access Paper or Ask Questions

4D-CAT: Synthesis of 4D Coronary Artery Trees from Systole and Diastole

Sep 03, 2024

Daosong Hu, Ruomeng Wang, Liang Zhao, Mingyue Cui, Song Ding, Kai Huang

Figure 1 for 4D-CAT: Synthesis of 4D Coronary Artery Trees from Systole and Diastole

Figure 2 for 4D-CAT: Synthesis of 4D Coronary Artery Trees from Systole and Diastole

Figure 3 for 4D-CAT: Synthesis of 4D Coronary Artery Trees from Systole and Diastole

Figure 4 for 4D-CAT: Synthesis of 4D Coronary Artery Trees from Systole and Diastole

Abstract:The three-dimensional vascular model reconstructed from CT images is widely used in medical diagnosis. At different phases, the beating of the heart can cause deformation of vessels, resulting in different vascular imaging states and false positive diagnostic results. The 4D model can simulate a complete cardiac cycle. Due to the dose limitation of contrast agent injection in patients, it is valuable to synthesize a 4D coronary artery trees through finite phases imaging. In this paper, we propose a method for generating a 4D coronary artery trees, which maps the systole to the diastole through deformation field prediction, interpolates on the timeline, and the motion trajectory of points are obtained. Specifically, the centerline is used to represent vessels and to infer deformation fields using cube-based sorting and neural networks. Adjacent vessel points are aggregated and interpolated based on the deformation field of the centerline point to obtain displacement vectors of different phases. Finally, the proposed method is validated through experiments to achieve the registration of non-rigid vascular points and the generation of 4D coronary trees.

Via

Access Paper or Ask Questions

Meta-Learning Empowered Graph Neural Networks for Radio Resource Management

Aug 29, 2024

Kai Huang, Le Liang, Xinping Yi, Hao Ye, Shi Jin, Geoffrey Ye Li

Figure 1 for Meta-Learning Empowered Graph Neural Networks for Radio Resource Management

Figure 2 for Meta-Learning Empowered Graph Neural Networks for Radio Resource Management

Figure 3 for Meta-Learning Empowered Graph Neural Networks for Radio Resource Management

Figure 4 for Meta-Learning Empowered Graph Neural Networks for Radio Resource Management

Abstract:In this paper, we consider a radio resource management (RRM) problem in the dynamic wireless networks, comprising multiple communication links that share the same spectrum resource. To achieve high network throughput while ensuring fairness across all links, we formulate a resilient power optimization problem with per-user minimum-rate constraints. We obtain the corresponding Lagrangian dual problem and parameterize all variables with neural networks, which can be trained in an unsupervised manner due to the provably acceptable duality gap. We develop a meta-learning approach with graph neural networks (GNNs) as parameterization that exhibits fast adaptation and scalability to varying network configurations. We formulate the objective of meta-learning by amalgamating the Lagrangian functions of different network configurations and utilize a first-order meta-learning algorithm, called Reptile, to obtain the meta-parameters. Numerical results verify that our method can efficiently improve the overall throughput and ensure the minimum rate performance. We further demonstrate that using the meta-parameters as initialization, our method can achieve fast adaptation to new wireless network configurations and reduce the number of required training data samples.

Via

Access Paper or Ask Questions

Beam Prediction based on Large Language Models

Aug 16, 2024

Yucheng Sheng, Kai Huang, Le Liang, Peng Liu, Shi Jin, Geoffrey Ye Li

Figure 1 for Beam Prediction based on Large Language Models

Figure 2 for Beam Prediction based on Large Language Models

Figure 3 for Beam Prediction based on Large Language Models

Figure 4 for Beam Prediction based on Large Language Models

Abstract:Millimeter-wave (mmWave) communication is promising for next-generation wireless networks but suffers from significant path loss, requiring extensive antenna arrays and frequent beam training. Traditional deep learning models, such as long short-term memory (LSTM), enhance beam tracking accuracy however are limited by poor robustness and generalization. In this letter, we use large language models (LLMs) to improve the robustness of beam prediction. By converting time series data into text-based representations and employing the Prompt-as-Prefix (PaP) technique for contextual enrichment, our approach unleashes the strength of LLMs for time series forecasting. Simulation results demonstrate that our LLM-based method offers superior robustness and generalization compared to LSTM-based models, showcasing the potential of LLMs in wireless communications.

Via

Access Paper or Ask Questions

Cross-Modal Attention Alignment Network with Auxiliary Text Description for zero-shot sketch-based image retrieval

Jul 01, 2024

Hanwen Su, Ge Song, Kai Huang, Jiyan Wang, Ming Yang

Abstract:In this paper, we study the problem of zero-shot sketch-based image retrieval (ZS-SBIR). The prior methods tackle the problem in a two-modality setting with only category labels or even no textual information involved. However, the growing prevalence of Large-scale pre-trained Language Models (LLMs), which have demonstrated great knowledge learned from web-scale data, can provide us with an opportunity to conclude collective textual information. Our key innovation lies in the usage of text data as auxiliary information for images, thus leveraging the inherent zero-shot generalization ability that language offers. To this end, we propose an approach called Cross-Modal Attention Alignment Network with Auxiliary Text Description for zero-shot sketch-based image retrieval. The network consists of three components: (i) a Description Generation Module that generates textual descriptions for each training category by prompting an LLM with several interrogative sentences, (ii) a Feature Extraction Module that includes two ViTs for sketch and image data, a transformer for extracting tokens of sentences of each training category, finally (iii) a Cross-modal Alignment Module that exchanges the token features of both text-sketch and text-image using cross-attention mechanism, and align the tokens locally and globally. Extensive experiments on three benchmark datasets show our superior performances over the state-of-the-art ZS-SBIR methods.

Via

Access Paper or Ask Questions

Efficient Event Stream Super-Resolution with Recursive Multi-Branch Fusion

Jun 28, 2024

Quanmin Liang, Zhilin Huang, Xiawu Zheng, Feidiao Yang, Jun Peng, Kai Huang, Yonghong Tian

Figure 1 for Efficient Event Stream Super-Resolution with Recursive Multi-Branch Fusion

Figure 2 for Efficient Event Stream Super-Resolution with Recursive Multi-Branch Fusion

Figure 3 for Efficient Event Stream Super-Resolution with Recursive Multi-Branch Fusion

Figure 4 for Efficient Event Stream Super-Resolution with Recursive Multi-Branch Fusion

Abstract:Current Event Stream Super-Resolution (ESR) methods overlook the redundant and complementary information present in positive and negative events within the event stream, employing a direct mixing approach for super-resolution, which may lead to detail loss and inefficiency. To address these issues, we propose an efficient Recursive Multi-Branch Information Fusion Network (RMFNet) that separates positive and negative events for complementary information extraction, followed by mutual supplementation and refinement. Particularly, we introduce Feature Fusion Modules (FFM) and Feature Exchange Modules (FEM). FFM is designed for the fusion of contextual information within neighboring event streams, leveraging the coupling relationship between positive and negative events to alleviate the misleading of noises in the respective branches. FEM efficiently promotes the fusion and exchange of information between positive and negative branches, enabling superior local information enhancement and global information complementation. Experimental results demonstrate that our approach achieves over 17% and 31% improvement on synthetic and real datasets, accompanied by a 2.3X acceleration. Furthermore, we evaluate our method on two downstream event-driven applications, \emph{i.e.}, object recognition and video reconstruction, achieving remarkable results that outperform existing methods. Our code and Supplementary Material are available at https://github.com/Lqm26/RMFNet.

* International Joint Conference on Artificial Intelligence 2024

Via

Access Paper or Ask Questions

MammothModa: Multi-Modal Large Language Model

Jun 26, 2024

Qi She, Junwen Pan, Xin Wan, Rui Zhang, Dawei Lu, Kai Huang

Abstract:In this report, we introduce MammothModa, yet another multi-modal large language model (MLLM) designed to achieve state-of-the-art performance starting from an elementary baseline. We focus on three key design insights: (i) Integrating Visual Capabilities while Maintaining Complex Language Understanding: In addition to the vision encoder, we incorporated the Visual Attention Experts into the LLM to enhance its visual capabilities. (ii) Extending Context Window for High-Resolution and Long-Duration Visual Feature: We explore the Visual Merger Module to effectively reduce the token number of high-resolution images and incorporated frame position ids to avoid position interpolation. (iii) High-Quality Bilingual Datasets: We meticulously curated and filtered a high-quality bilingual multimodal dataset to reduce visual hallucinations. With above recipe we build MammothModa that consistently outperforms the state-of-the-art models, e.g., LLaVA-series, across main real-world visual language benchmarks without bells and whistles.

* Technical report

Via

Access Paper or Ask Questions