Chuang Yang

Senior Member, IEEE

Road Similarity-Based BEV-Satellite Image Matching for UGV Localization

Apr 23, 2025

Zhenping Sun, Chuang Yang, Yafeng Bu, Bokai Liu, Jun Zeng, Xiaohui Li

Abstract:To address the challenge of autonomous UGV localization in GNSS-denied off-road environments, this study proposes a matching-based localization method that leverages a BEV perception image and a satellite map within a road similarity space to achieve high-precision positioning. We first implement a robust LiDAR-inertial odometry system, followed by the fusion of LiDAR and image data to generate a local BEV perception image of the UGV. This approach mitigates the significant viewpoint discrepancy between ground-view images and the satellite map. The BEV image and satellite map are then projected into the road similarity space, where normalized cross-correlation (NCC) is computed to assess the matching score. Finally, a particle filter is employed to estimate the probability distribution of the vehicle's pose. Compared against GNSS ground truth, our localization system remained stable without divergence over a long-distance test of 10 km, achieving an average lateral error of only 0.89 meters and an average planar Euclidean error of 3.41 meters. Furthermore, it maintained accurate and stable global localization even under nighttime conditions, further validating its robustness and adaptability.

* 7 pages, 9 figures, published at IROS 2025
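
The matching step described in the abstract (an NCC score between the BEV image and a satellite map crop, used to weight pose hypotheses in a particle filter) can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the inputs are taken to be small road-likelihood grids already projected into the road similarity space, particles are reduced to integer (row, col) offsets without heading, and the names `ncc`, `crop`, and `reweight` are hypothetical rather than taken from the paper. Only the measurement update is shown; motion prediction and resampling are omitted.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-sized 2D grids (lists of rows)."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if not xs or len(xs) != len(ys):
        raise ValueError("patches must be non-empty and the same size")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    # A constant patch has zero variance; treat it as uninformative.
    return num / den if den > 0 else 0.0


def crop(grid, top, left, height, width):
    """Cut a height x width window out of a 2D grid."""
    return [row[left:left + width] for row in grid[top:top + height]]


def reweight(particles, weights, bev, sat_map):
    """One measurement update: scale each particle weight by its NCC score.

    particles: list of (row, col) map offsets (hypothetical pose encoding).
    Negative correlations are clipped to zero before weighting.
    """
    h, w = len(bev), len(bev[0])
    scores = [max(ncc(bev, crop(sat_map, r, c, h, w)), 0.0) for r, c in particles]
    # The small constant keeps every particle alive with a tiny weight.
    updated = [wt * (s + 1e-6) for wt, s in zip(weights, scores)]
    total = sum(updated)
    return [u / total for u in updated]
```

In this toy setting, the particle whose map crop best matches the BEV grid ends up with the largest normalized weight, which is the behavior a matching-based localizer relies on.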

Edge Approximation Text Detector

Apr 05, 2025

Chuang Yang, Xu Han, Tao Han, Han Han, Bingxuan Zhao, Qi Wang

Abstract:Pursuing efficient text shape representations helps scene text detection models focus on compact foreground regions and optimize the contour reconstruction steps to simplify the whole detection pipeline. Current approaches either represent irregular shapes via a box-to-polygon strategy or decompose a contour into pieces that are fitted gradually, so these models always suffer from coarse contours or complex pipelines. Considering the above issues, we introduce EdgeText to fit text contours compactly while alleviating excessive contour rebuilding processes. Concretely, it is observed that the two long edges of texts can be regarded as smooth curves. This allows us to build contours via continuous and smooth edges that cover text regions tightly instead of fitting piecewise, which helps avoid the two limitations in current models. Inspired by this observation, EdgeText formulates the text representation as an edge approximation problem via parameterized curve fitting functions. In the inference stage, our model starts by locating text centers and then creates curve functions for approximating text edges based on these points. Meanwhile, truncation points are determined based on the location features. Finally, curve segments are extracted from the curve functions using the pixel coordinates of the truncation points to reconstruct text contours. Furthermore, considering the deep dependency of EdgeText on text edges, a bilateral enhanced perception (BEP) module is designed. It encourages our model to pay attention to the recognition of edge features. Additionally, to accelerate the learning of the curve function parameters, we introduce a proportional integral loss (PI-loss) to force the proposed model to focus on the curve distribution and avoid being disturbed by text scales.
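
The reconstruction step in the abstract (two edge curves, cut at truncation points, joined into a closed contour) can be sketched as follows. The abstract does not state the form of the curve functions, so this sketch assumes each long edge is a polynomial y = f(x) and that the truncation points reduce to a shared x-range; `poly_eval` and `contour_from_edges` are hypothetical names, not the authors' API.

```python
def poly_eval(coeffs, x):
    """Evaluate a polynomial with Horner's rule; coeffs run from highest degree to constant."""
    y = 0.0
    for c in coeffs:
        y = y * x + c
    return y


def contour_from_edges(top_coeffs, bottom_coeffs, x_start, x_end, n_points=8):
    """Build a closed polygon from two edge curves truncated to [x_start, x_end].

    Assumption: both long edges are polynomials in x and share the same
    truncation range. The top edge is traced left to right and the bottom
    edge right to left, so the vertices form one loop around the text.
    """
    if n_points < 2 or x_end <= x_start:
        raise ValueError("need n_points >= 2 and x_end > x_start")
    step = (x_end - x_start) / (n_points - 1)
    xs = [x_start + i * step for i in range(n_points)]
    top = [(x, poly_eval(top_coeffs, x)) for x in xs]
    bottom = [(x, poly_eval(bottom_coeffs, x)) for x in reversed(xs)]
    return top + bottom
```

With this representation a contour is described by a handful of curve coefficients plus two truncation coordinates, and the polygon is sampled only once at the end rather than being refined piece by piece.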

ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction

Apr 02, 2025

Yuejiao Su, Yi Wang, Qiongyang Hu, Chuang Yang, Lap-Pui Chau

Abstract:Egocentric interaction perception is one of the essential branches in investigating human-environment interaction, which lays the basis for developing next-generation intelligent systems. However, existing egocentric interaction understanding methods cannot yield coherent textual and pixel-level responses simultaneously according to user queries, which lacks flexibility for varying downstream application requirements. To comprehend egocentric interactions exhaustively, this paper presents a novel task named Egocentric Interaction Reasoning and pixel Grounding (Ego-IRG). Taking an egocentric image with the query as input, Ego-IRG is the first task that aims to resolve the interactions through three crucial steps: analyzing, answering, and pixel grounding, which results in fluent textual and fine-grained pixel-level responses. Another challenge is that existing datasets cannot meet the conditions for the Ego-IRG task. To address this limitation, this paper creates the Ego-IRGBench dataset based on extensive manual efforts, which includes over 20k egocentric images with 1.6 million queries and corresponding multimodal responses about interactions. Moreover, we design a unified ANNEXE model to generate text- and pixel-level outputs utilizing multimodal large language models, which enables a comprehensive interpretation of egocentric interactions. The experiments on the Ego-IRGBench exhibit the effectiveness of our ANNEXE model compared with other works.

* Computer Vision and Pattern Recognition

Joint Communication and Radar Sensing for Terahertz Space-Air-Ground Integrated Networks (SAGIN)

Feb 25, 2025

Chong Han, Weijun Gao, Zhepu Yin, Chuang Yang, Mugen Peng, Wenjun Zhang

Abstract:The transition from isolated systems to integrated solutions has driven the development of space-air-ground integrated networks (SAGIN) as well as the integration of communication and radar sensing functionalities. By leveraging the unique properties of the Terahertz (THz) band, THz joint communication and radar sensing (JCRS) supports high-speed communication and precise sensing, addressing the growing demands of SAGIN for connectivity and environmental awareness. However, most existing THz studies focus on terrestrial and static scenarios, with limited consideration for the dynamic and non-terrestrial environments of SAGIN. In this paper, the THz JCRS techniques for SAGIN are comprehensively investigated. Specifically, propagation characteristics and channel models of THz waves in non-terrestrial environments are analyzed. A link capacity comparison with millimeter-wave, THz, and free-space optical frequency bands is conducted to highlight the advantages of THz frequencies. Moreover, novel JCRS waveform design strategies are presented to achieve mutual merit of communication and radar sensing, while networking strategies are developed to overcome challenges in SAGIN such as high mobility. Furthermore, advancements in THz device technologies, including antennas and amplifiers, are reviewed to assess their roles in enabling practical JCRS implementations.

MMO-IG: Multi-Class and Multi-Scale Object Image Generation for Remote Sensing

Dec 18, 2024

Chuang Yang, Bingxuan Zhao, Qing Zhou, Qi Wang

Abstract:The rapid advancement of deep generative models (DGMs) has significantly advanced research in computer vision, providing a cost-effective alternative to acquiring vast quantities of expensive imagery. However, existing methods predominantly focus on synthesizing remote sensing (RS) images aligned with real images in a global layout view, which limits their applicability in RS image object detection (RSIOD) research. To address these challenges, we propose a multi-class and multi-scale object image generator based on DGMs, termed MMO-IG, designed to generate RS images with supervised object labels from global and local aspects simultaneously. Specifically, from the local view, MMO-IG encodes various RS instances using an iso-spacing instance map (ISIM). During the generation process, it decodes each instance region with an iso-spacing value in the ISIM (corresponding to both background and foreground instances) to produce RS images through the denoising process of diffusion models. Considering the complex interdependencies among MMOs, we construct a spatial-cross dependency knowledge graph (SCDKG). This ensures a realistic and reliable multidirectional distribution among MMOs for region embedding, thereby reducing the discrepancy between source and target domains. Besides, we propose a structured object distribution instruction (SODI) to guide the generation of synthesized RS image content from a global aspect with SCDKG-based ISIM together. Extensive experimental results demonstrate that our MMO-IG exhibits superior generation capabilities for RS images with dense MMO-supervised labels, and RS detectors pre-trained with MMO-IG show excellent performance on real-world datasets.

CA-MoE: Channel-Adapted MoE for Incremental Weather Forecasting

Dec 03, 2024

Hao Chen, Han Tao, Guo Song, Jie Zhang, Yunlong Yu, Yonghan Dong, Chuang Yang, Lei Bai

Abstract:Atmospheric science is intricately connected with other fields, e.g., geography and aerospace. Most existing approaches involve training a joint atmospheric and geographic model from scratch, which incurs significant computational costs and overlooks the potential for incremental learning of weather variables across different domains. In this paper, we introduce incremental learning to weather forecasting and propose a novel structure that allows for the flexible expansion of variables within the model. Specifically, our method presents a Channel-Adapted MoE (CA-MoE) that employs a divide-and-conquer strategy. This strategy assigns variable training tasks to different experts by index embedding and reduces computational complexity through a channel-wise Top-K strategy. Experiments conducted on the widely utilized ERA5 dataset reveal that our method, utilizing only approximately 15% of trainable parameters during the incremental stage, attains performance that is on par with state-of-the-art competitors. Notably, in the context of variable incremental experiments, our method demonstrates negligible issues with catastrophic forgetting.
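
A channel-wise Top-K routing step of the kind the abstract mentions might look like the sketch below. It is a generic mixture-of-experts gate, not the CA-MoE implementation: each channel (one weather variable) is a single scalar, the per-channel gate logits stand in for whatever the index embedding produces, and `topk_gate` and `channel_moe` are hypothetical names.

```python
import math


def topk_gate(logits, k):
    """Keep the k largest gate logits and softmax over them.

    Returns {expert_index: weight}; experts outside the top k get no weight
    and are never evaluated, which is where the compute saving comes from.
    """
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def channel_moe(channel_inputs, gate_logits, experts, k=1):
    """Route each channel independently to its own top-k experts.

    channel_inputs: one value per channel (assumed scalar for brevity).
    gate_logits: one list of per-expert logits for each channel.
    experts: list of callables mapping a channel value to an output.
    """
    outputs = []
    for x, logits in zip(channel_inputs, gate_logits):
        gates = topk_gate(logits, k)
        outputs.append(sum(w * experts[i](x) for i, w in gates.items()))
    return outputs
```

Because routing is decided per channel, a newly added variable can in principle be given its own gate row and expert without touching the routes of existing variables, which is one way to read the incremental setting described above.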

SignEye: Traffic Sign Interpretation from Vehicle First-Person View

Nov 18, 2024

Chuang Yang, Xu Han, Tao Han, Yuejiao Su, Junyu Gao, Hongyuan Zhang, Yi Wang, Lap-Pui Chau

Abstract:Traffic signs play a key role in assisting autonomous driving systems (ADS) by enabling the assessment of vehicle behavior in compliance with traffic regulations and providing navigation instructions. However, current works are limited to basic sign understanding without considering the egocentric vehicle's spatial position, which fails to support further regulation assessment and direction navigation. Following the above issues, we introduce a new task: traffic sign interpretation from the vehicle's first-person view, referred to as TSI-FPV. Meanwhile, we develop a traffic guidance assistant (TGA) scenario application to re-explore the role of traffic signs in ADS as a complement to popular autonomous technologies (such as obstacle perception). Notably, TGA is not a replacement for electronic map navigation; rather, TGA can be an automatic tool for updating it and complementing it in situations such as offline conditions or temporary sign adjustments. Lastly, a spatial and semantic logic-aware stepwise reasoning pipeline (SignEye) is constructed to achieve the TSI-FPV and TGA, and an application-specific dataset (Traffic-CN) is built. Experiments show that TSI-FPV and TGA are achievable via our SignEye trained on Traffic-CN. The results also demonstrate that the TGA can provide complementary information to ADS beyond existing popular autonomous technologies.

Sensing-Assisted Beam Tracking with Real-Time Beamwidth Adaptation for THz Communications

Nov 13, 2024

Wuhan Chen, Yuheng Fan, Chuang Yang, Mugen Peng

Abstract:Terahertz (THz) communications, with their substantial bandwidth, are essential for meeting the ultra-high data rate demands of emerging high-mobility scenarios such as vehicular-to-everything (V2X) networks. In these contexts, beamwidth adaptation has been explored to address the problem that high-mobility targets frequently move out of the narrow THz beam range. However, existing approaches cannot effectively track targets due to a lack of real-time motion awareness. Consequently, we propose a sensing-assisted beam tracking scheme with real-time beamwidth adaptation. Specifically, the base station (BS) periodically collects prior sensing information to predict the target's motion path by applying a particular motion model. Then, we build a pre-calculated codebook by optimising precoders to align the beamwidth with various predicted target paths, thereby maximising the average achievable data rates within each sensing period. Finally, the BS selects the optimal precoder from the codebook to maintain stable and continuous connectivity. Simulation results show that the proposed scheme significantly improves the rate performance and reduces outage probability compared to existing approaches under various target mobility.
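
The final selection step (picking the codebook entry that maximises the average achievable rate over the predicted path) can be illustrated with a deliberately simplified toy model. None of the modelling choices below come from the paper: a codeword is reduced to a (center angle, beamwidth) pair in degrees, beam gain is assumed inversely proportional to beamwidth, the rate is zero whenever the target is outside the beam, and `average_rate`, `select_beam`, and `snr_ref` are hypothetical names.

```python
import math


def average_rate(beam, path_angles, snr_ref):
    """Mean rate over predicted target angles for one (center, width) beam.

    Toy model: inside the beam the SNR is snr_ref / width, so narrower
    beams give higher rates; outside the beam the rate is zero (outage).
    """
    center, width = beam
    rate_inside = math.log2(1.0 + snr_ref / width)
    covered = sum(1 for a in path_angles if abs(a - center) <= width / 2.0)
    return rate_inside * covered / len(path_angles)


def select_beam(codebook, path_angles, snr_ref=100.0):
    """Return the index of the codeword with the highest average rate."""
    if not codebook or not path_angles:
        raise ValueError("codebook and path_angles must be non-empty")
    return max(
        range(len(codebook)),
        key=lambda i: average_rate(codebook[i], path_angles, snr_ref),
    )
```

Even in this toy form the trade-off from the abstract is visible: a narrow beam wins when the predicted path stays near its center, while a wider beam wins once the target is predicted to sweep across a larger angular range within the sensing period.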

Real-Time Text Detection with Similar Mask in Traffic, Industrial, and Natural Scenes

Nov 05, 2024

Xu Han, Junyu Gao, Chuang Yang, Yuan Yuan, Qi Wang

Abstract:Text in intelligent transportation scenes carries a wealth of information. Fully harnessing this information is one of the critical drivers for advancing intelligent transportation. Unlike general scenes, text detection in transportation demands fast inference speed in addition to high accuracy. Most existing real-time text detection methods are based on the shrink mask, which loses some geometric semantic information and needs complex post-processing. In addition, previous methods usually focus on the correctness of the final output, ignoring feature correction and lacking guidance during the intermediate process. To this end, we propose an efficient multi-scene text detector that contains an effective text representation, the similar mask (SM), and a feature correction module (FCM). Unlike previous methods, the former aims to preserve the geometric information of the instances as much as possible. Its post-processing saves 50% of the time while accurately and efficiently reconstructing text contours. The latter encourages false positive features to move away from the positive feature center, optimizing the predictions at the feature level. Ablation studies demonstrate the efficiency of the SM and the effectiveness of the FCM. Moreover, the deficiencies of existing traffic datasets (such as low-quality annotation or the unavailability of closed-source data) motivated us to collect and annotate a traffic text dataset that includes motion blur. In addition, to validate the scene robustness of SM-Net, we conduct experiments on traffic, industrial, and natural scene datasets. Extensive experiments verify that it achieves state-of-the-art (SOTA) performance on several benchmarks. The code and dataset are available at: https://github.com/fengmulin/SMNet.

Instruction-Tuning Llama-3-8B Excels in City-Scale Mobility Prediction

Oct 31, 2024

Peizhi Tang, Chuang Yang, Tong Xing, Xiaohang Xu, Renhe Jiang, Kaoru Sezaki

Abstract:Human mobility prediction plays a critical role in applications such as disaster response, urban planning, and epidemic forecasting. Traditional methods often rely on designing crafted, domain-specific models, and typically focus on short-term predictions, which struggle to generalize across diverse urban environments. In this study, we introduce Llama-3-8B-Mob, a large language model fine-tuned with instruction tuning, for long-term citywide mobility prediction -- in a Q&A manner. We validate our approach using large-scale human mobility data from four metropolitan areas in Japan, focusing on predicting individual trajectories over the next 15 days. The results demonstrate that Llama-3-8B-Mob excels in modeling long-term human mobility -- surpassing the state-of-the-art on multiple prediction metrics. It also displays strong zero-shot generalization capabilities -- effectively generalizing to other cities even when fine-tuned only on limited samples from a single city. Source codes are available at https://github.com/TANGHULU6/Llama3-8B-Mob.
