Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jun Li

Michael

Movable Antenna Enhanced Networked Full-Duplex Integrated Sensing and Communication System

Nov 14, 2024

Yuan Guo, Wen Chen, Qingqing Wu, Yang Liu, Qiong Wu, Kunlun Wang, Jun Li, Lexi Xu

Figure 1 for Movable Antenna Enhanced Networked Full-Duplex Integrated Sensing and Communication System

Figure 2 for Movable Antenna Enhanced Networked Full-Duplex Integrated Sensing and Communication System

Figure 3 for Movable Antenna Enhanced Networked Full-Duplex Integrated Sensing and Communication System

Figure 4 for Movable Antenna Enhanced Networked Full-Duplex Integrated Sensing and Communication System

Abstract:Integrated sensing and communication (ISAC) is envisioned as a key technology for future sixth-generation (6G) networks. Classical ISAC system considering monostatic and/or bistatic settings will inevitably degrade both communication and sensing performance due to the limited service coverage and easily blocked transmission paths. Besides, existing ISAC studies usually focus on downlink (DL) or uplink (UL) communication demands and unable to achieve the systematic DL and UL communication tasks. These challenges can be overcome by networked FD ISAC framework. Moreover, ISAC generally considers the trade-off between communication and sensing, unavoidably leading to a loss in communication performance. This shortcoming can be solved by the emerging movable antenna (MA) technology. In this paper, we utilize the MA to promote communication capability with guaranteed sensing performance via jointly designing beamforming, power allocation, receiving filters and MA configuration towards maximizing sum rate. The optimization problem is highly difficult due to the unique channel model deriving from the MA. To resolve this challenge, via leveraging the cutting-the-edge majorization-minimization (MM) method, we develop an efficient solution that optimizes all variables via convex optimization techniques. Extensive simulation results verify the effectiveness of our proposed algorithms and demonstrate the substantial performance promotion by deploying MA in the networked FD ISAC system.

Via

Access Paper or Ask Questions

Kernel Orthogonality does not necessarily imply a Decrease in Feature Map Redundancy in CNNs: Convolutional Similarity Minimization

Nov 05, 2024

Zakariae Belmekki, Jun Li, Patrick Reuter, David Antonio Gómez Jáuregui, Karl Jenkins

Figure 1 for Kernel Orthogonality does not necessarily imply a Decrease in Feature Map Redundancy in CNNs: Convolutional Similarity Minimization

Figure 2 for Kernel Orthogonality does not necessarily imply a Decrease in Feature Map Redundancy in CNNs: Convolutional Similarity Minimization

Figure 3 for Kernel Orthogonality does not necessarily imply a Decrease in Feature Map Redundancy in CNNs: Convolutional Similarity Minimization

Figure 4 for Kernel Orthogonality does not necessarily imply a Decrease in Feature Map Redundancy in CNNs: Convolutional Similarity Minimization

Abstract:Convolutional Neural Networks (CNNs) have been heavily used in Deep Learning due to their success in various tasks. Nonetheless, it has been observed that CNNs suffer from redundancy in feature maps, leading to inefficient capacity utilization. Efforts to mitigate and solve this problem led to the emergence of multiple methods, amongst which is kernel orthogonality through variant means. In this work, we challenge the common belief that kernel orthogonality leads to a decrease in feature map redundancy, which is, supposedly, the ultimate objective behind kernel orthogonality. We prove, theoretically and empirically, that kernel orthogonality has an unpredictable effect on feature map similarity and does not necessarily decrease it. Based on our theoretical result, we propose an effective method to reduce feature map similarity independently of the input of the CNN. This is done by minimizing a novel loss function we call Convolutional Similarity. Empirical results show that minimizing the Convolutional Similarity increases the performance of classification models and can accelerate their convergence. Furthermore, using our proposed method pushes towards a more efficient use of the capacity of models, allowing the use of significantly smaller models to achieve the same levels of performance.

Via

Access Paper or Ask Questions

Novel Object Synthesis via Adaptive Text-Image Harmony

Oct 28, 2024

Zeren Xiong, Zedong Zhang, Zikun Chen, Shuo Chen, Xiang Li, Gan Sun, Jian Yang, Jun Li

Figure 1 for Novel Object Synthesis via Adaptive Text-Image Harmony

Figure 2 for Novel Object Synthesis via Adaptive Text-Image Harmony

Figure 3 for Novel Object Synthesis via Adaptive Text-Image Harmony

Figure 4 for Novel Object Synthesis via Adaptive Text-Image Harmony

Abstract:In this paper, we study an object synthesis task that combines an object text with an object image to create a new object image. However, most diffusion models struggle with this task, \textit{i.e.}, often generating an object that predominantly reflects either the text or the image due to an imbalance between their inputs. To address this issue, we propose a simple yet effective method called Adaptive Text-Image Harmony (ATIH) to generate novel and surprising objects. First, we introduce a scale factor and an injection step to balance text and image features in cross-attention and to preserve image information in self-attention during the text-image inversion diffusion process, respectively. Second, to better integrate object text and image, we design a balanced loss function with a noise parameter, ensuring both optimal editability and fidelity of the object image. Third, to adaptively adjust these parameters, we present a novel similarity score function that not only maximizes the similarities between the generated object image and the input text/image but also balances these similarities to harmonize text and image integration. Extensive experiments demonstrate the effectiveness of our approach, showcasing remarkable object creations such as colobus-glass jar. Project page: https://xzr52.github.io/ATIH/.

* NeurIPS2024

Via

Access Paper or Ask Questions

Deep Learning-Based Fatigue Cracks Detection in Bridge Girders using Feature Pyramid Networks

Oct 28, 2024

Jiawei Zhang, Jun Li, Reachsak Ly, Yunyi Liu, Jiangpeng Shu

Figure 1 for Deep Learning-Based Fatigue Cracks Detection in Bridge Girders using Feature Pyramid Networks

Figure 2 for Deep Learning-Based Fatigue Cracks Detection in Bridge Girders using Feature Pyramid Networks

Figure 3 for Deep Learning-Based Fatigue Cracks Detection in Bridge Girders using Feature Pyramid Networks

Figure 4 for Deep Learning-Based Fatigue Cracks Detection in Bridge Girders using Feature Pyramid Networks

Abstract:For structural health monitoring, continuous and automatic crack detection has been a challenging problem. This study is conducted to propose a framework of automatic crack segmentation from high-resolution images containing crack information about steel box girders of bridges. Considering the multi-scale feature of cracks, convolutional neural network architecture of Feature Pyramid Networks (FPN) for crack detection is proposed. As for input, 120 raw images are processed via two approaches (shrinking the size of images and splitting images into sub-images). Then, models with the proposed structure of FPN for crack detection are developed. The result shows all developed models can automatically detect the cracks at the raw images. By shrinking the images, the computation efficiency is improved without decreasing accuracy. Because of the separable characteristic of crack, models using the splitting method provide more accurate crack segmentations than models using the resizing method. Therefore, for high-resolution images, the FPN structure coupled with the splitting method is an promising solution for the crack segmentation and detection.

* 15 pages, 11 figures

Via

Access Paper or Ask Questions

Attention-based Citywide Electric Vehicle Charging Demand Prediction Approach Considering Urban Region and Dynamic Influences

Oct 24, 2024

Haoxuan Kuang, Kunxiang Deng, Linlin You, Jun Li

Figure 1 for Attention-based Citywide Electric Vehicle Charging Demand Prediction Approach Considering Urban Region and Dynamic Influences

Figure 2 for Attention-based Citywide Electric Vehicle Charging Demand Prediction Approach Considering Urban Region and Dynamic Influences

Figure 3 for Attention-based Citywide Electric Vehicle Charging Demand Prediction Approach Considering Urban Region and Dynamic Influences

Figure 4 for Attention-based Citywide Electric Vehicle Charging Demand Prediction Approach Considering Urban Region and Dynamic Influences

Abstract:Electric vehicle charging demand prediction is important for vacant charging pile recommendation and charging infrastructure planning, thus facilitating vehicle electrification and green energy development. The performance of previous spatio-temporal studies is still far from satisfactory because the traditional graphs are difficult to model non-pairwise spatial relationships and multivariate temporal features are not adequately taken into account. To tackle these issues, we propose an attention-based heterogeneous multivariate data fusion approach (AHMDF) for citywide electric vehicle charging demand prediction, which incorporates geo-based clustered hypergraph and multivariate gated Transformer to considers both static and dynamic influences. To learn non-pairwise relationships, we cluster service areas by the types and numbers of points of interest in the areas and develop attentive hypergraph networks accordingly. Graph attention mechanisms are used for information propagation between neighboring areas. Additionally, we improve the Transformer encoder utilizing gated mechanisms so that it can selectively learn dynamic auxiliary information and temporal features. Experiments on an electric vehicle charging benchmark dataset demonstrate the effectiveness of our proposed approach compared with a broad range of competing baselines. Furthermore, we demonstrate the impact of dynamic influences on prediction results in different areas of the city and the effectiveness of our clustering method.

Via

Access Paper or Ask Questions

DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain

Oct 19, 2024

Kun Wang, Zhiqiang Yan, Junkai Fan, Wanlu Zhu, Xiang Li, Jun Li, Jian Yang

Figure 1 for DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain

Figure 2 for DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain

Figure 3 for DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain

Figure 4 for DCDepth: Progressive Monocular Depth Estimation in Discrete Cosine Domain

Abstract:In this paper, we introduce DCDepth, a novel framework for the long-standing monocular depth estimation task. Moving beyond conventional pixel-wise depth estimation in the spatial domain, our approach estimates the frequency coefficients of depth patches after transforming them into the discrete cosine domain. This unique formulation allows for the modeling of local depth correlations within each patch. Crucially, the frequency transformation segregates the depth information into various frequency components, with low-frequency components encapsulating the core scene structure and high-frequency components detailing the finer aspects. This decomposition forms the basis of our progressive strategy, which begins with the prediction of low-frequency components to establish a global scene context, followed by successive refinement of local details through the prediction of higher-frequency components. We conduct comprehensive experiments on NYU-Depth-V2, TOFDC, and KITTI datasets, and demonstrate the state-of-the-art performance of DCDepth. Code is available at https://github.com/w2kun/DCDepth.

* Accepted by NeurIPS-2024

Via

Access Paper or Ask Questions

V2I-Calib++: A Multi-terminal Spatial Calibration Approach in Urban Intersections for Collaborative Perception

Oct 14, 2024

Qianxin Qu, Xinyu Zhang, Yijin Xiong, Shichun Guo, Ziqiang Song, Jun Li

Figure 1 for V2I-Calib++: A Multi-terminal Spatial Calibration Approach in Urban Intersections for Collaborative Perception

Figure 2 for V2I-Calib++: A Multi-terminal Spatial Calibration Approach in Urban Intersections for Collaborative Perception

Figure 3 for V2I-Calib++: A Multi-terminal Spatial Calibration Approach in Urban Intersections for Collaborative Perception

Figure 4 for V2I-Calib++: A Multi-terminal Spatial Calibration Approach in Urban Intersections for Collaborative Perception

Abstract:Urban intersections, dense with pedestrian and vehicular traffic and compounded by GPS signal obstructions from high-rise buildings, are among the most challenging areas in urban traffic systems. Traditional single-vehicle intelligence systems often perform poorly in such environments due to a lack of global traffic flow information and the ability to respond to unexpected events. Vehicle-to-Everything (V2X) technology, through real-time communication between vehicles (V2V) and vehicles to infrastructure (V2I), offers a robust solution. However, practical applications still face numerous challenges. Calibration among heterogeneous vehicle and infrastructure endpoints in multi-end LiDAR systems is crucial for ensuring the accuracy and consistency of perception system data. Most existing multi-end calibration methods rely on initial calibration values provided by positioning systems, but the instability of GPS signals due to high buildings in urban canyons poses severe challenges to these methods. To address this issue, this paper proposes a novel multi-end LiDAR system calibration method that does not require positioning priors to determine initial external parameters and meets real-time requirements. Our method introduces an innovative multi-end perception object association technique, utilizing a new Overall Distance metric (oDist) to measure the spatial association between perception objects, and effectively combines global consistency search algorithms with optimal transport theory. By this means, we can extract co-observed targets from object association results for further external parameter computation and optimization. Extensive comparative and ablation experiments conducted on the simulated dataset V2X-Sim and the real dataset DAIR-V2X confirm the effectiveness and efficiency of our method. The code for this method can be accessed at: \url{https://github.com/MassimoQu/v2i-calib}.

Via

Access Paper or Ask Questions

Deep Transfer Learning-based Detection for Flash Memory Channels

Oct 08, 2024

Zhen Mei, Kui Cai, Long Shi, Jun Li, Li Chen, Kees A. Schouhamer Immink

Figure 1 for Deep Transfer Learning-based Detection for Flash Memory Channels

Figure 2 for Deep Transfer Learning-based Detection for Flash Memory Channels

Figure 3 for Deep Transfer Learning-based Detection for Flash Memory Channels

Figure 4 for Deep Transfer Learning-based Detection for Flash Memory Channels

Abstract:The NAND flash memory channel is corrupted by different types of noises, such as the data retention noise and the wear-out noise, which lead to unknown channel offset and make the flash memory channel non-stationary. In the literature, machine learning-based methods have been proposed for data detection for flash memory channels. However, these methods require a large number of training samples and labels to achieve a satisfactory performance, which is costly. Furthermore, with a large unknown channel offset, it may be impossible to obtain enough correct labels. In this paper, we reformulate the data detection for the flash memory channel as a transfer learning (TL) problem. We then propose a model-based deep TL (DTL) algorithm for flash memory channel detection. It can effectively reduce the training data size from $10^6$ samples to less than 104 samples. Moreover, we propose an unsupervised domain adaptation (UDA)-based DTL algorithm using moment alignment, which can detect data without any labels. Hence, it is suitable for scenarios where the decoding of error-correcting code fails and no labels can be obtained. Finally, a UDA-based threshold detector is proposed to eliminate the need for a neural network. Both the channel raw error rate analysis and simulation results demonstrate that the proposed DTL-based detection schemes can achieve near-optimal bit error rate (BER) performance with much less training data and/or without using any labels.

* This paper has been accepted for publication in IEEE Transactions on Communications

Via

Access Paper or Ask Questions

IGroupSS-Mamba: Interval Group Spatial-Spectral Mamba for Hyperspectral Image Classification

Oct 07, 2024

Yan He, Bing Tu, Puzhao Jiang, Bo Liu, Jun Li, Antonio Plaza

Figure 1 for IGroupSS-Mamba: Interval Group Spatial-Spectral Mamba for Hyperspectral Image Classification

Figure 2 for IGroupSS-Mamba: Interval Group Spatial-Spectral Mamba for Hyperspectral Image Classification

Figure 3 for IGroupSS-Mamba: Interval Group Spatial-Spectral Mamba for Hyperspectral Image Classification

Figure 4 for IGroupSS-Mamba: Interval Group Spatial-Spectral Mamba for Hyperspectral Image Classification

Abstract:Hyperspectral image (HSI) classification has garnered substantial attention in remote sensing fields. Recent Mamba architectures built upon the Selective State Space Models (S6) have demonstrated enormous potential in long-range sequence modeling. However, the high dimensionality of hyperspectral data and information redundancy pose challenges to the application of Mamba in HSI classification, suffering from suboptimal performance and computational efficiency. In light of this, this paper investigates a lightweight Interval Group Spatial-Spectral Mamba framework (IGroupSS-Mamba) for HSI classification, which allows for multi-directional and multi-scale global spatial-spectral information extraction in a grouping and hierarchical manner. Technically, an Interval Group S6 Mechanism (IGSM) is developed as the core component, which partitions high-dimensional features into multiple non-overlapping groups at intervals, and then integrates a unidirectional S6 for each group with a specific scanning direction to achieve non-redundant sequence modeling. Compared to conventional applying multi-directional scanning to all bands, this grouping strategy leverages the complementary strengths of different scanning directions while decreasing computational costs. To adequately capture the spatial-spectral contextual information, an Interval Group Spatial-Spectral Block (IGSSB) is introduced, in which two IGSM-based spatial and spectral operators are cascaded to characterize the global spatial-spectral relationship along the spatial and spectral dimensions, respectively. IGroupSS-Mamba is constructed as a hierarchical structure stacked by multiple IGSSB blocks, integrating a pixel aggregation-based downsampling strategy for multiscale spatial-spectral semantic learning from shallow to deep stages. Extensive experiments demonstrate that IGroupSS-Mamba outperforms the state-of-the-art methods.

Via

Access Paper or Ask Questions

Organizing Unstructured Image Collections using Natural Language

Oct 07, 2024

Mingxuan Liu, Zhun Zhong, Jun Li, Gianni Franchi, Subhankar Roy, Elisa Ricci

Figure 1 for Organizing Unstructured Image Collections using Natural Language

Figure 2 for Organizing Unstructured Image Collections using Natural Language

Figure 3 for Organizing Unstructured Image Collections using Natural Language

Figure 4 for Organizing Unstructured Image Collections using Natural Language

Abstract:Organizing unstructured visual data into semantic clusters is a key challenge in computer vision. Traditional deep clustering (DC) approaches focus on a single partition of data, while multiple clustering (MC) methods address this limitation by uncovering distinct clustering solutions. The rise of large language models (LLMs) and multimodal LLMs (MLLMs) has enhanced MC by allowing users to define clustering criteria in natural language. However, manually specifying criteria for large datasets is impractical. In this work, we introduce the task Semantic Multiple Clustering (SMC) that aims to automatically discover clustering criteria from large image collections, uncovering interpretable substructures without requiring human input. Our framework, Text Driven Semantic Multiple Clustering (TeDeSC), uses text as a proxy to concurrently reason over large image collections, discover partitioning criteria, expressed in natural language, and reveal semantic substructures. To evaluate TeDeSC, we introduce the COCO-4c and Food-4c benchmarks, each containing four grouping criteria and ground-truth annotations. We apply TeDeSC to various applications, such as discovering biases and analyzing social media image popularity, demonstrating its utility as a tool for automatically organizing image collections and revealing novel insights.

* Preprint. Project webpage: https://oatmealliu.github.io/smc.html

Via

Access Paper or Ask Questions