Ferdous Sohel

FlexiCrackNet: A Flexible Pipeline for Enhanced Crack Segmentation with General Features Transfered from SAM

Jan 31, 2025

Xinlong Wan, Xiaoyan Jiang, Guangsheng Luo, Ferdous Sohel, Jenqneng Hwang

Abstract: Automatic crack segmentation is a cornerstone technology for intelligent visual perception modules in road safety maintenance and structural integrity systems. Existing deep learning models and "pre-training + fine-tuning" paradigms often face challenges of limited adaptability in resource-constrained environments and inadequate scalability across diverse data domains. To overcome these limitations, we propose FlexiCrackNet, a novel pipeline that seamlessly integrates traditional deep learning paradigms with the strengths of large-scale pre-trained models. At its core, FlexiCrackNet employs an encoder-decoder architecture to extract task-specific features. The CNN-based encoder of the lightweight EdgeSAM is used exclusively as a generic feature extractor, decoupled from the fixed input size requirements of EdgeSAM. To harmonize general and domain-specific features, we introduce the Information-Interaction Gated Attention Mechanism (IGAM), which adaptively fuses multi-level features to enhance segmentation performance while mitigating irrelevant noise. This design enables the efficient transfer of general knowledge to crack segmentation tasks while ensuring adaptability to diverse input resolutions and resource-constrained environments. Experiments show that FlexiCrackNet outperforms state-of-the-art methods and excels in zero-shot generalization, computational efficiency, and segmentation robustness under challenging scenarios such as blurry inputs, complex backgrounds, and visually ambiguous artifacts. These advancements underscore the potential of FlexiCrackNet for real-world applications in automated crack detection and comprehensive structural health monitoring systems.
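
As a rough illustration of what gated fusion of two feature streams can look like, the sketch below blends a "general" feature vector with a "task-specific" one through a per-element sigmoid gate. This is a generic gating pattern and not the paper's IGAM: the function name `gated_fusion`, the externally supplied gate logits, and the toy values are all assumptions made for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(general, specific, gate_logits):
    """Blend two equal-length feature vectors element by element.

    A gate near 1 keeps the general feature; a gate near 0 keeps the
    task-specific one. In a real network the logits would be predicted
    from the features themselves; here they are passed in (assumption).
    """
    if not (len(general) == len(specific) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for g, s, z in zip(general, specific, gate_logits):
        gate = sigmoid(z)
        fused.append(gate * g + (1.0 - gate) * s)
    return fused


# Toy values (hypothetical): a neutral gate averages the two streams.
general = [1.0, 2.0, 3.0]
specific = [3.0, 2.0, 1.0]
fused = gated_fusion(general, specific, gate_logits=[0.0, 0.0, 0.0])
```

With all gate logits at zero the gate is 0.5 everywhere, so each fused element is the mean of the two inputs; large positive logits would push the result toward the general features.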

A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-shaped Structures

Aug 22, 2024

Tahmina Khanam, Hamid Laga, Mohammed Bennamoun, Guanjin Wang, Ferdous Sohel, Farid Boussaid, Guan Wang, Anuj Srivastava

Abstract: We propose the first comprehensive approach for modeling and analyzing the spatiotemporal shape variability in tree-like 4D objects, i.e., 3D objects whose shapes bend, stretch, and change in their branching structure over time as they deform, grow, and interact with their environment. Our key contribution is the representation of tree-like 3D shapes using Square Root Velocity Function Trees (SRVFT). By solving the spatial registration in the SRVFT space, which is equipped with an L2 metric, 4D tree-shaped structures become time-parameterized trajectories in this space. This reduces the problem of modeling and analyzing 4D tree-like shapes to that of modeling and analyzing elastic trajectories in the SRVFT space, where elasticity refers to time warping. In this paper, we propose a novel mathematical representation of the shape space of such trajectories, a Riemannian metric on that space, and computational tools for fast and accurate spatiotemporal registration and geodesics computation between 4D tree-shaped structures. Leveraging these building blocks, we develop a full framework for modeling the spatiotemporal variability using statistical models and generating novel 4D tree-like structures from a set of exemplars. We demonstrate and validate the proposed framework using real 4D plant data.
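
For readers unfamiliar with the square-root velocity representation, the sketch below computes the standard SRVF of a single sampled curve, q = f' / sqrt(|f'|), and the L2 distance between two such representations. It covers one branch only; the tree structure (SRVFT), registration, and time warping described in the abstract are not modeled. The function names, the forward-difference discretization, and the toy curves are assumptions for illustration.

```python
import math


def srvf(points, dt=1.0):
    """Discrete square-root velocity function of a sampled curve.

    q_i = v_i / sqrt(|v_i|), where v_i is the forward-difference velocity
    between consecutive samples. Zero-velocity segments map to the zero vector.
    """
    q = []
    for p0, p1 in zip(points, points[1:]):
        v = [(b - a) / dt for a, b in zip(p0, p1)]
        speed = math.sqrt(sum(c * c for c in v))
        if speed == 0.0:
            q.append([0.0] * len(v))
        else:
            q.append([c / math.sqrt(speed) for c in v])
    return q


def l2_distance(q1, q2, dt=1.0):
    """L2 distance between two SRVFs sampled on the same time grid."""
    total = sum((a - b) ** 2 for u, w in zip(q1, q2) for a, b in zip(u, w))
    return math.sqrt(total * dt)


# Two hypothetical straight branches, one along x and one along y.
q_x = srvf([(0.0, 0.0), (4.0, 0.0), (8.0, 0.0)])
q_y = srvf([(0.0, 0.0), (0.0, 4.0), (0.0, 8.0)])
d = l2_distance(q_x, q_y)
```

The appeal of this representation is that comparing curves reduces to a plain L2 distance between their SRVFs, which is the property the abstract relies on when it treats 4D structures as trajectories in an L2 space.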

Deep Learning-based Depth Estimation Methods from Monocular Image and Videos: A Comprehensive Survey

Jun 28, 2024

Uchitha Rajapaksha, Ferdous Sohel, Hamid Laga, Dean Diepeveen, Mohammed Bennamoun

Abstract: Estimating depth from single RGB images and videos is of widespread interest due to its applications in many areas, including autonomous driving, 3D reconstruction, digital entertainment, and robotics. More than 500 deep learning-based papers have been published in the past 10 years, which indicates the growing interest in the task. This paper presents a comprehensive survey of the existing deep learning-based methods, the challenges they address, and how they have evolved in their architecture and supervision methods. It provides a taxonomy for classifying the current work based on input and output modalities, network architectures, and learning methods. It also discusses the major milestones in the history of monocular depth estimation, and the different pipelines, datasets, and evaluation metrics used in existing methods.

* 46 pages, 10 figures. The paper has been accepted for publication in ACM Computing Surveys 2024

Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation

Mar 02, 2024

Lian Xu, Mohammed Bennamoun, Farid Boussaid, Wanli Ouyang, Ferdous Sohel, Dan Xu

Abstract: Most existing weakly supervised semantic segmentation (WSSS) methods rely on Class Activation Mapping (CAM) to extract coarse class-specific localization maps using image-level labels. Prior works have commonly used an off-line heuristic thresholding process that combines the CAM maps with off-the-shelf saliency maps produced by a general pre-trained saliency model to produce more accurate pseudo-segmentation labels. We propose AuxSegNet+, a weakly supervised auxiliary learning framework to explore the rich information from these saliency maps and the significant inter-task correlation between saliency detection and semantic segmentation. In the proposed AuxSegNet+, saliency detection and multi-label image classification are used as auxiliary tasks to improve the primary task of semantic segmentation with only image-level ground-truth labels. We also propose a cross-task affinity learning mechanism to learn pixel-level affinities from the saliency and segmentation feature maps. In particular, we propose a cross-task dual-affinity learning module to learn both pairwise and unary affinities, which are used to enhance the task-specific features and predictions by aggregating both query-dependent and query-independent global context for both saliency detection and semantic segmentation. The learned cross-task pairwise affinity can also be used to refine and propagate CAM maps to provide better pseudo labels for both tasks. Iterative improvement of segmentation performance is enabled by cross-task affinity learning and pseudo-label updating. Extensive experiments demonstrate the effectiveness of the proposed approach with new state-of-the-art WSSS results on the challenging PASCAL VOC and MS COCO benchmarks.

* Accepted at IEEE Transactions on Neural Networks and Learning Systems. arXiv admin note: substantial text overlap with arXiv:2107.11787

An Intra-BRNN and GB-RVQ Based END-TO-END Neural Audio Codec

Feb 02, 2024

Linping Xu, Jiawei Jiang, Dejun Zhang, Xianjun Xia, Li Chen, Yijian Xiao, Piao Ding, Shenyi Song, Sixing Yin, Ferdous Sohel

Abstract: Recently, neural networks have proven to be effective in performing speech coding tasks at low bitrates. However, under-utilization of intra-frame correlations and quantizer error degrade the reconstructed audio quality. To improve the coding quality, we present an end-to-end neural speech codec, namely CBRC (Convolutional and Bidirectional Recurrent neural Codec). An interleaved structure using 1D-CNN and Intra-BRNN is designed to exploit the intra-frame correlations more efficiently. Furthermore, a Group-wise and Beam-search Residual Vector Quantizer (GB-RVQ) is used to reduce the quantization noise. CBRC encodes audio every 20 ms with no additional latency, which makes it suitable for real-time communication. Experimental results demonstrate the superiority of the proposed codec when comparing CBRC at 3 kbps with Opus at 12 kbps.
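
To make the quantizer terminology concrete, the sketch below implements plain greedy residual vector quantization: each stage picks the codeword nearest to the current residual and passes the remainder on to the next stage. It deliberately omits the group-wise split and beam search that distinguish GB-RVQ, and the codebooks, function names, and input vector are hypothetical values chosen for illustration.

```python
def nearest(codebook, vector):
    """Index of the codeword closest to `vector` in squared Euclidean distance."""
    def sqdist(codeword):
        return sum((a - b) ** 2 for a, b in zip(codeword, vector))
    return min(range(len(codebook)), key=lambda i: sqdist(codebook[i]))


def rvq_encode(vector, codebooks):
    """Greedy RVQ: quantize the residual left over by each previous stage."""
    residual = list(vector)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices, residual


def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical two-stage codebooks: a coarse stage, then a fine stage.
codebooks = [
    [[0.0, 0.0], [4.0, 4.0]],
    [[0.0, 0.0], [1.0, -1.0], [-1.0, 1.0]],
]
indices, residual = rvq_encode([5.0, 3.0], codebooks)
reconstructed = rvq_decode(indices, codebooks)
```

Greedy selection commits to the best codeword at each stage independently; a beam search, as named in the abstract, would instead keep several candidate index sequences alive and choose the one with the lowest final error.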

* INTERSPEECH 2023

Performance of multilabel machine learning models and risk stratification schemas for predicting stroke and bleeding risk in patients with non-valvular atrial fibrillation

Feb 02, 2022

Juan Lu, Rebecca Hutchens, Joseph Hung, Mohammed Bennamoun, Brendan McQuillan, Tom Briffa, Ferdous Sohel, Kevin Murray, Jonathon Stewart, Benjamin Chow (+2 more)

Abstract: Appropriate antithrombotic therapy for patients with atrial fibrillation (AF) requires assessment of ischemic stroke and bleeding risks. However, risk stratification schemas such as CHA2DS2-VASc and HAS-BLED have modest predictive capacity for patients with AF. Machine learning (ML) techniques may improve predictive performance and support decision-making for appropriate antithrombotic therapy. We compared the performance of multilabel ML models with the currently used risk scores for predicting outcomes in AF patients. Materials and Methods: This was a retrospective cohort study of 9670 patients, mean age 76.9 years, 46% women, who were hospitalized with non-valvular AF and had 1-year follow-up. The primary outcome was ischemic stroke and major bleeding admission. The secondary outcomes were all-cause death and event-free survival. The discriminant power of ML models was compared with clinical risk scores by the area under the curve (AUC). Risk stratification was assessed using the net reclassification index. Results: The multilabel gradient boosting machine provided the best discriminant power for stroke, major bleeding, and death (AUC = 0.685, 0.709, and 0.765 respectively) compared to other ML models. It provided modest performance improvement for stroke compared to CHA2DS2-VASc (AUC = 0.652), but significantly improved major bleeding prediction compared to HAS-BLED (AUC = 0.522). It also had a much greater discriminant power for death compared with CHA2DS2-VASc (AUC = 0.606). The models also identified additional risk features (such as hemoglobin level and renal function) for each outcome. Conclusions: Multilabel ML models can outperform clinical risk stratification scores for predicting the risk of major bleeding and death in non-valvular AF patients.
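
Since the comparison above rests entirely on AUC, a small sketch of how that number is obtained may help. It uses the pairwise definition: the AUC is the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. The labels and scores are made-up toy values, not data from the study, and the function name `auc` is an assumption.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case is scored
    above a randomly chosen negative case; ties contribute one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = event occurred, 0 = no event; scores are predicted risks.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
value = auc(labels, scores)
```

Read this way, an AUC of 0.5 means the score ranks patients no better than chance, which puts the reported HAS-BLED value of 0.522 for major bleeding in context.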

Weed Recognition using Deep Learning Techniques on Class-imbalanced Imagery

Dec 15, 2021

A S M Mahmudul Hasan, Ferdous Sohel, Dean Diepeveen, Hamid Laga, Michael G. K. Jones

Abstract: Most weed species can adversely impact agricultural productivity by competing for nutrients required by high-value crops. Manual weeding is not practical for large cropping areas. Many studies have been undertaken to develop automatic weed management systems for agricultural crops. In this process, one of the major tasks is to recognise the weeds from images. However, weed recognition is a challenging task because weed and crop plants can be similar in colour, texture and shape, and this similarity can be exacerbated by the imaging, geographic or weather conditions when the images are recorded. Advanced machine learning techniques can be used to recognise weeds from imagery. In this paper, we have investigated five state-of-the-art deep neural networks, namely VGG16, ResNet-50, Inception-V3, Inception-ResNet-v2 and MobileNetV2, and evaluated their performance for weed recognition. We have used several experimental settings and multiple dataset combinations. In particular, we constructed a large weed-crop dataset by combining several smaller datasets, mitigated class imbalance by data augmentation, and used this dataset to benchmark the deep neural networks. We investigated the use of transfer learning techniques by preserving the pre-trained weights for extracting the features and fine-tuning them using the images of crop and weed datasets. We found that VGG16 performed better than others on small-scale datasets, while ResNet-50 performed better than other deep networks on the large combined dataset.

* The paper has been accepted by the journal Crop and Pasture Science (https://www.publish.csiro.au/CP/justaccepted/CP21626)

Energy-cost aware off-grid base stations with IoT devices for developing a green heterogeneous network

Oct 12, 2021

Khondoker Ziaul Islam, MD. Sanwar Hossain, B. M. Ruhul Amin, Ferdous Sohel

Abstract: A heterogeneous network (HetNet) is a specified cellular platform to tackle the rapidly growing anticipated data traffic. From a communications perspective, data loads can be mapped to energy loads that are generally placed on the operator networks. Meanwhile, renewable-energy-aided networks offer to curtail fossil fuel consumption and so reduce environmental pollution. This paper proposes a renewable-energy-based power supply architecture for off-grid HetNet using a novel energy sharing model. Solar photovoltaic (PV) along with sufficient energy storage devices are used for each macro, micro, pico, or femto base station (BS). Additionally, a biomass generator (BG) is used for macro and micro BSs. The collocated macro and micro BSs are connected through end-to-end resistive lines. A novel weighted proportional-fair resource-scheduling algorithm with sleep mechanisms is proposed for non-real time (NRT) applications by trading off the power consumption and communication delays. Furthermore, the proposed algorithm with extended discontinuous reception (eDRX) and power saving mode (PSM) for narrowband internet of things (IoT) applications extends battery lifetime for IoT devices. HOMER optimization software is used to perform optimal system architecture, economic, and carbon footprint analyses, while a Monte-Carlo simulation tool is used for evaluating the throughput and energy efficiency performances. The proposed algorithms are valid for the practical data of the rural areas. We demonstrate that the proposed power supply architecture is energy-efficient, cost-effective, reliable, and eco-friendly.

Anti-aliasing Deep Image Classifiers using Novel Depth Adaptive Blurring and Activation Function

Oct 03, 2021

Md Tahmid Hossain, Shyh Wei Teng, Ferdous Sohel, Guojun Lu

Abstract: Deep convolutional networks are vulnerable to image translation or shift, partly due to common down-sampling layers, e.g., max-pooling and strided convolution. These operations violate the Nyquist sampling rate and cause aliasing. The textbook solution is low-pass filtering (blurring) before down-sampling, which can benefit deep networks as well. Even so, non-linearity units, such as ReLU, often re-introduce the problem, suggesting that blurring alone may not suffice. In this work, first, we analyse deep features with the Fourier transform and show that Depth Adaptive Blurring is more effective than monotonic blurring. To this end, we outline how this can replace existing down-sampling methods. Second, we introduce a novel activation function with a built-in low-pass filter to keep the problem from reappearing. From experiments, we observe generalisation on other forms of transformations and corruptions as well, e.g., rotation, scale, and noise. We evaluate our method under three challenging settings: (1) a variety of image translations; (2) adversarial attacks, both $\ell_{p}$ bounded and unbounded; and (3) data corruptions and perturbations. In each setting, our method achieves state-of-the-art results and improves clean accuracy on various benchmark datasets.
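
The aliasing problem and the textbook fix mentioned above can be seen on a one-dimensional toy signal. The sketch below compares naive stride-2 subsampling with subsampling after a fixed [1, 2, 1]/4 blur, for a signal and its one-sample shift. It shows only the classical blur-then-downsample idea; the paper's Depth Adaptive Blurring and its activation function are not implemented, and the kernel, edge handling, and signal are assumptions.

```python
def downsample(signal, stride=2):
    """Keep every `stride`-th sample, with no filtering."""
    return signal[::stride]


def blur(signal):
    """Low-pass filter with kernel [1, 2, 1] / 4, clamping at the edges."""
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, n - 1)]
        out.append((left + 2 * signal[i] + right) / 4)
    return out


def blur_downsample(signal, stride=2):
    """Blur first, then subsample, so high frequencies are attenuated."""
    return downsample(blur(signal), stride)


# A signal at the highest representable frequency and its one-sample shift.
x = [0, 1, 0, 1, 0, 1, 0, 1]
shifted = x[1:] + x[:1]

naive_a, naive_b = downsample(x), downsample(shifted)
smooth_a, smooth_b = blur_downsample(x), blur_downsample(shifted)
```

Naive subsampling turns a one-sample shift into a completely different output (all zeros versus all ones), whereas the blurred versions agree away from the boundary, which is the shift sensitivity the abstract attributes to down-sampling layers.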

A novel network training approach for open set image recognition

Sep 27, 2021

Md Tahmid Hossain, Shyh Wei Teng, Guojun Lu, Ferdous Sohel

Abstract: Convolutional Neural Networks (CNNs) are commonly designed for closed set arrangements, where test instances only belong to some "Known Known" (KK) classes used in training. As such, they predict a class label for a test sample based on the distribution of the KK classes. However, when used under the Open Set Recognition (OSR) setup (where an input may belong to an "Unknown Unknown" or UU class), such a network will always classify a test instance as one of the KK classes even if it is from a UU class. As a solution, recently, data augmentation based on Generative Adversarial Networks (GAN) has been used. In this work, we propose a novel approach for mining a "Known Unknown Trainer" or KUT set and design a deep OSR Network (OSRNet) to harness this dataset. The goal is to teach OSRNet the essence of the UUs through the KUT set, which is effectively a collection of mined "hard Known Unknown negatives". Once trained, OSRNet can detect the UUs while maintaining high classification accuracy on KKs. We evaluate OSRNet on six benchmark datasets and demonstrate that it outperforms contemporary OSR methods.
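
The closed-set limitation described above follows directly from how softmax classification works, as the sketch below shows: an argmax over softmax outputs always returns one of the known classes, however flat the distribution is. The second function is a common confidence-thresholding baseline for rejecting unknowns, included only for contrast; it is not OSRNet or the KUT mining procedure, and the threshold value and logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def closed_set_predict(logits):
    """Always returns the index of a known class, even for odd inputs."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


def threshold_predict(logits, threshold=0.5):
    """Baseline open-set rule: return None ('unknown') if confidence is low."""
    probs = softmax(logits)
    best = closed_set_predict(logits)
    return best if probs[best] >= threshold else None


# Hypothetical logits: one confident input and one ambiguous input.
confident = [5.0, 0.0, 0.0]
ambiguous = [0.1, 0.0, 0.05]

forced = closed_set_predict(ambiguous)
rejected = threshold_predict(ambiguous)
accepted = threshold_predict(confident)
```

Thresholding of this kind depends on unknown inputs producing low confidence, which is not guaranteed in practice; that gap is the motivation the abstract gives for training with mined negatives instead.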
