Mehrtash Harandi

Real-time Neonatal Chest Sound Separation using Deep Learning

Oct 26, 2023
Yang Yi Poh, Ethan Grooby, Kenneth Tan, Lindsay Zhou, Arrabella King, Ashwin Ramanathan, Atul Malhotra, Mehrtash Harandi, Faezeh Marzbanrad

Auscultation is a simple and non-invasive method for diagnosing cardiovascular and respiratory disease in neonates. Such diagnosis often requires high-quality heart and lung sounds to be captured during auscultation. However, obtaining such sounds is usually non-trivial because chest sounds contain a mixture of heart, lung, and noise sounds. Additional preprocessing is therefore needed to separate chest sounds into heart and lung sounds. This paper proposes a novel deep-learning approach for this separation. Inspired by the Conv-TasNet model, the proposed model has an encoder, a decoder, and a mask generator. The encoder consists of a 1D convolution and the decoder of a transposed 1D convolution. The mask generator is built from stacked 1D convolutions and transformers. On the artificial dataset, the proposed model outperforms previous methods by 2.01 dB to 5.06 dB in objective distortion measures, and it is at least 17 times faster in computation time. Our proposed model could therefore be a suitable preprocessing step for any phonocardiogram-based health monitoring system.
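
The encoder, mask, and decoder pipeline described above can be illustrated with a minimal pure-Python sketch. The filters, stride, and fixed masks are hand-picked assumptions; in the paper the encoder and decoder are learned and the masks come from a learned generator of stacked 1D convolutions and transformers, which this sketch does not model. The function names `encode`, `decode`, and `separate` are hypothetical.

```python
# Minimal encoder / mask / decoder pipeline in the style of Conv-TasNet.
# Assumptions: basis filters, stride and masks are hand-picked for illustration;
# the paper learns the encoder/decoder and predicts masks with a neural network.


def encode(x, basis, stride):
    """1D convolution: project each window of x onto every filter in basis."""
    width = len(basis[0])
    frames = []
    for start in range(0, len(x) - width + 1, stride):
        window = x[start:start + width]
        frames.append([sum(f_k * w_k for f_k, w_k in zip(f, window)) for f in basis])
    return frames  # shape: [num_frames][num_filters]


def decode(frames, basis, stride, length):
    """Transposed 1D convolution: overlap-add each frame's weighted filters."""
    width = len(basis[0])
    y = [0.0] * length
    for t, coeffs in enumerate(frames):
        for c, f in zip(coeffs, basis):
            for k in range(width):
                y[t * stride + k] += c * f[k]
    return y


def separate(x, basis, stride, masks):
    """Mask the shared latent representation once per source, then decode it."""
    latent = encode(x, basis, stride)
    sources = []
    for mask in masks:
        masked = [[w * m for w, m in zip(w_row, m_row)]
                  for w_row, m_row in zip(latent, mask)]
        sources.append(decode(masked, basis, stride, len(x)))
    return sources


# Usage: identity filters with stride == width reconstruct the input exactly.
width, stride = 4, 4
basis = [[1.0 if i == j else 0.0 for j in range(width)] for i in range(width)]
mixture = [1.0, -2.0, 3.0, 0.5, 0.0, 1.5, -1.0, 2.0]
num_frames = 2
heart_mask = [[0.7] * width for _ in range(num_frames)]  # hypothetical fixed mask
lung_mask = [[0.3] * width for _ in range(num_frames)]   # hypothetical fixed mask
heart, lung = separate(mixture, basis, stride, [heart_mask, lung_mask])
```

Because the decoder is linear, masks that sum to one across sources yield separated signals that add back up to the mixture, a convenient property of mask-based separation.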

Continual Test-time Domain Adaptation via Dynamic Sample Selection

Oct 05, 2023
Yanshuo Wang, Jie Hong, Ali Cheraghian, Shafin Rahman, David Ahmedt-Aristizabal, Lars Petersson, Mehrtash Harandi

The objective of Continual Test-time Domain Adaptation (CTDA) is to gradually adapt a pre-trained model to a sequence of target domains without accessing the source data. This paper proposes a Dynamic Sample Selection (DSS) method for CTDA. DSS consists of dynamic thresholding, positive learning, and negative learning processes. Traditionally, models learn from unlabeled data of an unknown environment and rely equally on all samples' pseudo-labels to update their parameters through self-training. However, these pseudo-labels contain noisy predictions, so not all samples are equally trustworthy. In our method, a dynamic thresholding module is therefore first designed to separate suspected low-quality samples from high-quality ones. The selected low-quality samples are more likely to be wrongly predicted. We then apply joint positive and negative learning on both high- and low-quality samples to reduce the risk of using wrong information. Extensive experiments demonstrate the effectiveness of our method for CTDA in the image domain, where it outperforms the state of the art. We further evaluate our approach in the 3D point cloud domain, showcasing its versatility and potential for broader applicability.
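
A minimal sketch of the DSS ingredients follows. The abstract does not give the dynamic threshold rule or the exact form of the positive and negative losses, so the batch-mean confidence threshold, the least-likely-class complementary label, and the `low_quality_weight` down-weighting below are all hypothetical choices for illustration, not the paper's method.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_threshold_split(batch_probs):
    """Hypothetical rule: threshold = mean top-1 confidence of the current batch."""
    conf = [max(p) for p in batch_probs]
    tau = sum(conf) / len(conf)
    high = [i for i, c in enumerate(conf) if c >= tau]
    low = [i for i, c in enumerate(conf) if c < tau]
    return tau, high, low


def positive_loss(p):
    """Cross-entropy against the pseudo-label (the arg-max class)."""
    y = max(range(len(p)), key=p.__getitem__)
    return -math.log(p[y])


def negative_loss(p):
    """Complementary-label loss: push down the least likely class (an assumption)."""
    k = min(range(len(p)), key=p.__getitem__)
    return -math.log(1.0 - p[k])


def dss_loss(batch_logits, low_quality_weight=0.5):
    """Joint positive + negative learning on every sample; the positive term of
    suspected low-quality samples is down-weighted (hypothetical weighting)."""
    probs = [softmax(z) for z in batch_logits]
    _, high, low = dynamic_threshold_split(probs)
    total = 0.0
    for i in high:
        total += positive_loss(probs[i]) + negative_loss(probs[i])
    for i in low:
        total += low_quality_weight * positive_loss(probs[i]) + negative_loss(probs[i])
    return total / len(batch_logits)


batch = [[4.0, 0.0, 0.0], [1.0, 0.9, 0.8], [3.0, 1.0, 0.0]]
tau, high, low = dynamic_threshold_split([softmax(z) for z in batch])
```

Here the ambiguous middle sample (nearly flat logits) falls below the threshold and is treated as suspected low quality.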

Unleash Data Generation for Efficient and Effective Data-free Knowledge Distillation

Sep 30, 2023
Minh-Tuan Tran, Trung Le, Xuan-May Le, Mehrtash Harandi, Quan Hung Tran, Dinh Phung

Data-Free Knowledge Distillation (DFKD) has recently made remarkable progress; its core principle is transferring knowledge from a teacher neural network to a student neural network without requiring access to the original data. Nonetheless, existing approaches face a significant challenge when generating samples from random noise inputs, which inherently lack meaningful information. Consequently, these models struggle to map this noise to the ground-truth sample distribution, producing low-quality data and requiring substantial time to train the generator. In this paper, we propose a novel Noisy Layer Generation method (NAYER), which relocates the source of randomness from the input to a noisy layer and uses the meaningful label-text embedding (LTE) as the input. LTE is significant because it contains substantial meaningful inter-class information, enabling the generation of high-quality samples with only a few training steps. Meanwhile, the noisy layer addresses the diversity of generated samples by preventing the model from overemphasizing the constrained label information. By reinitializing the noisy layer in each iteration, we aim to generate diverse samples while retaining the method's efficiency, thanks to the ease of learning provided by LTE. Experiments on multiple datasets demonstrate that NAYER not only outperforms state-of-the-art methods but is also 5 to 15 times faster than previous approaches.
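
The idea of moving randomness from the generator input to a re-initialized layer can be sketched as follows. `NoisyLayer`, `toy_generator`, and the `LTE` vectors are hypothetical stand-ins: real label-text embeddings would presumably come from a pretrained text encoder, and the generator would be a trained network rather than an elementwise `tanh`.

```python
import math
import random


class NoisyLayer:
    """Random linear map whose weights are redrawn every iteration, so this
    layer, rather than the generator input, supplies the sample diversity."""

    def __init__(self, dim_in, dim_out, rng):
        self.dim_in, self.dim_out, self.rng = dim_in, dim_out, rng
        self.reinitialize()

    def reinitialize(self):
        self.weight = [[self.rng.gauss(0.0, 1.0) for _ in range(self.dim_in)]
                       for _ in range(self.dim_out)]

    def __call__(self, v):
        return [sum(w * x for w, x in zip(row, v)) for row in self.weight]


def toy_generator(h):
    """Hypothetical stand-in for the learned generator: an elementwise tanh."""
    return [math.tanh(x) for x in h]


# Hypothetical label-text embeddings (LTE), one fixed vector per class.
LTE = {"cat": [0.2, -0.1, 0.7], "dog": [0.3, 0.4, -0.5]}

rng = random.Random(0)
noisy = NoisyLayer(dim_in=3, dim_out=4, rng=rng)
samples = []
for _ in range(3):              # three generator iterations
    noisy.reinitialize()        # fresh randomness each iteration
    samples.append(toy_generator(noisy(LTE["cat"])))
```

The class embedding stays fixed, so all diversity across iterations comes from redrawing the noisy layer's weights.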

RSAM: Learning on manifolds with Riemannian Sharpness-aware Minimization

Sep 29, 2023
Tuan Truong, Hoang-Phi Nguyen, Tung Pham, Minh-Tuan Tran, Mehrtash Harandi, Dinh Phung, Trung Le

Understanding the geometry of the loss landscape has shown promise for enhancing a model's generalization ability. In this work, we draw on prior work that applies geometric principles to optimization and present a novel approach to improving robustness and generalization for constrained optimization problems. Specifically, this paper generalizes the Sharpness-Aware Minimization (SAM) optimizer to Riemannian manifolds. To do so, we first extend the concept of sharpness and introduce a novel notion of sharpness on manifolds. To support this notion, we present a theoretical analysis characterizing generalization with respect to manifold sharpness, which yields a tighter bound on the generalization gap, a result not previously known. Motivated by this analysis, we introduce our algorithm, Riemannian Sharpness-Aware Minimization (RSAM). To demonstrate RSAM's ability to improve generalization, we evaluate and compare it on a broad set of problems, such as image classification and contrastive learning, across datasets including CIFAR100, CIFAR10, and FGVCAircraft. Our code is publicly available at https://t.ly/RiemannianSAM.
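
To make sharpness-aware updates on a manifold concrete, here is a minimal sketch of a SAM-style step on the unit sphere. It is a generic construction, not the RSAM update from the paper: it assumes tangent-space projection as the Riemannian gradient, normalization as the retraction, and projection onto the tangent space at the current point as a crude vector transport. `sam_sphere_step` and the toy loss are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def axpy(a, b, s=1.0):
    """Return a + s * b."""
    return [x + s * y for x, y in zip(a, b)]


def norm(a):
    return math.sqrt(dot(a, a))


def retract(w):
    """Retraction onto the unit sphere: rescale to unit length."""
    n = norm(w)
    return [x / n for x in w]


def project(w, v):
    """Orthogonal projection of v onto the tangent space at w (|w| = 1)."""
    return axpy(v, w, -dot(v, w))


def sam_sphere_step(w, egrad, lr=0.1, rho=0.01):
    g = project(w, egrad(w))                  # Riemannian gradient at w
    gn = norm(g)
    if gn < 1e-12:                            # already stationary
        return w
    w_adv = retract(axpy(w, g, rho / gn))     # ascend to a nearby worst-case point
    g_adv = project(w_adv, egrad(w_adv))      # Riemannian gradient there
    g_back = project(w, g_adv)                # crude transport back to T_w
    return retract(axpy(w, g_back, -lr))      # descend, then retract


c = [3.0, 4.0, 0.0]


def egrad(w):
    """Euclidean gradient of the toy loss 0.5 * ||w - c||^2."""
    return axpy(w, c, -1.0)


w = [0.0, 0.0, 1.0]
for _ in range(200):
    w = sam_sphere_step(w, egrad)
```

The iterate stays on the sphere and settles near c / ||c|| = (0.6, 0.8, 0), oscillating slightly around it because the gradient is evaluated at the perturbed point.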

Hyperbolic Audio-visual Zero-shot Learning

Aug 24, 2023
Jie Hong, Zeeshan Hayder, Junlin Han, Pengfei Fang, Mehrtash Harandi, Lars Petersson

Audio-visual zero-shot learning aims to classify samples, each consisting of a pair of corresponding audio and video sequences, from classes that are not present during training. An analysis of audio-visual data reveals a large degree of hyperbolicity, indicating the potential benefit of a hyperbolic transformation for curvature-aware geometric learning, with the aim of exploring more complex hierarchical data structures for this task. The proposed approach employs a novel loss function that incorporates cross-modality alignment between video and audio features in hyperbolic space. Additionally, we explore the use of multiple adaptive curvatures for hyperbolic projections. Experimental results on this very challenging task demonstrate that our hyperbolic approach to zero-shot learning outperforms the SOTA method on three datasets, VGGSound-GZSL, UCF-GZSL, and ActivityNet-GZSL, achieving harmonic mean (HM) improvements of around 3.0%, 7.0%, and 5.3%, respectively.

* ICCV 2023 
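
The two hyperbolic ingredients described above, projecting features into a curved space and aligning modalities there, can be sketched on the Poincaré ball. Möbius addition, the exponential map at the origin, and the geodesic distance below are the standard formulas; the `alignment_loss` form and the fixed list of curvatures are assumptions, since the abstract does not specify the loss and the paper's curvatures are adaptive.

```python
import math


def mobius_add(x, y, c):
    """Möbius addition on the Poincaré ball of curvature -c."""
    xy = sum(a * b for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    coef_x = 1 + 2 * c * xy + c * y2
    coef_y = 1 - c * x2
    den = 1 + 2 * c * xy + c * c * x2 * y2
    return [(coef_x * a + coef_y * b) / den for a, b in zip(x, y)]


def expmap0(v, c):
    """Map a Euclidean (tangent) vector at the origin onto the ball."""
    n = math.sqrt(sum(a * a for a in v))
    if n == 0:
        return list(v)
    scale = math.tanh(math.sqrt(c) * n) / (math.sqrt(c) * n)
    return [scale * a for a in v]


def poincare_dist(x, y, c):
    """Geodesic distance on the Poincaré ball of curvature -c."""
    diff = mobius_add([-a for a in x], y, c)
    n = math.sqrt(sum(a * a for a in diff))
    return 2 / math.sqrt(c) * math.atanh(math.sqrt(c) * n)


def alignment_loss(video_feats, audio_feats, curvatures):
    """Hypothetical loss: mean hyperbolic distance between paired video/audio
    embeddings, averaged over several curvatures (fixed here, learned in practice)."""
    total = 0.0
    for c in curvatures:
        for v, a in zip(video_feats, audio_feats):
            total += poincare_dist(expmap0(v, c), expmap0(a, c), c)
    return total / (len(curvatures) * len(video_feats))


video = [[0.3, -0.4], [0.1, 0.2]]    # toy features
audio = [[0.25, -0.35], [0.0, 0.3]]
loss = alignment_loss(video, audio, curvatures=[0.5, 1.0, 2.0])
```

A useful sanity check: the distance from the origin to `expmap0(v, c)` equals 2 * ||v|| for any curvature, since the Poincaré metric has conformal factor 2 at the origin.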

L3DMC: Lifelong Learning using Distillation via Mixed-Curvature Space

Aug 01, 2023
Kaushik Roy, Peyman Moghadam, Mehrtash Harandi

The performance of a lifelong learning (L3) model degrades when it is trained on a series of tasks, because the geometric structure of the embedding space changes as novel concepts are learned sequentially. Most existing L3 approaches operate on a fixed-curvature (e.g., zero-curvature Euclidean) space that is not necessarily suitable for modeling the complex geometric structure of data. Furthermore, existing distillation strategies apply constraints directly on low-dimensional embeddings, which makes the L3 model highly stable and discourages it from learning new concepts. To address this problem, we propose a distillation strategy named L3DMC that operates on mixed-curvature spaces to preserve already-learned knowledge by modeling and maintaining complex geometric structures. We propose to embed the projected low-dimensional embeddings of fixed-curvature spaces (Euclidean and hyperbolic) into a higher-dimensional Reproducing Kernel Hilbert Space (RKHS) using a positive-definite kernel function to attain rich representations. We then optimize the L3 model by minimizing the discrepancy between the new sample representation and the subspace constructed from the old representations in RKHS. L3DMC adapts to new knowledge better without forgetting old knowledge because it combines the representational power of multiple fixed-curvature spaces and operates in a higher-dimensional RKHS. Thorough experiments on three benchmarks demonstrate the effectiveness of our distillation strategy for medical image classification in L3 settings. Our code is publicly available at https://github.com/csiro-robotics/L3DMC.

* MICCAI 2023 (Early Accept) 
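
The discrepancy between a new representation and the subspace spanned by old representations in RKHS can be illustrated with the standard kernel-projection residual: the squared distance from phi(z) to span{phi(o_i)} is k(z, z) - k_z^T K^{-1} k_z. This sketch assumes an RBF kernel and a small ridge term for numerical stability, and it omits L3DMC's mixed-curvature (Euclidean plus hyperbolic) projections; the function names are hypothetical.

```python
import math


def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel, an assumed choice of positive-definite kernel."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def rkhs_subspace_residual(z_new, old, gamma=1.0, ridge=1e-8):
    """Squared RKHS distance from phi(z_new) to span{phi(o) for o in old}."""
    K = [[rbf(a, b, gamma) + (ridge if i == j else 0.0) for j, b in enumerate(old)]
         for i, a in enumerate(old)]
    k = [rbf(z_new, o, gamma) for o in old]
    alpha = solve(K, k)                       # coefficients of the projection
    return rbf(z_new, z_new, gamma) - sum(ki * ai for ki, ai in zip(k, alpha))


old = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]    # toy "old model" embeddings
r_in = rkhs_subspace_residual([1.0, 0.0], old)    # already in the subspace
r_mid = rkhs_subspace_residual([0.5, 0.5], old)
r_out = rkhs_subspace_residual([10.0, 10.0], old)  # far from every old point
```

A residual near 0 means the new representation is explained by the old subspace; minimizing it during training is what discourages forgetting in this kind of distillation.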

Subspace Distillation for Continual Learning

Aug 01, 2023
Kaushik Roy, Christian Simon, Peyman Moghadam, Mehrtash Harandi

An ultimate objective in continual learning is to preserve knowledge learned in preceding tasks while learning new tasks. To mitigate forgetting of prior knowledge, we propose a novel knowledge distillation technique that takes into account the manifold structure of the latent/output space of a neural network when learning novel tasks. To achieve this, we approximate the data manifold up to first order, using linear subspaces to model the structure and maintain the knowledge of a neural network while learning novel concepts. We demonstrate that modeling with subspaces provides several intriguing properties, including robustness to noise, and is therefore effective for mitigating Catastrophic Forgetting in continual learning. We also discuss and show how our proposed method can be adopted to address both classification and segmentation problems. Empirically, our method outperforms various continual learning methods on several challenging datasets, including Pascal VOC and Tiny-ImageNet. Furthermore, we show how the proposed method can be seamlessly combined with existing learning approaches to improve their performance. The code for this article will be available at https://github.com/csiro-robotics/SDCL.

* Neural Networks (submitted May 2022, accepted July 2023) 
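
A first-order (linear subspace) comparison between the features of an old and a new model can be sketched with the projection distance ||P_U - P_V||_F^2 = k_U + k_V - 2 ||U^T V||_F^2 between orthonormal bases U and V. This is an assumed choice of subspace distance, and Gram-Schmidt stands in for whatever basis extraction (e.g. SVD) the paper uses; the function names and toy features are hypothetical.

```python
import math


def gram_schmidt(vectors, tol=1e-10):
    """Orthonormal basis for the span of the given feature vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            p = sum(x * y for x, y in zip(w, b))
            w = [x - p * y for x, y in zip(w, b)]
        n = math.sqrt(sum(x * x for x in w))
        if n > tol:                      # drop linearly dependent directions
            basis.append([x / n for x in w])
    return basis


def subspace_distance(feats_old, feats_new):
    """Squared Frobenius distance between the two orthogonal projectors."""
    U = gram_schmidt(feats_old)
    V = gram_schmidt(feats_new)
    overlap = sum(sum(u_i * v_i for u_i, v_i in zip(u, v)) ** 2
                  for u in U for v in V)
    return len(U) + len(V) - 2.0 * overlap


old = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
same_span = subspace_distance(old, [[1.0, 1.0, 0.0], [1.0, -1.0, 0.0]])
orthogonal = subspace_distance(old, [[0.0, 0.0, 1.0]])
```

The distance depends only on the spanned subspaces, not on the particular feature vectors, which is one way to see why a subspace-level constraint can be less sensitive to small perturbations of individual features.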

EndoSurf: Neural Surface Reconstruction of Deformable Tissues with Stereo Endoscope Videos

Jul 21, 2023
Ruyi Zha, Xuelian Cheng, Hongdong Li, Mehrtash Harandi, Zongyuan Ge

Reconstructing soft tissues from stereo endoscope videos is an essential prerequisite for many medical applications. Previous methods struggle to produce high-quality geometry and appearance due to their inadequate representations of 3D scenes. To address this issue, we propose a novel neural-field-based method, called EndoSurf, which effectively learns to represent a deforming surface from an RGBD sequence. In EndoSurf, we model surface dynamics, shape, and texture with three neural fields. First, 3D points are transformed from the observed space to the canonical space using the deformation field. The signed distance function (SDF) field and radiance field then predict their SDFs and colors, respectively, with which RGBD images can be synthesized via differentiable volume rendering. We constrain the learned shape by tailoring multiple regularization strategies and disentangling geometry and appearance. Experiments on public endoscope datasets demonstrate that EndoSurf significantly outperforms existing solutions, particularly in reconstructing high-fidelity shapes. Code is available at https://github.com/Ruyi-Zha/endosurf.git.

* MICCAI 2023 (Early Accept); Ruyi Zha and Xuelian Cheng made equal contributions. Corresponding author: Ruyi Zha (ruyi.zha@gmail.com) 
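
The rendering step described in the EndoSurf abstract (deform points to canonical space, query SDF and color, then volume-render RGBD) can be sketched for a single ray. The Laplace-CDF (VolSDF-style) SDF-to-density conversion, the identity deformation, and the constant single-channel color field are assumptions for illustration; the abstract does not state which conversion EndoSurf uses, and `render_ray` is a hypothetical name.

```python
import math


def laplace_cdf(s, beta):
    """CDF of a zero-mean Laplace distribution with scale beta."""
    if s <= 0:
        return 0.5 * math.exp(s / beta)
    return 1.0 - 0.5 * math.exp(-s / beta)


def render_ray(origin, direction, sdf, color, near, far,
               deform=lambda p: p, n=4000, beta=0.01):
    """Alpha-composite depth and (single-channel) color along one ray.

    deform: stand-in for the deformation field (observed -> canonical space).
    sdf, color: stand-ins for the SDF and radiance fields in canonical space.
    """
    delta = (far - near) / n
    transmittance, depth, rgb = 1.0, 0.0, 0.0
    for i in range(n):
        t = near + (i + 0.5) * delta
        p = deform([o + t * d for o, d in zip(origin, direction)])
        sigma = laplace_cdf(-sdf(p), beta) / beta   # SDF -> density (assumed)
        alpha = 1.0 - math.exp(-sigma * delta)      # opacity of this segment
        weight = transmittance * alpha
        depth += weight * t
        rgb += weight * color(p)
        transmittance *= 1.0 - alpha
    return depth, rgb, 1.0 - transmittance


# Toy scene: unit sphere centered at z = 3, viewed from the origin along +z,
# so the first surface hit is at depth 2.
def sphere_sdf(p):
    return math.dist(p, (0.0, 0.0, 3.0)) - 1.0


depth, rgb, opacity = render_ray((0.0, 0.0, 0.0), (0.0, 0.0, 1.0),
                                 sphere_sdf, lambda p: 0.5, near=0.0, far=4.0)
```

Every operation along the ray is differentiable in the field outputs, which is what lets the SDF and radiance fields be trained from RGBD images.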

Contrastive Learning MRI Reconstruction

Jun 01, 2023
Mevan Ekanayake, Zhifeng Chen, Gary Egan, Mehrtash Harandi, Zhaolin Chen

Purpose: We propose a novel contrastive learning latent space representation for MRI datasets with partially acquired scans, and show that this latent space can be utilized for accelerated MR image reconstruction. Theory and Methods: Our novel framework, COLADA (Contrastive Learning for highly accelerated MR image reconstruction), maximizes the mutual information between differently accelerated images of an MRI scan using self-supervised contrastive learning. In other words, it attempts to "pull" the latent representations of the same scan together and "push" the latent representations of other scans away. The resulting MRI latent space is then used for MR image reconstruction, and its performance was assessed against several baseline deep learning reconstruction methods. The quality of the proposed latent space representation was further analyzed using Alignment and Uniformity. Results: COLADA comprehensively outperformed other reconstruction methods and was robust to variations in undersampling patterns, pathological abnormalities, and k-space noise during inference. COLADA also achieved high-quality reconstruction on unseen data with minimal fine-tuning. The analysis of representation quality suggests that the contrastive features produced by COLADA are optimally distributed in latent space. Conclusion: To the best of our knowledge, this is the first attempt to utilize contrastive learning on differently accelerated images for MR image reconstruction. The proposed latent space representation has practical value given the large number of existing partially sampled datasets. This suggests that self-supervised contrastive learning could be explored further to enhance the MRI latent space for image reconstruction.
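
The "pull together, push apart" objective and the Alignment and Uniformity metrics can be sketched as below. InfoNCE is assumed as the contrastive loss, since the abstract does not name COLADA's exact objective; alignment and uniformity follow the common definitions with exponent 2 and t = 2. The feature vectors are made-up toy values standing in for latent codes of two accelerations of the same scans.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE: row i of z_a should match row i of z_b (same scan, different
    acceleration) and not the other rows (other scans)."""
    za = [normalize(v) for v in z_a]
    zb = [normalize(v) for v in z_b]
    loss = 0.0
    for i, a in enumerate(za):
        logits = [sum(x * y for x, y in zip(a, b)) / temperature for b in zb]
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(s - m) for s in logits))
        loss += log_sum - logits[i]          # -log softmax of the positive pair
    return loss / len(za)


def alignment(z_a, z_b):
    """Mean squared distance between positive pairs (lower is better)."""
    za = [normalize(v) for v in z_a]
    zb = [normalize(v) for v in z_b]
    return sum(sum((x - y) ** 2 for x, y in zip(a, b))
               for a, b in zip(za, zb)) / len(za)


def uniformity(z, t=2.0):
    """Log of the mean Gaussian potential over distinct pairs (lower is better)."""
    zn = [normalize(v) for v in z]
    vals = [math.exp(-t * sum((x - y) ** 2 for x, y in zip(zn[i], zn[j])))
            for i in range(len(zn)) for j in range(i + 1, len(zn))]
    return math.log(sum(vals) / len(vals))


accel_a = [[1.0, 0.1], [0.0, 1.0], [-1.0, 0.2]]   # toy latents, acceleration A
accel_b = [[0.9, 0.2], [0.1, 1.0], [-1.0, 0.0]]   # same scans, acceleration B
shuffled = accel_b[1:] + accel_b[:1]              # mismatched pairs
```

Matching pairs give a lower InfoNCE loss than mismatched ones, which is exactly the pressure that pulls representations of the same scan together.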

Hyperbolic Geometry in Computer Vision: A Survey

Apr 21, 2023
Pengfei Fang, Mehrtash Harandi, Trung Le, Dinh Phung

Hyperbolic space, a Riemannian manifold with constant negative sectional curvature, has been considered as an alternative embedding space in many learning scenarios, e.g., natural language processing and graph learning, because of its intriguing ability to encode the hierarchical structure of data (such as irregular graphs or tree-like data). Recent studies show that such hierarchy also exists in visual datasets and investigate successful applications of hyperbolic geometry in computer vision (CV), ranging from classical image classification to advanced model adaptation learning. This paper presents the first and most up-to-date literature review of hyperbolic spaces for CV applications. To this end, we first introduce the background of hyperbolic geometry, followed by a comprehensive investigation of algorithms that use the geometric prior of hyperbolic space in visual applications. Finally, we conclude the manuscript and identify possible future directions.

* First survey paper on hyperbolic geometry in CV applications
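
The tree-likeness that motivates hyperbolic embeddings is often quantified with Gromov's four-point delta. The sketch below computes it from a distance matrix using a single fixed base point, a common estimate, plus a scale-free 2 * delta / diameter variant; the function names and toy point sets are illustrative assumptions.

```python
import itertools
import math


def gromov_product(d, x, y, w):
    """(x . y)_w = 0.5 * (d(x, w) + d(y, w) - d(x, y))."""
    return 0.5 * (d[x][w] + d[y][w] - d[x][y])


def delta_hyperbolicity(d, base=0):
    """Four-point delta of a finite metric space w.r.t. one base point.
    0 means tree-like; larger values mean less hierarchical structure."""
    n = len(d)
    delta = 0.0
    for x, y, z in itertools.product(range(n), repeat=3):
        gap = (min(gromov_product(d, x, y, base), gromov_product(d, y, z, base))
               - gromov_product(d, x, z, base))
        delta = max(delta, gap)
    return delta


def relative_delta(d, base=0):
    """Scale-free version: 2 * delta / diameter."""
    diam = max(max(row) for row in d)
    return 2.0 * delta_hyperbolicity(d, base) / diam


def euclidean_matrix(points):
    return [[math.dist(p, q) for q in points] for p in points]


# Points on a line form a tree metric (delta = 0); a square's corners do not.
line = euclidean_matrix([(0.0,), (1.0,), (3.0,), (6.0,)])
square = euclidean_matrix([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)])
```

Low relative delta on a dataset's feature distances is the kind of evidence used to argue that its structure is hierarchical and therefore suited to hyperbolic embeddings.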