Majid Mirmehdi

Automated Radiology Report Generation: A Review of Recent Advances

May 17, 2024

Phillip Sloan, Philip Clatworthy, Edwin Simpson, Majid Mirmehdi

Abstract: Increasing demands on medical imaging departments are taking a toll on the radiologist's ability to deliver timely and accurate reports. Recent technological advances in artificial intelligence have demonstrated great potential for automatic radiology report generation (ARRG), sparking an explosion of research. This survey paper conducts a methodological review of contemporary ARRG approaches by way of (i) assessing datasets based on characteristics, such as availability, size, and adoption rate, (ii) examining deep learning training methods, such as contrastive learning and reinforcement learning, (iii) exploring state-of-the-art model architectures, including variations of CNN and transformer models, (iv) outlining techniques integrating clinical knowledge through multimodal inputs and knowledge graphs, and (v) scrutinising current model evaluation techniques, including commonly applied NLP metrics and qualitative clinical reviews. Furthermore, the quantitative results of the reviewed models are analysed, where the top performing models are examined to seek further insights. Finally, potential new directions are highlighted, with the adoption of additional datasets from other radiological modalities and improved evaluation methods predicted as important areas of future development.

* 24 pages, 8 figures, 6 tables. Submitted to IEEE Reviews in Biomedical Engineering

ChimpVLM: Ethogram-Enhanced Chimpanzee Behaviour Recognition

Apr 13, 2024

Otto Brookes, Majid Mirmehdi, Hjalmar Kühl, Tilo Burghardt

Abstract: We show that chimpanzee behaviour understanding from camera traps can be enhanced by providing visual architectures with access to an embedding of text descriptions that detail species behaviours. In particular, we present a vision-language model which employs multi-modal decoding of visual features extracted directly from camera trap videos to process query tokens representing behaviours and output class predictions. Query tokens are initialised using a standardised ethogram of chimpanzee behaviour, rather than using random or name-based initialisations. In addition, the effect of initialising query tokens using a masked language model fine-tuned on a text corpus of known behavioural patterns is explored. We evaluate our system on the PanAf500 and PanAf20K datasets and demonstrate the performance benefits of our multi-modal decoding approach and query initialisation strategy on multi-class and multi-label recognition tasks, respectively. Results and ablations corroborate performance improvements. We achieve state-of-the-art performance over vision and vision-language models in top-1 accuracy (+6.34%) on PanAf500 and overall (+1.1%) and tail-class (+2.26%) mean average precision on PanAf20K. We share complete source code and network weights for full reproducibility of results and easy utilisation.
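
For readers unfamiliar with the multi-label metric quoted above, the following is a minimal sketch of mean average precision (mAP), with an optional restriction to a subset of classes such as the tail classes. It uses the common ranking-based definition of average precision; the function names, the `class_ids` parameter, and the toy scores are illustrative assumptions and are not taken from the paper or its released code.

```python
def average_precision(scores, labels):
    """Average of precision values at each rank where a positive label occurs."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)
    # Assumed convention: a class with no positives contributes 0.0.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix, label_matrix, class_ids=None):
    """Mean AP over all classes, or only over `class_ids` (e.g. tail classes)."""
    num_classes = len(score_matrix[0])
    class_ids = list(range(num_classes)) if class_ids is None else list(class_ids)
    aps = []
    for c in class_ids:
        class_scores = [row[c] for row in score_matrix]
        class_labels = [row[c] for row in label_matrix]
        aps.append(average_precision(class_scores, class_labels))
    return sum(aps) / len(aps)


# Toy data: 4 clips, 2 behaviour classes (values are made up for illustration).
scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
labels = [[1, 0], [0, 1], [1, 0], [0, 1]]

overall_map = mean_average_precision(scores, labels)
tail_map = mean_average_precision(scores, labels, class_ids=[1])
```

Reporting the tail-class figure separately, as the abstract does, keeps rare behaviours from being hidden by frequent ones in the overall average.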

PanAf20K: A Large Video Dataset for Wild Ape Detection and Behaviour Recognition

Jan 31, 2024

Otto Brookes, Majid Mirmehdi, Colleen Stephens, Samuel Angedakin, Katherine Corogenes, Dervla Dowd, Paula Dieguez, Thurston C. Hicks, Sorrel Jones, Kevin Lee (+17 more)

Abstract: We present the PanAf20K dataset, the largest and most diverse open-access annotated video dataset of great apes in their natural environment. It comprises more than 7 million frames across ~20,000 camera trap videos of chimpanzees and gorillas collected at 14 field sites in tropical Africa as part of the Pan African Programme: The Cultured Chimpanzee. The footage is accompanied by a rich set of annotations and benchmarks making it suitable for training and testing a variety of challenging and ecologically important computer vision tasks including ape detection and behaviour recognition. Furthering AI analysis of camera trap information is critical given that the International Union for Conservation of Nature now lists all species in the great ape family as either Endangered or Critically Endangered. We hope the dataset can form a solid basis for engagement of the AI community to improve performance, efficiency, and result interpretation in order to support assessments of great ape presence, abundance, distribution, and behaviour and thereby aid conservation efforts.

* Accepted at IJCV

QAFE-Net: Quality Assessment of Facial Expressions with Landmark Heatmaps

Dec 01, 2023

Shuchao Duan, Amirhossein Dadashzadeh, Alan Whone, Majid Mirmehdi

Abstract: Facial expression recognition (FER) methods have made great inroads in categorising moods and feelings in humans. Beyond FER, pain estimation methods assess levels of intensity in pain expressions. However, assessing the quality of all facial expressions is of critical value in health-related applications. In this work, we address the quality of five different facial expressions in patients affected by Parkinson's disease. We propose a novel landmark-guided approach, QAFE-Net, that combines temporal landmark heatmaps with RGB data to capture small facial muscle movements that are encoded and mapped to severity scores. The proposed approach is evaluated on a new Parkinson's Disease Facial Expression dataset (PFED5), as well as on the pain estimation benchmark, the UNBC-McMaster Shoulder Pain Expression Archive Database. Our comparative experiments demonstrate that the proposed method outperforms SOTA action quality assessment works on PFED5 and achieves lower mean absolute error than the SOTA pain estimation methods on UNBC-McMaster. Our code and the new PFED5 dataset are available at https://github.com/shuchaoduan/QAFE-Net.

Centre Stage: Centricity-based Audio-Visual Temporal Action Detection

Nov 28, 2023

Hanyuan Wang, Majid Mirmehdi, Dima Damen, Toby Perrett

Abstract: Previous one-stage action detection approaches have modelled temporal dependencies using only the visual modality. In this paper, we explore different strategies to incorporate the audio modality, using multi-scale cross-attention to fuse the two modalities. We also demonstrate the correlation between the distance from the timestep to the action centre and the accuracy of the predicted boundaries. Thus, we propose a novel network head to estimate the closeness of timesteps to the action centre, which we call the centricity score. This leads to increased confidence for proposals that exhibit more precise boundaries. Our method can be integrated with other one-stage anchor-free architectures and we demonstrate this on three recent baselines on the EPIC-Kitchens-100 action detection benchmark where we achieve state-of-the-art performance. Detailed ablation studies showcase the benefits of fusing audio and our proposed centricity scores. Code and models for our proposed method are publicly available at https://github.com/hanielwang/Audio-Visual-TAD.git

* Accepted to VUA workshop at BMVC 2023
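
The idea of a centricity score, as described in the abstract above, can be illustrated with a small sketch. The abstract does not give the formula, so everything below is an assumption made for illustration: a linear falloff from the action centre for the training target, and a simple product for combining centricity with class confidence. The function names `centricity_target` and `rescore` are hypothetical and do not come from the authors' repository.

```python
def centricity_target(t, start, end):
    """Hypothetical target: 1.0 at the action centre, falling linearly to 0.0
    at the action boundaries (and clamped to 0.0 outside the action)."""
    if end <= start:
        raise ValueError("end must be greater than start")
    centre = 0.5 * (start + end)
    half_length = 0.5 * (end - start)
    return max(0.0, 1.0 - abs(t - centre) / half_length)


def rescore(class_confidence, centricity):
    """Assumed fusion rule: scale the class confidence by the predicted
    centricity, so proposals made far from the centre are down-weighted."""
    return class_confidence * centricity


# An action spanning 2.0 s to 6.0 s, sampled at a few timesteps.
targets = [centricity_target(t, 2.0, 6.0) for t in (2.0, 3.0, 4.0, 5.0, 7.0)]
rescored = rescore(0.8, targets[1])
```

Under these assumptions, a proposal emitted at the centre keeps its full confidence while one emitted near a boundary is suppressed, which matches the abstract's observation that timesteps closer to the centre tend to predict more accurate boundaries.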

PECoP: Parameter Efficient Continual Pretraining for Action Quality Assessment

Nov 11, 2023

Amirhossein Dadashzadeh, Shuchao Duan, Alan Whone, Majid Mirmehdi

Abstract: The limited availability of labelled data in Action Quality Assessment (AQA) has forced previous works to fine-tune models pretrained on large-scale domain-general datasets. This common approach results in weak generalisation, particularly when there is a significant domain shift. We propose a novel, parameter efficient, continual pretraining framework, PECoP, to reduce such domain shift via an additional pretraining stage. In PECoP, we introduce 3D-Adapters, inserted into the pretrained model, to learn spatiotemporal, in-domain information via self-supervised learning where only the adapter modules' parameters are updated. We demonstrate PECoP's ability to enhance the performance of recent state-of-the-art methods (MUSDL, CoRe, and TSA) applied to AQA, leading to considerable improvements on benchmark datasets, JIGSAWS ($\uparrow6.0\%$), MTL-AQA ($\uparrow0.99\%$), and FineDiving ($\uparrow2.54\%$). We also present a new Parkinson's Disease dataset, PD4T, of real patients performing four different actions, where we surpass ($\uparrow3.56\%$) the state-of-the-art in comparison. Our code, pretrained models, and the PD4T dataset are available at https://github.com/Plrbear/PECoP.

* Accepted to WACV 2024 (preprint)

Use Your Head: Improving Long-Tail Video Recognition

Apr 03, 2023

Toby Perrett, Saptarshi Sinha, Tilo Burghardt, Majid Mirmehdi, Dima Damen

Abstract: This paper presents an investigation into long-tail video recognition. We demonstrate that, unlike naturally-collected video datasets and existing long-tail image benchmarks, current video benchmarks fall short on multiple long-tailed properties. Most critically, they lack few-shot classes in their tails. In response, we propose new video benchmarks that better assess long-tail recognition, by sampling subsets from two datasets: SSv2 and VideoLT. We then propose a method, Long-Tail Mixed Reconstruction (LMR), which reduces overfitting to instances from few-shot classes by reconstructing them as weighted combinations of samples from head classes. LMR then employs label mixing to learn robust decision boundaries. It achieves state-of-the-art average class accuracy on EPIC-KITCHENS and the proposed SSv2-LT and VideoLT-LT. Benchmarks and code at: tobyperrett.github.io/lmr

* CVPR 2023
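
The reconstruction idea in the abstract above can be sketched in a few lines. This is a simplified illustration under stated assumptions and not the authors' implementation: the similarity-softmax weighting, the `keep` mixing coefficient, and all function names are hypothetical choices, since the abstract does not specify how the weights or the label mixing are computed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def reconstruct_from_head(tail_feature, tail_label, head_features, head_labels,
                          num_classes, keep=0.5):
    """Rebuild a few-shot (tail) feature partly from head-class features and
    mix the labels with the same weights. All weighting choices are assumptions."""
    # Assumed weighting: softmax over dot-product similarity to each head sample.
    weights = softmax([dot(tail_feature, h) for h in head_features])
    dim = len(tail_feature)
    reconstruction = [sum(w * h[d] for w, h in zip(weights, head_features))
                      for d in range(dim)]
    # Keep a fraction of the original feature, take the rest from the reconstruction.
    mixed_feature = [keep * tail_feature[d] + (1.0 - keep) * reconstruction[d]
                     for d in range(dim)]
    # Soft label: `keep` mass on the tail class, the rest spread over head labels.
    mixed_label = [0.0] * num_classes
    mixed_label[tail_label] += keep
    for w, y in zip(weights, head_labels):
        mixed_label[y] += (1.0 - keep) * w
    return mixed_feature, mixed_label


feature, label = reconstruct_from_head(
    tail_feature=[1.0, 0.0], tail_label=2,
    head_features=[[1.0, 0.0], [0.0, 1.0]], head_labels=[0, 1],
    num_classes=3,
)
```

Because the soft label carries the same weights as the feature mix, a classifier trained on such samples is discouraged from memorising the handful of tail-class instances exactly.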

Video-SwinUNet: Spatio-temporal Deep Learning Framework for VFSS Instance Segmentation

Feb 22, 2023

Chengxi Zeng, Xinyu Yang, David Smithard, Majid Mirmehdi, Alberto M Gambaruto, Tilo Burghardt

Abstract: This paper presents a deep learning framework for medical video segmentation. Convolutional neural network (CNN) and transformer-based methods have achieved great milestones in medical image segmentation tasks due to their incredible semantic feature encoding and global information comprehension abilities. However, most existing approaches ignore a salient aspect of medical video data - the temporal dimension. Our proposed framework explicitly extracts features from neighbouring frames across the temporal dimension and incorporates them with a temporal feature blender, which then tokenises the high-level spatio-temporal feature to form a strong global feature encoded via a Swin Transformer. The final segmentation results are produced via a UNet-like encoder-decoder architecture. Our model outperforms other approaches by a significant margin and improves the segmentation benchmarks on the VFSS2022 dataset, achieving Dice coefficients of 0.8986 and 0.8186 on the two datasets tested. Our studies also show the efficacy of the temporal feature blending scheme and cross-dataset transferability of learned capabilities. Code and models are fully available at https://github.com/SimonZeng7108/Video-SwinUNet.
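
As a reference for the segmentation metric reported above, here is a minimal sketch of the Dice coefficient for two binary masks, using the standard definition 2|A ∩ B| / (|A| + |B|). The function name, the nested-list mask format, and the handling of two empty masks are illustrative assumptions and are not taken from the paper's code.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks given as nested lists."""
    flat_pred = [int(bool(v)) for row in pred for v in row]
    flat_target = [int(bool(v)) for row in target for v in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 2x3 masks: 3 predicted pixels, 3 ground-truth pixels, 2 overlapping.
pred_mask = [[1, 1, 0],
             [0, 1, 0]]
true_mask = [[1, 0, 0],
             [0, 1, 1]]
score = dice_coefficient(pred_mask, true_mask)
```

A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they do not overlap at all.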

TranSOP: Transformer-based Multimodal Classification for Stroke Treatment Outcome Prediction

Jan 25, 2023

Zeynel A. Samak, Philip Clatworthy, Majid Mirmehdi

Abstract: Acute ischaemic stroke, caused by an interruption in blood flow to brain tissue, is a leading cause of disability and mortality worldwide. The selection of patients for the optimal ischaemic stroke treatment is a crucial step for a successful outcome, as the effect of treatment highly depends on the time to treatment. We propose a transformer-based multimodal network (TranSOP) for a classification approach that employs clinical metadata and imaging information, acquired on hospital admission, to predict the functional outcome of stroke treatment based on the modified Rankin Scale (mRS). This includes a fusion module to efficiently combine 3D non-contrast computed tomography (NCCT) features and clinical information. In comparative experiments using unimodal and multimodal data on the MRCLEAN dataset, we achieve a state-of-the-art AUC score of 0.85.

* Accepted at IEEE ISBI 2023, 5 pages

Triple-stream Deep Metric Learning of Great Ape Behavioural Actions

Jan 06, 2023

Otto Brookes, Majid Mirmehdi, Hjalmar Kühl, Tilo Burghardt

Abstract: We propose the first metric learning system for the recognition of great ape behavioural actions. Our proposed triple stream embedding architecture works on camera trap videos taken directly in the wild and demonstrates that the utilisation of an explicit DensePose-C chimpanzee body part segmentation stream effectively complements traditional RGB appearance and optical flow streams. We evaluate system variants with different feature fusion techniques and long-tail recognition approaches. Results and ablations show performance improvements of ~12% in top-1 accuracy over previous results achieved on the PanAf-500 dataset containing 180,000 manually annotated frames across nine behavioural actions. Furthermore, we provide a qualitative analysis of our findings and augment the metric learning system with long-tail recognition techniques showing that average per class accuracy -- critical in the domain -- can be improved by ~23% compared to the literature on that dataset. Finally, since our embedding spaces are constructed as metric, we provide the first data-driven visualisations of the great ape behavioural action spaces revealing emerging geometry and topology. We hope that the work sparks further interest in this vital application area of computer vision for the benefit of endangered great apes.
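
The abstract above distinguishes top-1 accuracy from average per-class accuracy. The short sketch below shows why the two can diverge on imbalanced data. The class names `head_class` and `tail_class` and the toy predictions are made up for illustration and do not correspond to the dataset's actual behaviour labels.

```python
from collections import defaultdict


def top1_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def average_per_class_accuracy(y_true, y_pred):
    """Mean of the per-class accuracies, so every class counts equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# A classifier that always predicts the frequent class on a long-tailed toy set.
y_true = ["head_class"] * 8 + ["tail_class"] * 2
y_pred = ["head_class"] * 10

overall = top1_accuracy(y_true, y_pred)
per_class = average_per_class_accuracy(y_true, y_pred)
```

In this toy case the overall accuracy looks healthy while the per-class average exposes that the rare class is never recognised, which is why the per-class figure matters for long-tailed behaviour data.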
