Advancements in deep image synthesis techniques, such as generative adversarial networks (GANs) and diffusion models (DMs), have ushered in an era of generating highly realistic images. While this technological progress has captured significant interest, it has also raised concerns about the potential difficulty in distinguishing real images from their synthetic counterparts. This paper takes inspiration from the potent convergence capabilities between vision and language, coupled with the zero-shot nature of vision-language models (VLMs). We introduce an innovative method called Bi-LORA that leverages VLMs, combined with low-rank adaptation (LORA) tuning techniques, to enhance the precision of synthetic image detection for unseen model-generated images. The pivotal conceptual shift in our methodology revolves around reframing binary classification as an image captioning task, leveraging the distinctive capabilities of cutting-edge VLM, notably bootstrapping language image pre-training (BLIP2). Rigorous and comprehensive experiments are conducted to validate the effectiveness of our proposed approach, particularly in detecting unseen diffusion-generated images from unknown diffusion-based generative models during training, showcasing robustness to noise, and demonstrating generalization capabilities to GANs. The obtained results showcase an impressive average accuracy of 93.41% in synthetic image detection on unseen generation models. The code and models associated with this research can be publicly accessed at https://github.com/Mamadou-Keita/VLM-DETECT.
In recent years, the emergence of models capable of generating images from text has attracted considerable interest, offering the possibility of creating realistic images from text descriptions. Yet these advances have also raised concerns about the potential misuse of these images, including the creation of misleading content such as fake news and propaganda. This study investigates the effectiveness of using advanced vision-language models (VLMs) for synthetic image identification. Specifically, the focus is on tuning state-of-the-art image captioning models for synthetic image detection. By harnessing the robust understanding capabilities of large VLMs, the aim is to distinguish authentic images from synthetic images produced by diffusion-based models. This study contributes to the advancement of synthetic image detection by exploiting the capabilities of visual language models such as BLIP-2 and ViTGPT2. By tailoring image captioning models, we address the challenges associated with the potential misuse of synthetic images in real-world applications. Results described in this paper highlight the promising role of VLMs in the field of synthetic image detection, outperforming conventional image-based detection techniques. Code and models can be found at https://github.com/Mamadou-Keita/VLM-DETECT.
Since the emergence of Covid-19 in late 2019, medical image analysis using artificial intelligence (AI) has emerged as a crucial research area, particularly with the utility of CT-scan imaging for disease diagnosis. This paper contributes to the 4th COV19D competition, focusing on Covid-19 Detection and Covid-19 Domain Adaptation Challenges. Our approach centers on lung segmentation and Covid-19 infection segmentation employing the recent CNN-based segmentation architecture PDAtt-Unet, which simultaneously segments lung regions and infections. Departing from traditional methods, we concatenate the input slice (grayscale) with segmented lung and infection, generating three input channels akin to color channels. Additionally, we employ three 3D CNN backbones Customized Hybrid-DeCoVNet, along with pretrained 3D-Resnet-18 and 3D-Resnet-50 models to train Covid-19 recognition for both challenges. Furthermore, we explore ensemble approaches and testing augmentation to enhance performance. Comparison with baseline results underscores the substantial efficiency of our approach, with a significant margin in terms of F1-score (14 %). This study advances the field by presenting a comprehensive methodology for accurate Covid-19 detection and adaptation, leveraging cutting-edge AI techniques in medical image analysis.
Automatic pain intensity estimation plays a pivotal role in healthcare and medical fields. While many methods have been developed to gauge human pain using behavioral or physiological indicators, facial expressions have emerged as a prominent tool for this purpose. Nevertheless, the dependence on labeled data for these techniques often renders them expensive and time-consuming. To tackle this, we introduce the Adaptive Hierarchical Spatio-temporal Dynamic Image (AHDI) technique. AHDI encodes spatiotemporal changes in facial videos into a singular RGB image, permitting the application of simpler 2D deep models for video representation. Within this framework, we employ a residual network to derive generalized facial representations. These representations are optimized for two tasks: estimating pain intensity and differentiating between genuine and simulated pain expressions. For the former, a regression model is trained using the extracted representations, while for the latter, a binary classifier identifies genuine versus feigned pain displays. Testing our method on two widely-used pain datasets, we observed encouraging results for both tasks. On the UNBC database, we achieved an MSE of 0.27 outperforming the SOTA which had an MSE of 0.40. On the BioVid dataset, our model achieved an accuracy of 89.76%, which is an improvement of 5.37% over the SOTA accuracy. Most notably, for distinguishing genuine from simulated pain, our accuracy stands at 94.03%, marking a substantial improvement of 8.98%. Our methodology not only minimizes the need for extensive labeled data but also augments the precision of pain evaluations, facilitating superior pain management.
In the recent years, hyperspectral imaging (HSI) has gained considerably popularity among computer vision researchers for its potential in solving remote sensing problems, especially in agriculture field. However, HSI classification is a complex task due to the high redundancy of spectral bands, limited training samples, and non-linear relationship between spatial position and spectral bands. Fortunately, deep learning techniques have shown promising results in HSI analysis. This literature review explores recent applications of deep learning approaches such as Autoencoders, Convolutional Neural Networks (1D, 2D, and 3D), Recurrent Neural Networks, Deep Belief Networks, and Generative Adversarial Networks in agriculture. The performance of these approaches has been evaluated and discussed on well-known land cover datasets including Indian Pines, Salinas Valley, and Pavia University.
In the last three years, the world has been facing a global crisis caused by Covid-19 pandemic. Medical imaging has been playing a crucial role in the fighting against this disease and saving the human lives. Indeed, CT-scans has proved their efficiency in diagnosing, detecting, and following-up the Covid-19 infection. In this paper, we propose a new Transformer-CNN based approach for Covid-19 infection segmentation from the CT slices. The proposed D-TrAttUnet architecture has an Encoder-Decoder structure, where compound Transformer-CNN encoder and Dual-Decoders are proposed. The Transformer-CNN encoder is built using Transformer layers, UpResBlocks, ResBlocks and max-pooling layers. The Dual-Decoder consists of two identical CNN decoders with attention gates. The two decoders are used to segment the infection and the lung regions simultaneously and the losses of the two tasks are joined. The proposed D-TrAttUnet architecture is evaluated for both Binary and Multi-classes Covid-19 infection segmentation. The experimental results prove the efficiency of the proposed approach to deal with the complexity of Covid-19 segmentation task from limited data. Furthermore, D-TrAttUnet architecture outperforms three baseline CNN segmentation architectures (Unet, AttUnet and Unet++) and three state-of-the-art architectures (AnamNet, SCOATNet and CopleNet), in both Binary and Mutli-classes segmentation tasks.
Since the appearance of Covid-19 in late 2019, Covid-19 has become an active research topic for the artificial intelligence (AI) community. One of the most interesting AI topics is Covid-19 analysis of medical imaging. CT-scan imaging is the most informative tool about this disease. This work is part of the 3nd COV19D competition for Covid-19 Severity Prediction. In order to deal with the big gap between the validation and test results that were shown in the previous version of this competition, we proposed to combine the prediction of 2D and 3D CNN predictions. For the 2D CNN approach, we propose 2B-InceptResnet architecture which consists of two paths for segmented lungs and infection of all slices of the input CT-scan, respectively. Each path consists of ConvLayer and Inception-ResNet pretrained model on ImageNet. For the 3D CNN approach, we propose hybrid-DeCoVNet architecture which consists of four blocks: Stem, four 3D-ResNet layers, Classification Head and Decision layer. Our proposed approaches outperformed the baseline approach in the validation data of the 3nd COV19D competition for Covid-19 Severity Prediction by 36%.
Since the appearance of Covid-19 in late 2019, Covid-19 has become an active research topic for the artificial intelligence (AI) community. One of the most interesting AI topics is Covid-19 analysis of medical imaging. CT-scan imaging is the most informative tool about this disease. This work is part of the 2nd COV19D competition, where two challenges are set: Covid-19 Detection and Covid-19 Severity Detection from the CT-scans. For Covid-19 detection from CT-scans, we proposed an ensemble of 2D Convolution blocks with Densenet-161 models. Here, each 2D convolutional block with Densenet-161 architecture is trained separately and in testing phase, the ensemble model is based on the average of their probabilities. On the other hand, we proposed an ensemble of Convolutional Layers with Inception models for Covid-19 severity detection. In addition to the Convolutional Layers, three Inception variants were used, namely Inception-v3, Inception-v4 and Inception-Resnet. Our proposed approaches outperformed the baseline approach in the validation data of the 2nd COV19D competition by 11% and 16% for Covid-19 detection and Covid-19 severity detection, respectively.
Automatic kinship verification from facial images is an emerging research topic in machine learning community. In this paper, we proposed an effective facial features extraction model based on multi-view deep features. Thus, we used four pre-trained deep learning models using eight features layers (FC6 and FC7 layers of each VGG-F, VGG-M, VGG-S and VGG-Face models) to train the proposed Multilinear Side-Information based Discriminant Analysis integrating Within Class Covariance Normalization (MSIDA+WCCN) method. Furthermore, we show that how can metric learning methods based on WCCN method integration improves the Simple Scoring Cosine similarity (SSC) method. We refer that we used the SSC method in RFIW'20 competition using the eight deep features concatenation. Thus, the integration of WCCN in the metric learning methods decreases the intra-class variations effect introduced by the deep features weights. We evaluate our proposed method on two kinship benchmarks namely KinFaceW-I and KinFaceW-II databases using four Parent-Child relations (Father-Son, Father-Daughter, Mother-Son and Mother-Daughter). Thus, the proposed MSIDA+WCCN method improves the SSC method with 12.80% and 14.65% on KinFaceW-I and KinFaceW-II databases, respectively. The results obtained are positively compared with some modern methods, including those that rely on deep learning.
Human face aging is irreversible process causing changes in human face characteristics such us hair whitening, muscles drop and wrinkles. Due to the importance of human face aging in biometrics systems, age estimation became an attractive area for researchers. This paper presents a novel method to estimate the age from face images, using binarized statistical image features (BSIF) and local binary patterns (LBP)histograms as features performed by support vector regression (SVR) and kernel ridge regression (KRR). We applied our method on FG-NET and PAL datasets. Our proposed method has shown superiority to that of the state-of-the-art methods when using the whole PAL database.