Stable Diffusion (SD) has gained a lot of attention in recent years in the field of Generative AI thus helping in synthesizing medical imaging data with distinct features. The aim is to contribute to the ongoing effort focused on overcoming the limitations of data scarcity and improving the capabilities of ML algorithms for cardiovascular image processing. Therefore, in this study, the possibility of generating synthetic cardiac CTA images was explored by fine-tuning stable diffusion models based on user defined text prompts, using only limited number of CTA images as input. A comprehensive evaluation of the synthetic data was conducted by incorporating both quantitative analysis and qualitative assessment, where a clinician assessed the quality of the generated data. It has been shown that Cardiac CTA images can be successfully generated using using Text to Image (T2I) stable diffusion model. The results demonstrate that the tuned T2I CTA diffusion model was able to generate images with features that are typically unique to acute type B aortic dissection (TBAD) medical conditions.
We introduce Meta-Album, an image classification meta-dataset designed to facilitate few-shot learning, transfer learning, meta-learning, among other tasks. It includes 40 open datasets, each having at least 20 classes with 40 examples per class, with verified licences. They stem from diverse domains, such as ecology (fauna and flora), manufacturing (textures, vehicles), human actions, and optical character recognition, featuring various image scales (microscopic, human scales, remote sensing). All datasets are preprocessed, annotated, and formatted uniformly, and come in 3 versions (Micro $\subset$ Mini $\subset$ Extended) to match users' computational resources. We showcase the utility of the first 30 datasets on few-shot learning problems. The other 10 will be released shortly after. Meta-Album is already more diverse and larger (in number of datasets) than similar efforts, and we are committed to keep enlarging it via a series of competitions. As competitions terminate, their test data are released, thus creating a rolling benchmark, available through OpenML.org. Our website https://meta-album.github.io/ contains the source code of challenge winning methods, baseline methods, data loaders, and instructions for contributing either new datasets or algorithms to our expandable meta-dataset.
We present the design and baseline results for a new challenge in the ChaLearn meta-learning series, accepted at NeurIPS'22, focusing on "cross-domain" meta-learning. Meta-learning aims to leverage experience gained from previous tasks to solve new tasks efficiently (i.e., with better performance, little training data, and/or modest computational resources). While previous challenges in the series focused on within-domain few-shot learning problems, with the aim of learning efficiently N-way k-shot tasks (i.e., N class classification problems with k training examples), this competition challenges the participants to solve "any-way" and "any-shot" problems drawn from various domains (healthcare, ecology, biology, manufacturing, and others), chosen for their humanitarian and societal impact. To that end, we created Meta-Album, a meta-dataset of 40 image classification datasets from 10 domains, from which we carve out tasks with any number of "ways" (within the range 2-20) and any number of "shots" (within the range 1-20). The competition is with code submission, fully blind-tested on the CodaLab challenge platform. The code of the winners will be open-sourced, enabling the deployment of automated machine learning solutions for few-shot image classification across several domains.
Intelligent video-surveillance (IVS) is currently an active research field in computer vision and machine learning and provides useful tools for surveillance operators and forensic video investigators. Person re-identification (PReID) is one of the most critical problems in IVS, and it consists of recognizing whether or not an individual has already been observed over a camera in a network. Solutions to PReID have myriad applications including retrieval of video-sequences showing an individual of interest or even pedestrian tracking over multiple camera views. Different techniques have been proposed to increase the performance of PReID in the literature, and more recently researchers utilized deep neural networks (DNNs) given their compelling performance on similar vision problems and fast execution at test time. Given the importance and wide range of applications of re-identification solutions, our objective herein is to discuss the work carried out in the area and come up with a survey of state-of-the-art DNN models being used for this task. We present descriptions of each model along with their evaluation on a set of benchmark datasets. Finally, we show a detailed comparison among these models, which are followed by some discussions on their limitations that can work as guidelines for future research.
In 2015 we began a sub-challenge at the EndoVis workshop at MICCAI in Munich using endoscope images of ex-vivo tissue with automatically generated annotations from robot forward kinematics and instrument CAD models. However, the limited background variation and simple motion rendered the dataset uninformative in learning about which techniques would be suitable for segmentation in real surgery. In 2017, at the same workshop in Quebec we introduced the robotic instrument segmentation dataset with 10 teams participating in the challenge to perform binary, articulating parts and type segmentation of da Vinci instruments. This challenge included realistic instrument motion and more complex porcine tissue as background and was widely addressed with modifications on U-Nets and other popular CNN architectures. In 2018 we added to the complexity by introducing a set of anatomical objects and medical devices to the segmented classes. To avoid over-complicating the challenge, we continued with porcine data which is dramatically simpler than human tissue due to the lack of fatty tissue occluding many organs.
In the current era of neural networks and big data, higher dimensional data is processed for automation of different application areas. Graphs represent a complex data organization in which dependencies between more than one object or activity occur. Due to the high dimensionality, this data creates challenges for machine learning algorithms. Graph convolutional networks were introduced to utilize the convolutional models concepts that shows good results. In this context, we enhanced two of the existing Graph convolutional network models by proposing four enhancements. These changes includes: hyper parameters optimization, convex combination of activation functions, topological information enrichment through clustering coefficients measure, and structural redesign of the network through addition of dense layers. We present extensive results on four state-of-art benchmark datasets. The performance is notable not only in terms of lesser computational cost compared to competitors, but also achieved competitive results for three of the datasets and state-of-the-art for the fourth dataset.
Emotions play a crucial role in human interaction, health care and security investigations and monitoring. Automatic emotion recognition (AER) using electroencephalogram (EEG) signals is an effective method for decoding the real emotions, which are independent of body gestures, but it is a challenging problem. Several automatic emotion recognition systems have been proposed, which are based on traditional hand-engineered approaches and their performances are very poor. Motivated by the outstanding performance of deep learning (DL) in many recognition tasks, we introduce an AER system (Deep-AER) based on EEG brain signals using DL. A DL model involves a large number of learnable parameters, and its training needs a large dataset of EEG signals, which is difficult to acquire for AER problem. To overcome this problem, we proposed a lightweight pyramidal one-dimensional convolutional neural network (LP-1D-CNN) model, which involves a small number of learnable parameters. Using LP-1D-CNN, we build a two level ensemble model. In the first level of the ensemble, each channel is scanned incrementally by LP-1D-CNN to generate predictions, which are fused using majority vote. The second level of the ensemble combines the predictions of all channels of an EEG signal using majority vote for detecting the emotion state. We validated the effectiveness and robustness of Deep-AER using DEAP, a benchmark dataset for emotion recognition research. The results indicate that FRONT plays dominant role in AER and over this region, Deep-AER achieved the accuracies of 98.43% and 97.65% for two AER problems, i.e., high valence vs low valence (HV vs LV) and high arousal vs low arousal (HA vs LA), respectively. The comparison reveals that Deep-AER outperforms the state-of-the-art systems with large margin. The Deep-AER system will be helpful in monitoring for health care and security investigations.
Autonomous robotics and artificial intelligence techniques can be used to support human personnel in the event of critical incidents. These incidents can pose great danger to human life. Some examples of such assistance include: multi-robot surveying of the scene; collection of sensor data and scene imagery, real-time risk assessment and analysis; object identification and anomaly detection; and retrieval of relevant supporting documentation such as standard operating procedures (SOPs). These incidents, although often rare, can involve chemical, biological, radiological/nuclear or explosive (CBRNE) substances and can be of high consequence. Real-world training and deployment of these systems can be costly and sometimes not feasible. For this reason, we have developed a realistic 3D model of a CBRNE scenario to act as a testbed for an initial set of assisting AI tools that we have developed.
Intelligent video-surveillance is currently an active research field in computer vision and machine learning techniques. It provides useful tools for surveillance operators and forensic video investigators. Person re-identification (PReID) is one among these tools. It consists of recognizing whether an individual has already been observed over a camera in a network or not. This tool can also be employed in various possible applications such as off-line retrieval of all the video-sequences showing an individual of interest whose image is given a query, and online pedestrian tracking over multiple camera views. To this aim, many techniques have been proposed to increase the performance of PReID. Among the systems, many researchers utilized deep neural networks (DNNs) because of their better performance and fast execution at test time. Our objective is to provide for future researchers the work being done on PReID to date. Therefore, we summarized state-of-the-art DNN models being used for this task. A brief description of each model along with their evaluation on a set of benchmark datasets is given. Finally, a detailed comparison is provided among these models followed by some limitations that can work as guidelines for future research.