"Image": models, code, and papers

Dual flow fusion model for concrete surface crack segmentation

May 16, 2023
Yuwei Duan

The existence of cracks and other damage poses a significant threat to the safe operation of transportation infrastructure. Traditional manual inspection and ultrasound equipment testing consume a lot of time and resources. With the development of deep learning technology, many deep learning models have been widely applied to practical visual segmentation tasks. Detection methods based on deep learning models offer high detection accuracy, fast detection speed, and simple operation. However, deep learning-based crack segmentation models are sensitive to background noise, produce rough edges, and lack robustness. Therefore, this paper proposes a crack segmentation model based on the fusion of dual streams. The image is fed simultaneously into two designed processing streams that independently extract long-distance dependencies and local detail features. Adaptive prediction is achieved through a dual-headed mechanism. Meanwhile, a novel interaction fusion mechanism is proposed to guide different feature layers to complement each other, achieving crack localization and recognition in complex backgrounds. Finally, an edge optimization method is proposed to improve segmentation accuracy. Experiments show that on the DeepCrack[1] public dataset the segmentation results reach an F1 value of 93.7% and an IoU value of 86.6%. On the CRACK500[2] dataset, the F1 value is 78.1% and the IoU value is 66.0%.
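For readers unfamiliar with the F1 and IoU values quoted in this abstract, the following is a minimal sketch of how both can be computed for a single binary segmentation mask. The function names and the toy masks are illustrative assumptions; the paper's own evaluation protocol (for example, how scores are averaged over images) is not described here and may differ.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives and false negatives
    between two binary masks given as nested lists of 0/1."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def f1_and_iou(pred, target):
    """Return (F1, IoU) for the positive (crack) class.

    F1  = 2*TP / (2*TP + FP + FN)
    IoU = TP / (TP + FP + FN)
    If both masks are empty, both scores are defined as 1.0 here
    (a convention chosen for this sketch).
    """
    tp, fp, fn = confusion_counts(pred, target)
    f1_den = 2 * tp + fp + fn
    iou_den = tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 1.0
    iou = tp / iou_den if iou_den else 1.0
    return f1, iou


# Hypothetical 2x3 masks, purely for illustration.
example_pred = [[1, 1, 0],
                [0, 1, 0]]
example_target = [[1, 0, 0],
                  [0, 1, 1]]
example_f1, example_iou = f1_and_iou(example_pred, example_target)
```

In this toy case there are two true positives, one false positive and one false negative, so IoU is 2/4 and F1 is 4/6.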

Osteosarcoma Tumor Detection using Transfer Learning Models

May 16, 2023
Raisa Fairooz Meem, Khandaker Tabin Hasan

The field of clinical image analysis has been applying transfer learning models increasingly because of their lower computational complexity and better accuracy. These are pre-trained models that do not need to be trained from scratch, which eliminates the need for large datasets. Transfer learning models are mostly used for the analysis of brain, breast, or lung images, but other areas such as bone marrow cell detection or bone cancer detection can also benefit from them, especially considering the lack of large available datasets for these tasks. This paper studies the performance of several transfer learning models for osteosarcoma tumor detection. Osteosarcoma is a type of bone cancer mostly found in the cells of the long bones of the body. The dataset consists of H&E stained images divided into four categories: Viable Tumor, Non-viable Tumor, Non-Tumor and Viable Non-viable. Both datasets were randomly divided into training and test sets following an 80-20 ratio (80% for training and 20% for testing). Four models are considered for comparison: EfficientNetB7, InceptionResNetV2, NasNetLarge and ResNet50. All of these models are pre-trained on ImageNet. According to the results, InceptionResNetV2 achieved the highest accuracy (93.29%), followed by NasNetLarge (90.91%), ResNet50 (89.83%) and EfficientNetB7 (62.77%). It also had the highest precision (0.8658) and recall (0.8658) values among the four models.
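The random 80-20 split mentioned in this abstract can be sketched as follows. This is only an illustration using the Python standard library; the function name, the fixed seed and the placeholder file names are assumptions, and the authors' actual splitting code is not shown in the abstract.

```python
import random


def split_train_test(items, test_fraction=0.2, seed=0):
    """Randomly split `items` into (train, test) lists.

    A seeded random.Random instance keeps the split reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(items)      # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical image identifiers, purely for illustration.
image_ids = [f"image_{i:03d}.png" for i in range(100)]
train_ids, test_ids = split_train_test(image_ids, test_fraction=0.2, seed=42)
```

A plain random split like this does not guarantee that every class keeps the same proportion in both sets; a stratified split would be needed for that.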

A Chain Rule for the Expected Suprema of Bernoulli Processes

Apr 27, 2023
Yifeng Chu, Maxim Raginsky

We obtain an upper bound on the expected supremum of a Bernoulli process indexed by the image of an index set under a uniformly Lipschitz function class in terms of properties of the index set and the function class, extending an earlier result of Maurer for Gaussian processes. The proof makes essential use of recent results of Bednorz and Latala on the boundedness of Bernoulli processes.

* 14 pages

STEPS: Joint Self-supervised Nighttime Image Enhancement and Depth Estimation

Feb 02, 2023
Yupeng Zheng, Chengliang Zhong, Pengfei Li, Huan-ang Gao, Yuhang Zheng, Bu Jin, Ling Wang, Hao Zhao, Guyue Zhou, Qichao Zhang, Dongbin Zhao

Self-supervised depth estimation has drawn a lot of attention recently, as it can improve the 3D sensing capabilities of self-driving vehicles. However, it intrinsically relies upon the photometric consistency assumption, which hardly holds at nighttime. Although various supervised nighttime image enhancement methods have been proposed, their generalization performance in challenging driving scenarios is not satisfactory. To this end, we propose the first method that jointly learns a nighttime image enhancer and a depth estimator without using ground truth for either task. Our method tightly entangles the two self-supervised tasks using a newly proposed uncertain pixel masking strategy. This strategy originates from the observation that nighttime images suffer not only from underexposed regions but also from overexposed regions. By fitting a bridge-shaped curve to the illumination map distribution, both kinds of regions are suppressed and the two tasks are bridged naturally. We benchmark the method on two established datasets, nuScenes and RobotCar, and demonstrate state-of-the-art performance on both of them. Detailed ablations also reveal the mechanism of our proposal. Last but not least, to mitigate the problem of sparse ground truth in existing datasets, we provide a new photo-realistically enhanced nighttime dataset based upon CARLA. It brings meaningful new challenges to the community. Codes, data, and models are available at https://github.com/ucaszyp/STEPS.

* Accepted by ICRA 2023, Code: https://github.com/ucaszyp/STEPS

Target-Free Text-guided Image Manipulation

Dec 01, 2022
Wan-Cyuan Fan, Cheng-Fu Yang, Chiao-An Yang, Yu-Chiang Frank Wang

We tackle the problem of target-free text-guided image manipulation, which requires one to modify the input reference image based on the given text instruction, while no ground truth target image is observed during training. To address this challenging task, we propose a Cyclic-Manipulation GAN (cManiGAN) in this paper, which is able to realize where and how to edit the image regions of interest. Specifically, the image editor in cManiGAN learns to identify and complete the input image, while a cross-modal interpreter and a reasoner are deployed to verify the semantic correctness of the output image based on the input instruction. While the former utilizes factual/counterfactual description learning for authenticating the image semantics, the latter predicts the "undo" instruction and provides pixel-level supervision for the training of cManiGAN. With such operational cycle-consistency, our cManiGAN can be trained in the above weakly supervised setting. We conduct extensive experiments on the CLEVR and COCO datasets, which verify the effectiveness and generalizability of our proposed method. Project page: https://sites.google.com/view/wancyuanfan/projects/cmanigan.

* AAAI 2023

Improving Performance of Private Federated Models in Medical Image Analysis

Apr 11, 2023
Xiangjian Hou, Sarit Khirirat, Mohammad Yaqub, Samuel Horvath

Federated learning (FL) is a distributed machine learning (ML) approach that allows models to be trained on data without centralizing it. This approach is particularly beneficial for medical applications because it addresses some key challenges associated with medical data, such as privacy, security, and data ownership. On top of that, FL can improve the quality of ML models used in medical applications. Medical data is often diverse and can vary significantly depending on the patient population, making it challenging to develop ML models that are accurate and generalizable. FL allows medical data from multiple sources to be used, which can help to improve the quality and generalizability of ML models. Differential privacy (DP) is a go-to algorithmic tool to make this process secure and private. In this work, we show that the model performance can be further improved by employing local steps, a popular approach to improving the communication efficiency of FL, and tuning the number of communication rounds. Concretely, given the privacy budget, we show an optimal number of local steps and communication rounds. We provide theoretical motivations further corroborated with experimental evaluations on real-world medical imaging tasks.

Flickr-PAD: New Face High-Resolution Presentation Attack Detection Database

Apr 25, 2023
Diego Pasmino, Carlos Aravena, Juan Tapia, Christoph Busch

Nowadays, Presentation Attack Detection (PAD) is a very active research area. Several state-of-the-art databases are built from images extracted from videos. One of the main problems identified is that many databases contain low-quality, small images and do not represent an operational scenario in a real remote biometric system. Currently, such images are captured with smartphones at high quality and higher resolutions. In order to increase the diversity of image quality, this work presents a new PAD database based on open-access Flickr images, called "Flickr-PAD". Our new hand-made database covers high-quality printed and screen scenarios. This will help researchers to compare new approaches with existing algorithms on a wider database. This database will be available to other researchers. A leave-one-out protocol was used to train and evaluate three PAD models based on MobileNet-V3 (small and large) and EfficientNet-B0. The best result was reached with MobileNet-V3 large, with a BPCER10 of 7.08% and a BPCER20 of 11.15%.
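As background for the BPCER10 and BPCER20 figures in this abstract: under the usual ISO/IEC 30107-3 reading, these are the bona fide presentation classification error rates at the operating points where the attack presentation classification error rate (APCER) is 10% and 5% respectively. The sketch below shows one way such a value could be computed from classifier scores. The function name, the convention that a higher score means "more likely bona fide", and the toy scores are all assumptions; the abstract does not describe the authors' evaluation code.

```python
import math


def bpcer_at_apcer(bona_fide_scores, attack_scores, target_apcer):
    """BPCER at the lowest threshold whose APCER is <= target_apcer.

    A sample is accepted as bona fide when score >= threshold.
    APCER = fraction of attacks accepted as bona fide.
    BPCER = fraction of bona fide samples rejected as attacks.
    """
    # Candidate thresholds: every observed score, plus +inf so that
    # an APCER of zero is always reachable.
    candidates = sorted(set(bona_fide_scores) | set(attack_scores))
    candidates.append(math.inf)
    for threshold in candidates:  # ascending, so APCER only decreases
        accepted_attacks = sum(s >= threshold for s in attack_scores)
        apcer = accepted_attacks / len(attack_scores)
        if apcer <= target_apcer:
            rejected_bona_fide = sum(s < threshold for s in bona_fide_scores)
            return rejected_bona_fide / len(bona_fide_scores)
    raise ValueError("no threshold satisfies the target APCER")


# Hypothetical scores, purely for illustration.
toy_attack = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
toy_bona_fide = [0.55, 0.65, 0.75, 0.85, 0.92, 0.96, 0.97, 0.98, 0.99, 1.0]

toy_bpcer10 = bpcer_at_apcer(toy_bona_fide, toy_attack, 0.10)  # APCER <= 10%
toy_bpcer20 = bpcer_at_apcer(toy_bona_fide, toy_attack, 0.05)  # APCER <= 5%
```

Because a stricter APCER target forces a higher threshold, the BPCER at 5% APCER can never be lower than the BPCER at 10% APCER, which matches the ordering of the two values reported in the abstract.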

Gradient Domain Weighted Guided Image Filtering

Nov 30, 2022
Bo Wang

Although the guided image filter is an excellent local filter, it is subject to halo artifacts. The algorithm proposed in this paper uses gradient information to accurately locate the edges of the image, and uses weighting information to further distinguish the flat areas from the edge areas of the image. As a result, the edges of the image are sharper and the level of blur in flat areas is reduced, avoiding halo artifacts caused by excessive blurring near edges. Experiments show that the proposed algorithm can better suppress halo artifacts at the edges. The proposed algorithm performs well in both image denoising and image detail enhancement.
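For context, the sketch below implements the basic (unweighted) guided filter that this abstract takes as its starting point, reduced to a 1D signal for brevity. It is not the gradient domain weighted variant proposed in the paper, whose details are not given in the abstract. The function names, the 1D simplification and the toy step signal are assumptions made for illustration.

```python
def box_mean(values, radius):
    """Mean of `values` over a sliding window of the given radius,
    with the window truncated at the signal borders."""
    n = len(values)
    result = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        result.append(sum(values[lo:hi]) / (hi - lo))
    return result


def guided_filter_1d(guide, src, radius, eps):
    """Basic guided filter on a 1D signal.

    In each window the output is modelled as a * guide + b, with
        a = cov(guide, src) / (var(guide) + eps)
        b = mean(src) - a * mean(guide)
    and the per-window (a, b) are then averaged for each sample.
    """
    mean_g = box_mean(guide, radius)
    mean_s = box_mean(src, radius)
    mean_gs = box_mean([g * s for g, s in zip(guide, src)], radius)
    mean_gg = box_mean([g * g for g in guide], radius)

    a, b = [], []
    for mg, ms, mgs, mgg in zip(mean_g, mean_s, mean_gs, mean_gg):
        var_g = mgg - mg * mg
        cov_gs = mgs - mg * ms
        a_k = cov_gs / (var_g + eps)
        a.append(a_k)
        b.append(ms - a_k * mg)

    mean_a = box_mean(a, radius)
    mean_b = box_mean(b, radius)
    return [ma * g + mb for ma, mb, g in zip(mean_a, mean_b, guide)]


# Toy step edge, filtered with itself as the guide.
step = [0.0] * 5 + [1.0] * 5
sharp = guided_filter_1d(step, step, radius=2, eps=1e-6)   # edge preserved
smooth = guided_filter_1d(step, step, radius=2, eps=1.0)   # edge blurred
```

With a small `eps` the step edge passes through almost unchanged, while a large `eps` smears it into the neighbouring flat samples. This blurring near edges is the behaviour that gives rise to the halo artifacts discussed in the abstract.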

Image To Tree with Recursive Prompting

Jan 01, 2023
James Batten, Matthew Sinclair, Ben Glocker, Michiel Schaap

Extracting complex structures from grid-based data is a common key step in automated medical image analysis. The conventional solution to recovering tree-structured geometries typically involves computing the minimal cost path through intermediate representations derived from segmentation masks. However, this methodology has significant limitations in the context of projective imaging of tree-structured 3D anatomical data such as coronary arteries, since there are often overlapping branches in the 2D projection. In this work, we propose a novel approach to predicting tree connectivity structure which reformulates the task as an optimization problem over individual steps of a recursive process. We design and train a two-stage model which leverages the UNet and Transformer architectures and introduces an image-based prompting technique. Our proposed method achieves compelling results on a pair of synthetic datasets, and outperforms a shortest-path baseline.

* 12 pages, 5 figures

Guiding Text-to-Image Diffusion Model Towards Grounded Generation

Jan 12, 2023
Ziyi Li, Qinye Zhou, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie

The goal of this paper is to augment a pre-trained text-to-image diffusion model with the ability of open-vocabulary object grounding, i.e., simultaneously generating images and segmentation masks for the corresponding visual entities described in the text prompt. We make the following contributions: (i) we insert a grounding module into the existing diffusion model, which can be trained to align the visual and textual embedding spaces of the diffusion model with only a small number of object categories; (ii) we propose an automatic pipeline for constructing a dataset that consists of {image, segmentation mask, text prompt} triplets, to train the proposed grounding module; (iii) we evaluate the performance of open-vocabulary grounding on images generated by the text-to-image diffusion model and show that the module can segment objects of categories beyond those seen at training time; (iv) we adopt the guided diffusion model to build a synthetic semantic segmentation dataset, and show that training a standard segmentation model on such a dataset yields competitive performance on the zero-shot segmentation (ZS3) benchmark, which opens up new opportunities for adopting the powerful diffusion model for discriminative tasks.
