Sezer Karaoglu

PIE-Net: Photometric Invariant Edge Guided Network for Intrinsic Image Decomposition

Mar 30, 2022

Partha Das, Sezer Karaoglu, Theo Gevers

Abstract:Intrinsic image decomposition is the process of recovering the image formation components (reflectance and shading) from an image. Previous methods employ either explicit priors to constrain the problem or implicit constraints as formulated by their losses (deep learning). These methods can be negatively influenced by strong illumination conditions causing shading-reflectance leakages. Therefore, in this paper, an end-to-end edge-driven hybrid CNN approach is proposed for intrinsic image decomposition. Edges correspond to illumination invariant gradients. To handle hard negative illumination transitions, a hierarchical approach is taken including global and local refinement layers. We make use of attention layers to further strengthen the learning process. An extensive ablation study and large scale experiments are conducted showing that it is beneficial for edge-driven hybrid IID networks to make use of illumination invariant descriptors and that separating global and local cues helps in improving the performance of the network. Finally, it is shown that the proposed method obtains state-of-the-art performance and is able to generalise well to real-world images. The project page with pretrained models, finetuned models and network code can be found at https://ivi.fnwi.uva.nl/cv/pienet/.
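
For readers new to the task, the decomposition mentioned in the abstract is commonly written as a per-pixel product of reflectance and shading. The sketch below assumes that standard product model (the abstract does not state the formula) and uses hypothetical function names; it is not PIE-Net itself. It only illustrates why the problem needs extra constraints: two different reflectance/shading pairs can render the same image.

```python
import numpy as np

def compose(reflectance, shading):
    """Render an image as the per-pixel product I = R * S (assumed image formation model)."""
    # shading has shape (H, W); broadcast it over the colour channels of reflectance (H, W, 3)
    return reflectance * shading[..., None]

def shading_from_reflectance(image, reflectance, eps=1e-6):
    """Recover shading when the reflectance is already known (the easy direction)."""
    ratio = image / np.maximum(reflectance, eps)
    return ratio.mean(axis=-1)

rng = np.random.default_rng(0)
R = rng.uniform(0.2, 1.0, size=(4, 4, 3))  # toy reflectance (albedo)
S = rng.uniform(0.1, 1.0, size=(4, 4))     # toy grey-scale shading
I = compose(R, S)

# Scaling R up and S down by the same factor yields the identical image,
# so the image alone cannot tell the two explanations apart.
k = 2.0
I_alt = compose(R * k, S / k)
```

Priors, learned constraints, or cues such as illumination invariant edges are what methods use to choose among these equally valid explanations.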

Generative Models for Multi-Illumination Color Constancy

Sep 02, 2021

Partha Das, Yang Liu, Sezer Karaoglu, Theo Gevers

Abstract:In this paper, the aim is multi-illumination color constancy. However, most existing color constancy methods are designed for single light sources. Furthermore, datasets for learning multi-illumination color constancy are largely missing. We propose a seed-based (physics-driven) multi-illumination color constancy method. GANs are exploited to model the illumination estimation problem as an image-to-image domain translation problem. Additionally, a novel multi-illumination data augmentation method is proposed. Experiments on single and multi-illumination datasets show that our methods outperform state-of-the-art methods.
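
As a rough illustration of what multi-illumination correction involves, the sketch below divides each pixel by its own illuminant colour. It assumes a diagonal (von Kries-style) model and takes the per-pixel illuminant map as given; in the paper that map would come from the learned image-to-image translation, which is not reproduced here. The function name and toy data are hypothetical.

```python
import numpy as np

def correct_multi_illuminant(image, illuminant_map, eps=1e-6):
    """Divide each pixel by its own estimated illuminant colour (assumed diagonal model)."""
    # Normalise each per-pixel illuminant to unit mean so overall brightness is preserved
    mean = illuminant_map.mean(axis=-1, keepdims=True)
    norm = illuminant_map / np.maximum(mean, eps)
    return image / np.maximum(norm, eps)

# Toy scene: a uniform grey surface lit by a reddish light on the left half
# and a bluish light on the right half.
surface = np.full((2, 4, 3), 0.5)
lights = np.empty((2, 4, 3))
lights[:, :2] = [1.2, 1.0, 0.8]
lights[:, 2:] = [0.8, 1.0, 1.2]
observed = surface * lights
corrected = correct_multi_illuminant(observed, lights)
```

A single global illuminant estimate could not neutralise both halves of this toy image at once, which is the gap that per-pixel estimation is meant to fill.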

* Accepted in International Conference on Computer Vision Workshop (ICCVW) 2021

EDEN: Multimodal Synthetic Dataset of Enclosed GarDEN Scenes

Nov 10, 2020

Hoang-An Le, Thomas Mensink, Partha Das, Sezer Karaoglu, Theo Gevers

Abstract:Multimodal large-scale datasets for outdoor scenes are mostly designed for urban driving problems. The scenes are highly structured and semantically different from scenarios seen in nature-centered scenes such as gardens or parks. To promote machine learning methods for nature-oriented applications, such as agriculture and gardening, we propose the multimodal synthetic dataset for Enclosed garDEN scenes (EDEN). The dataset features more than 300K images captured from more than 100 garden models. Each image is annotated with various low/high-level vision modalities, including semantic segmentation, depth, surface normals, intrinsic colors, and optical flow. Experimental results on state-of-the-art methods for semantic segmentation and monocular depth prediction, two important tasks in computer vision, show a positive impact of pre-training deep networks on our dataset for unstructured natural scenes. The dataset and related materials will be available at https://lhoangan.github.io/eden.

* Accepted for publishing at WACV 2021

Spatio-temporal Features for Generalized Detection of Deepfake Videos

Oct 22, 2020

Ipek Ganiyusufoglu, L. Minh Ngô, Nedko Savov, Sezer Karaoglu, Theo Gevers

Abstract:For deepfake detection, video-level detectors have not been explored as extensively as image-level detectors, which do not exploit temporal data. In this paper, we empirically show that existing approaches based on image and sequence classifiers generalize poorly to new manipulation techniques. To this end, we propose spatio-temporal features, modeled by 3D CNNs, to extend the generalization capabilities to detect new sorts of deepfake videos. We show that spatial features learn distinct deepfake-method-specific attributes, while spatio-temporal features capture shared attributes between deepfake methods. We provide an in-depth analysis of how the sequential and spatio-temporal video encoders utilize temporal information, using the DFDC dataset (arXiv:2006.07397). We thereby reveal that our approach captures local spatio-temporal relations and inconsistencies in deepfake videos, while existing sequence encoders are indifferent to them. Through large scale experiments conducted on the FaceForensics++ (arXiv:1901.08971) and Deeper Forensics (arXiv:2001.03024) datasets, we show that our approach outperforms existing methods in terms of generalization capabilities.

* Submitted to Computer Vision and Image Understanding (CVIU)

Multi-Loss Weighting with Coefficient of Variations

Sep 03, 2020

Rick Groenendijk, Sezer Karaoglu, Theo Gevers, Thomas Mensink

Abstract:Many interesting tasks in machine learning and computer vision are learned by optimising an objective function defined as a weighted linear combination of multiple losses. The final performance is sensitive to choosing the correct (relative) weights for these losses. Finding a good set of weights is often done by adopting them into the set of hyper-parameters, which are set using an extensive grid search. This is computationally expensive. In this paper, the weights are defined based on properties observed while training the model, including the specific batch loss, the average loss, and the variance for each of the losses. An additional advantage is that the defined weights evolve during training, instead of using static loss weights. In literature, loss weighting is mostly used in a multi-task learning setting, where the different tasks obtain different weights. However, there is a plethora of single-task multi-loss problems that can benefit from automatic loss weighting. In this paper, it is shown that these multi-task approaches do not work on single tasks. Instead, a method is proposed that automatically and dynamically tunes loss weights throughout training specifically for single-task multi-loss problems. The method incorporates a measure of uncertainty to balance the losses. The validity of the approach is shown empirically for different tasks on multiple datasets.
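
To make the idea of statistics-driven loss weights concrete, here is a minimal sketch that weights each loss by the coefficient of variation (standard deviation divided by mean) of its own history, updated online. This is a simplified illustration: it applies the statistic directly to raw loss values, the class name `CoVWeighter` is hypothetical, and the paper's exact formulation is not given in the abstract and may differ.

```python
import math

class CoVWeighter:
    """Weight each loss by the coefficient of variation (std / mean) of its history."""

    def __init__(self, num_losses):
        self.count = 0
        self.mean = [0.0] * num_losses
        self.m2 = [0.0] * num_losses  # running sum of squared deviations (Welford)

    def update(self, losses):
        """Fold one batch of loss values into the running statistics."""
        self.count += 1
        for i, value in enumerate(losses):
            delta = value - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (value - self.mean[i])

    def weights(self):
        """Return weights proportional to std / mean, normalised to sum to 1."""
        n = len(self.mean)
        if self.count < 2:
            return [1.0 / n] * n  # not enough history yet: fall back to equal weights
        cov = [math.sqrt(m2 / self.count) / (abs(mu) + 1e-12)
               for m2, mu in zip(self.m2, self.mean)]
        total = sum(cov)
        if total == 0.0:
            return [1.0 / n] * n
        return [c / total for c in cov]

weighter = CoVWeighter(num_losses=2)
# Loss 0 is still decreasing quickly; loss 1 has plateaued.
for batch_losses in [(4.0, 1.00), (2.0, 1.01), (1.0, 0.99)]:
    weighter.update(batch_losses)
w = weighter.weights()
total_loss = sum(wi * li for wi, li in zip(w, (1.0, 0.99)))
```

In this toy run the loss that is still changing receives almost all of the weight, while the plateaued loss is down-weighted, and the weights keep evolving as more batches are seen rather than staying fixed.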

Physics-based Shading Reconstruction for Intrinsic Image Decomposition

Sep 03, 2020

Anil S. Baslamisli, Yang Liu, Sezer Karaoglu, Theo Gevers

Abstract:We investigate the use of photometric invariance and deep learning to compute intrinsic images (albedo and shading). We propose albedo and shading gradient descriptors which are derived from physics-based models. Using the descriptors, albedo transitions are masked out and an initial sparse shading map is calculated directly from the corresponding RGB image gradients in a learning-free unsupervised manner. Then, an optimization method is proposed to reconstruct the full dense shading map. Finally, we integrate the generated shading map into a novel deep learning framework to refine it and also to predict the corresponding albedo image to achieve intrinsic image decomposition. By doing so, we are the first to directly address the texture and intensity ambiguity problems of the shading estimations. Large scale experiments show that our approach, steered by physics-based invariant descriptors, achieves superior results on the MIT Intrinsics, NIR-RGB Intrinsics, Multi-Illuminant Intrinsic Images, Spectral Intrinsic Images, and As Realistic As Possible datasets, and competitive results on the Intrinsic Images in the Wild dataset, while achieving state-of-the-art shading estimations.

* Submitted to Computer Vision and Image Understanding (CVIU)

Kinship Identification through Joint Learning Using Kinship Verification Ensemble

Apr 20, 2020

Wei Wang, Shaodi You, Sezer Karaoglu, Theo Gevers

Abstract:While kinship verification is a well-explored task which only identifies whether or not two people are kin, kinship identification is the task of further identifying the particular type of kinship and is not yet well explored. We found that a naive extension of kinship verification cannot solve the identification properly. This is because the existing verification networks are individually trained on specific kinships and do not consider the context between different kinship types. Also, the existing kinship verification dataset has a biased positive-negative distribution, which is different from the real-world distribution. To address this, we propose a novel kinship identification approach through the joint training of kinship verification ensembles and a Joint Identification Module. We also propose to rebalance the training dataset to make it realistic. Rigorous experiments demonstrate an appealing performance on the kinship identification task. They also demonstrate a significant performance improvement of kinship verification when trained on the same unbiased data.

* 18 pages, 8 figures

ShadingNet: Image Intrinsics by Fine-Grained Shading Decomposition

Dec 09, 2019

Anil S. Baslamisli, Partha Das, Hoang-An Le, Sezer Karaoglu, Theo Gevers

Abstract:In general, intrinsic image decomposition algorithms interpret shading as one unified component including all photometric effects. As shading transitions are generally smoother than albedo changes, these methods may fail in distinguishing strong (cast) shadows from albedo variations. These shadows may in turn leak into albedo map predictions. Therefore, in this paper, we propose to decompose the shading component into direct (illumination) and indirect shading (ambient light and shadows). The aim is to distinguish strong cast shadows from reflectance variations. Two end-to-end supervised CNN models (ShadingNets) are proposed exploiting the fine-grained shading model. Furthermore, surface normal features are jointly learned by the proposed CNN networks. Surface normals are expected to assist the decomposition task. A large-scale dataset of scene-level synthetic images of outdoor natural environments is provided with intrinsic image ground-truths. Large scale experiments show that our CNN approach using fine-grained shading decomposition outperforms state-of-the-art methods using unified shading.

* Submitted to IEEE Transactions on Image Processing (TIP)

On the Benefit of Adversarial Training for Monocular Depth Estimation

Oct 29, 2019

Rick Groenendijk, Sezer Karaoglu, Theo Gevers, Thomas Mensink

Abstract:In this paper we address the benefit of adding adversarial training to the task of monocular depth estimation. A model can be trained in a self-supervised setting on stereo pairs of images, where depth (disparity) is an intermediate result in a right-to-left image reconstruction pipeline. For the quality of the image reconstruction and disparity prediction, a combination of different losses is used, including L1 image reconstruction losses and left-right disparity smoothness. These are local pixel-wise losses, while depth prediction requires global consistency. Therefore, we extend the self-supervised network to become a Generative Adversarial Network (GAN), by including a discriminator which should tell apart reconstructed (fake) images from real images. We evaluate Vanilla GANs, LSGANs and Wasserstein GANs in combination with different pixel-wise reconstruction losses. Based on extensive experimental evaluation, we conclude that adversarial training is beneficial if and only if the reconstruction loss is not too constrained. Even though adversarial training seems promising because it promotes global consistency, non-adversarial training outperforms (or is on par with) any method trained with a GAN when a constrained reconstruction loss is used in combination with batch normalisation. Based on the insights of our experimental evaluation we obtain state-of-the-art monocular depth estimation results by using batch normalisation and different output scales.
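
The three GAN variants named in the abstract differ mainly in the objective given to the discriminator. The sketch below writes out their textbook discriminator losses in NumPy for comparison. These are the generic formulations, with function names chosen here for illustration and the common 1/0 targets assumed for the least-squares variant; they are not taken from the paper's code, and the Lipschitz constraint needed by the Wasserstein critic is omitted.

```python
import numpy as np

def vanilla_d_loss(real_logits, fake_logits):
    """Binary cross-entropy discriminator loss on raw (pre-sigmoid) logits."""
    # logaddexp(0, x) = log(1 + exp(x)), computed in a numerically stable way
    real_term = np.logaddexp(0.0, -real_logits)  # equals -log(sigmoid(real))
    fake_term = np.logaddexp(0.0, fake_logits)   # equals -log(1 - sigmoid(fake))
    return real_term.mean() + fake_term.mean()

def lsgan_d_loss(real_scores, fake_scores):
    """Least-squares loss with targets 1 for real and 0 for fake (assumed targets)."""
    return 0.5 * ((real_scores - 1.0) ** 2).mean() + 0.5 * (fake_scores ** 2).mean()

def wgan_d_loss(real_scores, fake_scores):
    """Wasserstein critic loss; weight clipping or a gradient penalty is not shown."""
    return fake_scores.mean() - real_scores.mean()

# Scores for two real images and two reconstructed (fake) images
real = np.array([1.0, 1.0])
fake = np.array([0.0, 0.0])
```

In a setup like the one described, the reconstructed image would play the role of the fake sample, and one of these terms would be combined with the pixel-wise reconstruction losses.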

* 11 pages, 8 tables, 5 figures, accepted at CVIU

Deception Detection by 2D-to-3D Face Reconstruction from Videos

Dec 26, 2018

Minh Ngô, Burak Mandira, Selim Fırat Yılmaz, Ward Heij, Sezer Karaoglu, Henri Bouma, Hamdi Dibeklioglu, Theo Gevers

Abstract:Lies and deception are common phenomena in society, both in our private and professional lives. However, humans are notoriously bad at accurate deception detection. Based on the literature, human accuracy in distinguishing between lies and truthful statements is 54% on average; in other words, it is slightly better than a random guess. While people do not much care about this issue, in high-stakes situations such as interrogations for serious crimes and for evaluating testimonies in court cases, accurate deception detection methods are highly desirable. To achieve reliable, covert, and non-invasive deception detection, we propose a novel method that jointly extracts reliable low- and high-level facial features, namely 3D facial geometry, skin reflectance, expression, head pose, and scene illumination, in a video sequence. Then these features are modeled using a Recurrent Neural Network to learn temporal characteristics of deceptive and honest behavior. We evaluate the proposed method on the Real-Life Trial (RLT) dataset that contains high-stakes deceptive and honest videos recorded in courtrooms. Our results show that the proposed method (with an accuracy of 72.8%) improves the state of the art and outperforms the use of manually coded facial attributes (67.6%) in deception detection.

* 9 pages, 3 figures
