Sezer Karaoglu

ShadingNet: Image Intrinsics by Fine-Grained Shading Decomposition

Dec 09, 2019
Anil S. Baslamisli, Partha Das, Hoang-An Le, Sezer Karaoglu, Theo Gevers

In general, intrinsic image decomposition algorithms interpret shading as one unified component that includes all photometric effects. As shading transitions are generally smoother than albedo changes, these methods may fail to distinguish strong (cast) shadows from albedo variations, which in turn may leak into the albedo map predictions. Therefore, in this paper, we propose to decompose the shading component into direct shading (illumination) and indirect shading (ambient light and shadows). The aim is to distinguish strong cast shadows from reflectance variations. Two end-to-end supervised CNN models (ShadingNets) are proposed that exploit the fine-grained shading model. Furthermore, surface normal features are jointly learned by the proposed CNNs, as surface normals are expected to assist the decomposition task. A large-scale dataset of scene-level synthetic images of outdoor natural environments is provided with intrinsic image ground truths. Large-scale experiments show that our CNN approach using fine-grained shading decomposition outperforms state-of-the-art methods that use unified shading.
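
As a point of reference, a minimal sketch of the image formation model implied by the abstract, assuming the standard Lambertian intrinsic decomposition and an additive split of the shading term; the exact formulation used in the paper may differ:

```latex
% Intrinsic decomposition: the image is the element-wise product of albedo and shading.
% The fine-grained model splits shading into a direct and an indirect component (assumed form).
\[
  I(x) = A(x) \odot S(x), \qquad S(x) = S_{\text{direct}}(x) + S_{\text{indirect}}(x)
\]
```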

* Submitted to IEEE Transactions on Image Processing (TIP) 

On the Benefit of Adversarial Training for Monocular Depth Estimation

Oct 29, 2019
Rick Groenendijk, Sezer Karaoglu, Theo Gevers, Thomas Mensink

In this paper, we address the benefit of adding adversarial training to the task of monocular depth estimation. A model can be trained in a self-supervised setting on stereo pairs of images, where depth (disparity) is an intermediate result in a right-to-left image reconstruction pipeline. For the quality of the image reconstruction and disparity prediction, a combination of different losses is used, including an L1 image reconstruction loss and left-right disparity smoothness. These are local pixel-wise losses, while depth prediction requires global consistency. Therefore, we extend the self-supervised network to a Generative Adversarial Network (GAN) by including a discriminator that should tell apart reconstructed (fake) images from real images. We evaluate vanilla GANs, LSGANs, and Wasserstein GANs in combination with different pixel-wise reconstruction losses. Based on extensive experimental evaluation, we conclude that adversarial training is beneficial if and only if the reconstruction loss is not too constrained. Even though adversarial training seems promising because it promotes global consistency, non-adversarial training outperforms (or is on par with) any method trained with a GAN when a constrained reconstruction loss is used in combination with batch normalisation. Based on the insights of our experimental evaluation, we obtain state-of-the-art monocular depth estimation results by using batch normalisation and different output scales.
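
As a hedged illustration of the loss structure described above (not the authors' exact network or loss weights; function names and weights are assumptions), a PyTorch-style sketch combining an L1 reconstruction term, an edge-aware disparity smoothness term, and a vanilla-GAN adversarial term:

```python
import torch
import torch.nn.functional as F

def disparity_smoothness(disp, img):
    # Edge-aware smoothness: penalise disparity gradients, down-weighted at image edges.
    disp_dx = torch.abs(disp[:, :, :, :-1] - disp[:, :, :, 1:])
    disp_dy = torch.abs(disp[:, :, :-1, :] - disp[:, :, 1:, :])
    img_dx = torch.mean(torch.abs(img[:, :, :, :-1] - img[:, :, :, 1:]), dim=1, keepdim=True)
    img_dy = torch.mean(torch.abs(img[:, :, :-1, :] - img[:, :, 1:, :]), dim=1, keepdim=True)
    return (disp_dx * torch.exp(-img_dx)).mean() + (disp_dy * torch.exp(-img_dy)).mean()

def generator_loss(left, reconstructed_left, disp_left, discriminator,
                   w_rec=1.0, w_smooth=0.1, w_adv=0.01):
    # reconstructed_left is assumed to be obtained by warping the right view with disp_left.
    loss_rec = F.l1_loss(reconstructed_left, left)        # pixel-wise L1 reconstruction
    loss_smooth = disparity_smoothness(disp_left, left)   # local smoothness prior
    logits_fake = discriminator(reconstructed_left)       # adversarial term (vanilla GAN form)
    loss_adv = F.binary_cross_entropy_with_logits(logits_fake, torch.ones_like(logits_fake))
    return w_rec * loss_rec + w_smooth * loss_smooth + w_adv * loss_adv
```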

* 11 pages, 8 tables, 5 figures, accepted at CVIU 

Deception Detection by 2D-to-3D Face Reconstruction from Videos

Dec 26, 2018
Minh Ngô, Burak Mandira, Selim Fırat Yılmaz, Ward Heij, Sezer Karaoglu, Henri Bouma, Hamdi Dibeklioglu, Theo Gevers

Lies and deception are common phenomena in society, both in our private and professional lives. However, humans are notoriously bad at accurate deception detection. According to the literature, human accuracy in distinguishing lies from truthful statements is 54% on average; in other words, it is only slightly better than a random guess. While this may be of little consequence in everyday life, in high-stakes situations such as interrogations for serious crimes and the evaluation of testimonies in court cases, accurate deception detection methods are highly desirable. To achieve reliable, covert, and non-invasive deception detection, we propose a novel method that jointly extracts low- and high-level facial features, namely 3D facial geometry, skin reflectance, expression, head pose, and scene illumination, from a video sequence. These features are then modeled using a Recurrent Neural Network to learn the temporal characteristics of deceptive and honest behavior. We evaluate the proposed method on the Real-Life Trial (RLT) dataset, which contains high-stakes deceptive and honest videos recorded in courtrooms. Our results show that the proposed method (with an accuracy of 72.8%) improves the state of the art and outperforms the use of manually coded facial attributes (67.6%) in deception detection.
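
To make the second stage concrete, a minimal sketch of a recurrent classifier over per-frame feature vectors, assuming the facial features have already been extracted and concatenated; the feature dimensionality and architecture details are assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class DeceptionRNN(nn.Module):
    """Illustrative sequence classifier: per-frame facial features -> honest vs. deceptive."""

    def __init__(self, feature_dim=256, hidden_dim=128):
        super().__init__()
        # feature_dim is assumed; in the paper the per-frame vector would combine
        # 3D geometry, skin reflectance, expression, head pose and illumination features.
        self.rnn = nn.GRU(feature_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 2)  # two classes: honest, deceptive

    def forward(self, frame_features):
        # frame_features: (batch, num_frames, feature_dim)
        _, last_hidden = self.rnn(frame_features)
        return self.classifier(last_hidden[-1])     # (batch, 2) class logits
```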

* 9 pages, 3 figures 

Inferring Point Clouds from Single Monocular Images by Depth Intermediation

Dec 20, 2018
Wei Zeng, Sezer Karaoglu, Theo Gevers

In this paper, we propose a framework for generating the 3D point cloud of an object from a single-view RGB image. Most previous work predicts the 3D point coordinates directly from a single RGB image. Instead, we decompose the problem into depth estimation from a single image and point completion from a partial point cloud. Our method first predicts a depth map and then infers the complete 3D object point cloud from the resulting partial point cloud. We explicitly impose the geometric constraints of the camera model in our pipeline and enforce the alignment of the generated point clouds and the estimated depth maps. Experimental results on the single-image 3D object reconstruction task show that the proposed method outperforms state-of-the-art methods. Both the qualitative and quantitative results demonstrate the generality and suitability of our method.
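
The camera-model constraint mentioned above amounts to back-projecting the predicted depth map into a partial point cloud before completion. A minimal sketch of that step under a pinhole camera model (intrinsics are assumed to be known):

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map into a partial 3D point cloud using the pinhole camera model.

    depth: (H, W) array of depth values; fx, fy, cx, cy: camera intrinsics.
    Returns an (N, 3) array of camera-frame points for pixels with positive depth.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=-1)
```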

Improving Face Detection Performance with 3D-Rendered Synthetic Data

Dec 18, 2018
Jian Han, Sezer Karaoglu, Hoang-An Le, Theo Gevers

In this paper, we provide a synthetic data generation methodology with fully controlled, multifaceted variations based on a new 3D face dataset (3DU-Face). We customize synthetic datasets to address specific types of variations (scale, pose, occlusion, blur, etc.) and systematically investigate the influence of each variation on face detection performance. We examine whether and how these factors contribute to better face detection. We validate our synthetic data augmentation for different face detectors (Faster RCNN, SSH, and HR) on various face datasets (MAFA, UFDD, and Wider Face).

Unsupervised Generation of Optical Flow Datasets from Videos in the Wild

Dec 10, 2018
Hoang-An Le, Tushar Nimbhorkar, Thomas Mensink, Anil S. Baslamisli, Sezer Karaoglu, Theo Gevers

Dense optical flow ground truths of non-rigid motion for real-world images are not available, since such motion cannot be annotated manually in a reliable way. With the aim of training deep optical flow networks, we present an unsupervised algorithm to generate optical flow ground truth from real-world videos. The algorithm extracts and matches objects of interest from pairs of images in videos to find initial constraints, and applies as-rigid-as-possible deformation over the objects of interest to obtain dense flow fields. The correctness of the ground truth is enforced by warping the objects in the first frames using the flow fields. We apply the algorithm to the DAVIS dataset to obtain optical flow ground truths for non-rigid movement of real-world objects, using either ground-truth or predicted segmentation. We discuss several methods to increase the optical flow variation in the dataset. Extensive experimental results show that training on non-rigid real motion is beneficial compared to training on rigid synthetic data. Moreover, we show that our pipeline generates training data suitable for successfully training the FlowNet-S, PWC-Net, and LiteFlowNet networks.
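
The warping-based consistency check mentioned above can be sketched as follows, assuming a dense forward flow field from the first to the second frame (a rough illustration, not the authors' pipeline):

```python
import numpy as np
import cv2

def warp_with_flow(frame2, flow):
    """Backward-warp frame2 into the coordinates of frame1 using a dense flow field.

    flow: (H, W, 2) array with flow[..., 0] the horizontal and flow[..., 1] the vertical
    displacement from frame1 to frame2. If the flow field is correct, the warped image
    should match frame1 over the deformed objects of interest.
    """
    h, w = flow.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(frame2, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```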

Color Constancy by GANs: An Experimental Survey

Dec 07, 2018
Partha Das, Anil S. Baslamisli, Yang Liu, Sezer Karaoglu, Theo Gevers

In this paper, we formulate the color constancy task as an image-to-image translation problem using GANs. By conducting a large set of experiments on different datasets, we provide an experimental survey on the use of different types of GANs to solve for color constancy, i.e., CC-GANs (Color Constancy GANs). Based on the experimental review, recommendations are given for the design of CC-GAN architectures under different criteria, circumstances, and datasets.
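
For context, the classical formulation of the color constancy objective that the CC-GANs replace is to estimate the illuminant and apply a diagonal (von Kries) correction. The GANs in the paper instead map the input image directly to its white-balanced counterpart; the sketch below only illustrates the underlying goal:

```python
import numpy as np

def correct_white_balance(image, illuminant):
    """Diagonal (von Kries) correction: divide each channel by the estimated light colour.

    image: (H, W, 3) linear-RGB float array; illuminant: length-3 estimated illuminant colour.
    """
    e = np.asarray(illuminant, dtype=np.float64)
    e = e / np.linalg.norm(e)                      # unit-norm light colour
    corrected = image / (e * np.sqrt(3))           # canonical white = (1, 1, 1) / sqrt(3)
    return np.clip(corrected, 0.0, 1.0)
```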

Joint Learning of Intrinsic Images and Semantic Segmentation

Jul 31, 2018
Anil S. Baslamisli, Thomas T. Groenestege, Partha Das, Hoang-An Le, Sezer Karaoglu, Theo Gevers

Semantic segmentation of outdoor scenes is problematic when there are variations in imaging conditions. It is known that albedo (reflectance) is invariant to all kinds of illumination effects. Thus, using reflectance images for the semantic segmentation task can be favorable. Additionally, not only may segmentation benefit from reflectance, but reflectance computation may also benefit from segmentation. Therefore, in this paper, the tasks of semantic segmentation and intrinsic image decomposition are considered as a combined process by exploring their mutual relationship in a joint fashion. To that end, we propose a supervised end-to-end CNN architecture to jointly learn intrinsic image decomposition and semantic segmentation, and we analyze the gains of addressing those two problems jointly. Moreover, new cascade CNN architectures for intrinsic-for-segmentation and segmentation-for-intrinsic are proposed as single tasks. Furthermore, a dataset of 35K synthetic images of natural environments is created with corresponding albedo and shading (intrinsics), as well as semantic labels (segmentation) assigned to each object/scene. The experiments show that joint learning of intrinsic image decomposition and semantic segmentation is beneficial for both tasks on natural scenes. The dataset and models are available at: https://ivi.fnwi.uva.nl/cv/intrinseg
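
A minimal sketch of the kind of shared-encoder, multi-head CNN that the joint formulation suggests; layer sizes, depth, and the number of semantic classes are assumptions and do not reproduce the architecture of the paper:

```python
import torch
import torch.nn as nn

class JointIntrinsicsSegNet(nn.Module):
    """Illustrative shared-encoder network with albedo, shading and segmentation heads."""

    def __init__(self, num_classes=16):
        super().__init__()
        self.encoder = nn.Sequential(               # shared feature extractor
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
        self.albedo_head = nn.Conv2d(64, 3, 3, padding=1)          # reflectance (albedo)
        self.shading_head = nn.Conv2d(64, 1, 3, padding=1)         # shading
        self.seg_head = nn.Conv2d(64, num_classes, 3, padding=1)   # per-pixel class logits

    def forward(self, x):
        feats = self.encoder(x)
        return self.albedo_head(feats), self.shading_head(feats), self.seg_head(feats)
```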

* ECCV 2018 

Object Class Detection and Classification using Multi Scale Gradient and Corner Point based Shape Descriptors

May 03, 2015
Basura Fernando, Sezer Karaoglu, Sajib Kumar Saha

This paper presents a novel multi-scale gradient-based shape descriptor and a corner-point-based shape descriptor. The multi-scale gradient-based descriptor is combined with generic Fourier descriptors to extract contour- and region-based shape information. An object class detection and classification technique based on this shape information is optimized with a random forest classifier. The proposed integrated descriptor is robust to rotation, scale, translation, affine deformations, noisy contours, and noisy shapes. The new corner-point-based interpolated shape descriptor is exploited for fast object detection and classification with higher accuracy.
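
A rough, heavily simplified sketch of how region-based descriptors can be combined and fed to a random forest classifier; the descriptor below is only a crude stand-in for the generic Fourier descriptor, and gradient_descriptor, masks, and labels are hypothetical placeholders:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def region_fourier_descriptor(shape_mask, num_coeffs=16):
    """Crude region-based descriptor: magnitudes of the lowest 2D Fourier coefficients
    of a binary shape mask (illustrative only, not the generic Fourier descriptor itself)."""
    spectrum = np.fft.fftshift(np.abs(np.fft.fft2(shape_mask.astype(float))))
    cy, cx = spectrum.shape[0] // 2, spectrum.shape[1] // 2
    block = spectrum[cy - num_coeffs // 2: cy + num_coeffs // 2,
                     cx - num_coeffs // 2: cx + num_coeffs // 2]
    return (block / (block.max() + 1e-8)).ravel()   # normalise for scale invariance

# Hypothetical usage: concatenate region- and contour-based descriptors per shape,
# then train the random forest classifier mentioned in the abstract.
# X = np.stack([np.concatenate([region_fourier_descriptor(m), gradient_descriptor(m)])
#               for m in masks])
# clf = RandomForestClassifier(n_estimators=200).fit(X, labels)
```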

* 4 pages 