"photo": models, code, and papers

Car-Driver Drowsiness Assessment through 1D Temporal Convolutional Networks

Jul 27, 2023
Francesco Rundo, Concetto Spampinato, Michael Rundo

Recently, the scientific progress of Advanced Driver Assistance System (ADAS) solutions has played a key role in enhancing the overall safety of driving. ADAS technology enables active control of vehicles to prevent potentially risky situations. An important aspect that researchers have focused on is the analysis of the driver's attention level, as recent reports confirmed a rising number of accidents caused by drowsiness or lack of attentiveness. To address this issue, various studies have suggested monitoring the driver's physiological state, as there exists a well-established connection between the Autonomic Nervous System (ANS) and the level of attention. For our study, we designed an innovative bio-sensor comprising near-infrared LED emitters and photo-detectors, specifically a Silicon PhotoMultiplier device. This allowed us to assess the driver's physiological status by analyzing the associated PhotoPlethysmography (PPG) signal. Furthermore, we developed an embedded time-domain hyper-filtering technique in conjunction with a 1D Temporal Convolutional architecture that embeds a progressive dilation setup. This integrated system enables near real-time classification of driver drowsiness, yielding accuracy levels of approximately 96%.
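As a rough illustration of what a "progressive dilation setup" in a 1D temporal convolutional network generally means, the sketch below stacks causal 1D convolutions whose dilation grows layer by layer. It is not the authors' architecture: the kernel size, the dilation schedule (1, 2, 4), the ReLU activation, and the function names `dilated_causal_conv1d`, `receptive_field`, and `tcn_stack` are all assumptions made for this example, and the hyper-filtering stage is not modeled.

```python
from typing import List, Sequence


def dilated_causal_conv1d(x: Sequence[float], kernel: Sequence[float],
                          dilation: int) -> List[float]:
    """Causal 1D convolution: out[t] uses only x[t], x[t-d], x[t-2d], ..."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # samples before the start are treated as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Number of past input samples that can influence one output sample."""
    return 1 + (kernel_size - 1) * sum(dilations)


def tcn_stack(x: Sequence[float], kernels: Sequence[Sequence[float]],
              dilations: Sequence[int]) -> List[float]:
    """Apply one dilated causal convolution plus ReLU per layer."""
    h = list(x)
    for kernel, d in zip(kernels, dilations):
        h = [max(0.0, v) for v in dilated_causal_conv1d(h, kernel, d)]
    return h


# Hypothetical setup: three layers, kernel size 2, dilations doubling per layer.
impulse = [1.0] + [0.0] * 9
response = tcn_stack(impulse, kernels=[[1.0, 1.0]] * 3, dilations=[1, 2, 4])
```

With these assumed settings the impulse response is non-zero for exactly `receptive_field(2, [1, 2, 4])` samples, which shows how doubling the dilation lets a shallow stack cover a long window of a time series such as a PPG signal.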

Realistic Saliency Guided Image Enhancement

Jun 09, 2023
S. Mahdi H. Miangoleh, Zoya Bylinskii, Eric Kee, Eli Shechtman, Yağız Aksoy

Common editing operations performed by professional photographers include the cleanup operations: de-emphasizing distracting elements and enhancing subjects. These edits are challenging, requiring a delicate balance between manipulating the viewer's attention while maintaining photo realism. While recent approaches can boast successful examples of attention attenuation or amplification, most of them also suffer from frequent unrealistic edits. We propose a realism loss for saliency-guided image enhancement to maintain high realism across varying image types, while attenuating distractors and amplifying objects of interest. Evaluations with professional photographers confirm that we achieve the dual objective of realism and effectiveness, and outperform the recent approaches on their own datasets, while requiring a smaller memory footprint and runtime. We thus offer a viable solution for automating image enhancement and photo cleanup operations.

* Proc. CVPR (2023)
* For more info visit http://yaksoy.github.io/realisticEditing/

Web Photo Source Identification based on Neural Enhanced Camera Fingerprint

Feb 18, 2023
Feng Qian, Sifeng He, Honghao Huang, Huanyu Ma, Xiaobo Zhang, Lei Yang

With the growing popularity of smartphone photography in recent years, web photos play an increasingly important role in all walks of life. Source camera identification of web photos aims to establish a reliable linkage from the captured images to their source cameras, and has a broad range of applications, such as image copyright protection, user authentication, and verification of investigative evidence. This paper presents an innovative and practical source identification framework that employs neural-network enhanced sensor pattern noise to trace back web photos efficiently while ensuring security. Our proposed framework consists of three main stages: initial device fingerprint registration, fingerprint extraction and cryptographic connection establishment while taking photos, and connection verification between photos and source devices. By incorporating metric learning and frequency consistency into the deep network design, our proposed fingerprint extraction algorithm achieves state-of-the-art performance on modern smartphone photos for reliable source identification. Meanwhile, we also propose several optimization sub-modules to prevent fingerprint leakage and improve accuracy and efficiency. Finally, for practical system design, two cryptographic schemes, a fuzzy extractor and a zero-knowledge proof (ZKP), are introduced to reliably identify the correlation between the registered fingerprint and the verified photo fingerprint. The code for the fingerprint extraction network and the benchmark dataset of modern smartphone camera photos are publicly available at https://github.com/PhotoNecf/PhotoNecf.

* Accepted by WWW2023 (https://www2023.thewebconf.org/). Code is publicly available at https://github.com/PhotoNecf/PhotoNecf

Factorized and Controllable Neural Re-Rendering of Outdoor Scene for Photo Extrapolation

Jul 14, 2022
Boming Zhao, Bangbang Yang, Zhenyang Li, Zuoyue Li, Guofeng Zhang, Jiashu Zhao, Dawei Yin, Zhaopeng Cui, Hujun Bao

Expanding an existing tourist photo from a partially captured scene to a full scene is one of the desired experiences for photography applications. Although photo extrapolation has been well studied, it is much more challenging to extrapolate a photo (e.g., a selfie) from a narrow field of view to a wider one while maintaining a similar visual style. In this paper, we propose a factorized neural re-rendering model to produce photorealistic novel views from cluttered outdoor Internet photo collections, which enables applications including controllable scene re-rendering, photo extrapolation and even extrapolated 3D photo generation. Specifically, we first develop a novel factorized re-rendering pipeline to handle the ambiguity in the decomposition of geometry, appearance and illumination. We also propose a composited training strategy to tackle the unexpected occlusion in Internet images. Moreover, to enhance photo-realism when extrapolating tourist photographs, we propose a novel realism augmentation process to complement appearance details, which automatically propagates the texture details from a narrow captured photo to the extrapolated neural rendered image. The experiments and photo editing examples on outdoor scenes demonstrate the superior performance of our proposed method in both photo-realism and downstream applications.

* Accepted to ACM Multimedia 2022. Project Page: https://zju3dv.github.io/neural_outdoor_rerender/

Scaling Data Generation in Vision-and-Language Navigation

Jul 28, 2023
Zun Wang, Jialu Li, Yicong Hong, Yi Wang, Qi Wu, Mohit Bansal, Stephen Gould, Hao Tan, Yu Qiao

Recent research in language-guided visual navigation has demonstrated a significant demand for the diversity of traversable environments and the quantity of supervision for training generalizable agents. To tackle the common data scarcity issue in existing vision-and-language navigation datasets, we propose an effective paradigm for generating large-scale data for learning, which uses 1200+ photo-realistic environments from the HM3D and Gibson datasets and synthesizes 4.9 million instruction-trajectory pairs using fully accessible resources on the web. Importantly, we investigate the influence of each component in this paradigm on the agent's performance and study how to adequately apply the augmented data to pre-train and fine-tune an agent. Thanks to our large-scale dataset, the performance of an existing agent can be pushed up (+11% absolute over the previous SoTA) to a new best of 80% single-run success rate on the R2R test split by simple imitation learning. The long-standing generalization gap between navigating in seen and unseen environments is also reduced to less than 1% (versus 8% in the previous best method). Moreover, our paradigm also enables different models to achieve new state-of-the-art navigation results on CVDN, REVERIE, and R2R in continuous environments.

* ICCV 2023

Pegasus Simulator: An Isaac Sim Framework for Multiple Aerial Vehicles Simulation

Jul 11, 2023
Marcelo Jacinto, João Pinto, Jay Patrikar, John Keller, Rita Cunha, Sebastian Scherer, António Pascoal

Developing and testing novel control and motion planning algorithms for aerial vehicles can be a challenging task, with the robotics community relying more than ever on 3D simulation technologies to evaluate the performance of new algorithms in a variety of conditions and environments. In this work, we introduce the Pegasus Simulator, a modular framework implemented as an NVIDIA Isaac Sim extension that enables real-time simulation of multiple multirotor vehicles in photo-realistic environments, while providing out-of-the-box integration with the widely adopted PX4-Autopilot and ROS2 through its modular implementation and intuitive graphical user interface. To demonstrate some of its capabilities, a nonlinear controller was implemented and simulation results for two drones performing aggressive flight maneuvers are presented. Code and documentation for this framework are also provided as supplementary material.

3DHumanGAN: Towards Photo-Realistic 3D-Aware Human Image Generation

Dec 14, 2022
Zhuoqian Yang, Shikai Li, Wayne Wu, Bo Dai

We present 3DHumanGAN, a 3D-aware generative adversarial network (GAN) that synthesizes images of full-body humans with consistent appearances under different view angles and body poses. To tackle the representational and computational challenges in synthesizing the articulated structure of human bodies, we propose a novel generator architecture in which a 2D convolutional backbone is modulated by a 3D pose mapping network. The 3D pose mapping network is formulated as a renderable implicit function conditioned on a posed 3D human mesh. This design has several merits: i) it allows us to harness the power of 2D GANs to generate photo-realistic images; ii) it generates consistent images under varying view angles and specifiable poses; iii) the model can benefit from the 3D human prior. Our model is adversarially learned from a collection of web images without the need for manual annotation.

* 8 pages, 7 figures

EyeBAG: Accurate Control of Eye Blink and Gaze Based on Data Augmentation Leveraging Style Mixing

Jun 30, 2023
Bryan S. Kim, Jeong Young Jeong, Wonjong Ryu

Recent developments in generative models have enabled the generation of photo-realistic human face images, and downstream tasks utilizing face generation technology have advanced accordingly. However, models for downstream tasks remain substandard at eye control (e.g., eye blink and gaze redirection). To overcome such eye control problems, we introduce a novel framework consisting of two distinct modules: a blink control module and a gaze redirection module. We also propose a novel data augmentation method to train each module, leveraging style mixing to obtain images with desired features. We show that our framework produces eye-controlled images of high quality, and demonstrate how it can be used to improve the performance of downstream tasks.

Development and Clinical Evaluation of an AI Support Tool for Improving Telemedicine Photo Quality

Sep 12, 2022
Kailas Vodrahalli, Justin Ko, Albert S. Chiou, Roberto Novoa, Abubakar Abid, Michelle Phung, Kiana Yekrang, Paige Petrone, James Zou, Roxana Daneshjou

Telemedicine utilization was accelerated during the COVID-19 pandemic, and skin conditions were a common use case. However, the quality of photographs sent by patients remains a major limitation. To address this issue, we developed TrueImage 2.0, an artificial intelligence (AI) model for assessing patient photo quality for telemedicine and providing real-time feedback to patients for photo quality improvement. TrueImage 2.0 was trained on 1700 telemedicine images annotated by clinicians for photo quality. On a retrospective dataset of 357 telemedicine images, TrueImage 2.0 effectively identified poor-quality images (area under the receiver operating characteristic curve (ROC-AUC) = 0.78) and the reason for poor quality (Blurry ROC-AUC = 0.84, Lighting issues ROC-AUC = 0.70). The performance is consistent across age, gender, and skin tone. Next, we assessed whether patient-TrueImage 2.0 interaction led to an improvement in submitted photo quality through a prospective clinical pilot study with 98 patients. TrueImage 2.0 reduced the number of patients with a poor-quality image by 68.0%.

* 24 pages, 7 figures
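For readers unfamiliar with the ROC-AUC figures quoted in the abstract above, the following minimal sketch computes ROC-AUC from its pairwise definition: the probability that a randomly chosen positive example (here, a poor-quality image) receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are assumptions for illustration only; they are not data or code from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: 1 = poor-quality image, 0 = acceptable image (made-up values).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
```

Under this definition a value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of examples, which is fine for small sets; rank-based formulas are the usual choice for larger ones.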

CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing

Jul 18, 2023
Ahmet Canberk Baykal, Abdul Basit Anees, Duygu Ceylan, Erkut Erdem, Aykut Erdem, Deniz Yuret

Researchers have recently begun exploring the use of StyleGAN-based models for real image editing. One particularly interesting application is using natural language descriptions to guide the editing process. Existing approaches for editing images using language either resort to instance-level latent code optimization or map predefined text prompts to some editing directions in the latent space. However, these approaches have inherent limitations. The former is not very efficient, while the latter often struggles to effectively handle multi-attribute changes. To address these weaknesses, we present CLIPInverter, a new text-driven image editing approach that is able to efficiently and reliably perform multi-attribute changes. The core of our method is the use of novel, lightweight text-conditioned adapter layers integrated into pretrained GAN-inversion networks. We demonstrate that by conditioning the initial inversion step on the CLIP embedding of the target description, we are able to obtain more successful edit directions. Additionally, we use a CLIP-guided refinement step to make corrections in the resulting residual latent codes, which further improves the alignment with the text prompt. Our method outperforms competing approaches in terms of manipulation accuracy and photo-realism on various domains including human faces, cats, and birds, as shown by our qualitative and quantitative results.

* Accepted for publication in ACM Transactions on Graphics
