Face synthesis has been a fascinating yet challenging problem in computer vision and machine learning. Its main research effort is to design algorithms to generate photo-realistic face images via given semantic domain. It has been a crucial prepossessing step of main-stream face recognition approaches and an excellent test of AI ability to use complicated probability distributions. In this paper, we provide a comprehensive review of typical face synthesis works that involve traditional methods as well as advanced deep learning approaches. Particularly, Generative Adversarial Net (GAN) is highlighted to generate photo-realistic and identity preserving results. Furthermore, the public available databases and evaluation metrics are introduced in details. We end the review with discussing unsolved difficulties and promising directions for future research.
We address the task of multi-view image-to-image translation for person image generation. The goal is to synthesize photo-realistic multi-view images with pose-consistency across all views. Our proposed end-to-end framework is based on a joint learning of multiple unpaired image-to-image translation models, one per camera viewpoint. The joint learning is imposed by constraints on the shared 3D human pose in order to encourage the 2D pose projections in all views to be consistent. Experimental results on the CMU-Panoptic dataset demonstrate the effectiveness of the suggested framework in generating photo-realistic images of persons with new poses that are more consistent across all views in comparison to a standard Image-to-Image baseline. The code is available at: https://github.com/sony-si/MultiView-Img2Img
As more and more personal photos are shared and tagged in social media, avoiding privacy risks such as unintended recognition becomes increasingly challenging. We propose a new hybrid approach to obfuscate identities in photos by head replacement. Our approach combines state of the art parametric face synthesis with latest advances in Generative Adversarial Networks (GAN) for data-driven image synthesis. On the one hand, the parametric part of our method gives us control over the facial parameters and allows for explicit manipulation of the identity. On the other hand, the data-driven aspects allow for adding fine details and overall realism as well as seamless blending into the scene context. In our experiments, we show highly realistic output of our system that improves over the previous state of the art in obfuscation rate while preserving a higher similarity to the original image content.
To improve the temporal and spatial storage efficiency, researchers have intensively studied various techniques, including compression and deduplication. Through our evaluation, we find that methods such as photo tags or local features help to identify the content-based similar- ity between raw images. The images can then be com- pressed more efficiently to get better storage space sav- ings. Furthermore, storing similar raw images together enables rapid data sorting, searching and retrieval if the images are stored in a distributed and large-scale envi- ronment by reducing fragmentation. In this paper, we evaluated the compressibility by designing experiments and observing the results. We found that on a statistical basis the higher similarity photos have, the better com- pression results are. This research helps provide a clue for future large-scale storage system design.
In terms of Image-to-image translation, Generative Adversarial Networks (GANs) has achieved great success even when it is used in the unsupervised dataset. In this work, we aim to translate cartoon images to photo-realistic images using GAN. We apply several state-of-the-art models to perform this task; however, they fail to perform good quality translations. We observe that the shallow difference between these two domains causes this issue. Based on this idea, we propose a method based on CycleGAN model for image translation from cartoon domain to photo-realistic domain. To make our model efficient, we implemented Spectral Normalization which added stability in our model. We demonstrate our experimental results and show that our proposed model has achieved the lowest Frechet Inception Distance score and better results compared to another state-of-the-art technique, UNIT.
Object reconstruction is one of the main problems in cultural heritage preservation. This problem is due to lack of data in documentation. Thus in this research we presented a method of 3D reconstruction using close-range photogrammetry. We collected 1319 photos from five temples in Yogyakarta. Using A-KAZE algorithm, keypoints of each image were obtained. Then we employed LIOP to create feature descriptor from it. After performing feature matching, L1RA was utilized to create sparse point clouds. In order to generate the geometry shape, MVS was used. Finally, FSSR and Large Scale Texturing were employed to deal with the surface and texture of the object. The quality of the reconstructed 3D model was measured by comparing the 3D images of the model with the original photos utilizing SSIM. The results showed that in terms of quality, our method was on par with other commercial method such as PhotoModeler and PhotoScan.
Oysters are the living vacuum cleaners of the oceans. There is an exponential decline in the oyster population due to over-harvesting. With the current development of the automation and AI, robots are becoming an integral part of the environmental monitoring process that can be also utilized for oyster reef preservation. Nevertheless, the underwater environment poses many difficulties, both from the practical - dangerous and time consuming operations, and the technical perspectives - distorted perception and unreliable navigation. To this end, we present a simulated environment that can be used to improve oyster reef monitoring. The simulated environment can be used to create photo-realistic image datasets with multiple sensor data and ground truth location of a remotely operated vehicle(ROV). Currently, there are no photo-realistic image datasets for oyster reef monitoring. Thus, we want to provide a new benchmark suite to the underwater community.
Type "a sea otter with a pearl earring by Johannes Vermeer" or "a photo of a teddy bear on a skateboard in Times Square" into OpenAI's DALL-E-2 paint-by-text synthesis engine and you will not be disappointed by the delightful and eerily pertinent results. The ability to synthesize highly realistic images -- with seemingly no limitation other than our imagination -- is sure to yield many exciting and creative applications. These images are also likely to pose new challenges to the photo-forensic community. Motivated by the fact that paint by text is not based on explicit geometric modeling, and the human visual system's often obliviousness to even glaring geometric inconsistencies, we provide an initial exploration of the perspective consistency of DALL-E-2 synthesized images to determine if geometric-based forensic analyses will prove fruitful in detecting this new breed of synthetic media.
Generative Adversarial Networks (GANs) in supervised settings can generate photo-realistic corresponding output from low-definition input (SRGAN). Using the architecture presented in the SRGAN original paper , we explore how selecting a dataset affects the outcome by using three different datasets to see that SRGAN fundamentally learns objects, with their shape, color, and texture, and redraws them in the output rather than merely attempting to sharpen edges. This is further underscored with our demonstration that once the network learns the images of the dataset, it can generate a photo-like image with even a slight hint of what it might look like for the original from a very blurry edged sketch. Given a set of inference images, the network trained with the same dataset results in a better outcome over the one trained with arbitrary set of images, and we report its significance numerically with Frechet Inception Distance score .
Monitoring of disasters is crucial for mitigating their effects on the environment and human population, and can be facilitated by the use of unmanned aerial vehicles (UAV), equipped with camera sensors that produce aerial photos of the areas of interest. A modern technique for recognition of events based on aerial photos is deep learning. In this paper, we present the state of the art work related to the use of deep learning techniques for disaster identification. We demonstrate the potential of this technique in identifying disasters with high accuracy, by means of a relatively simple deep learning model. Based on a dataset of 544 images (containing disaster images such as fires, earthquakes, collapsed buildings, tsunami and flooding, as well as non-disaster scenes), our results show an accuracy of 91% achieved, indicating that deep learning, combined with UAV equipped with camera sensors, have the potential to predict disasters with high accuracy.