Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wassim Hamidouche

Univ. Rennes, INSA Rennes, CNRS, IETR - UMR 6164, Rennes, France

360-Degree Video Super Resolution and Quality Enhancement Challenge: Methods and Results

Nov 11, 2024

Ahmed Telili, Wassim Hamidouche, Ibrahim Farhat, Hadi Amirpour, Christian Timmerer, Ibrahim Khadraoui, Jiajie Lu, The Van Le, Jeonneung Baek, Jin Young Lee(+5 more)

Abstract:Omnidirectional (360-degree) video is rapidly gaining popularity due to advancements in immersive technologies like virtual reality (VR) and extended reality (XR). However, real-time streaming of such videos, especially in live mobile scenarios like unmanned aerial vehicles (UAVs), is challenged by limited bandwidth and strict latency constraints. Traditional methods, such as compression and adaptive resolution, help but often compromise video quality and introduce artifacts that degrade the viewer experience. Additionally, the unique spherical geometry of 360-degree video presents challenges not encountered in traditional 2D video. To address these issues, we initiated the 360-degree Video Super Resolution and Quality Enhancement Challenge. This competition encourages participants to develop efficient machine learning solutions to enhance the quality of low-bitrate compressed 360-degree videos, with two tracks focusing on 2x and 4x super-resolution (SR). In this paper, we outline the challenge framework, detailing the two competition tracks and highlighting the SR solutions proposed by the top-performing models. We assess these models within a unified framework, considering quality enhancement, bitrate gain, and computational efficiency. This challenge aims to drive innovation in real-time 360-degree video streaming, improving the quality and accessibility of immersive visual experiences.

* 14 pages, 9 figures

Via

Access Paper or Ask Questions

Bi-LORA: A Vision-Language Approach for Synthetic Image Detection

Apr 07, 2024

Mamadou Keita, Wassim Hamidouche, Hessen Bougueffa Eutamene, Abdenour Hadid, Abdelmalik Taleb-Ahmed

Figure 1 for Bi-LORA: A Vision-Language Approach for Synthetic Image Detection

Figure 2 for Bi-LORA: A Vision-Language Approach for Synthetic Image Detection

Figure 3 for Bi-LORA: A Vision-Language Approach for Synthetic Image Detection

Figure 4 for Bi-LORA: A Vision-Language Approach for Synthetic Image Detection

Abstract:Advancements in deep image synthesis techniques, such as generative adversarial networks (GANs) and diffusion models (DMs), have ushered in an era of generating highly realistic images. While this technological progress has captured significant interest, it has also raised concerns about the potential difficulty in distinguishing real images from their synthetic counterparts. This paper takes inspiration from the potent convergence capabilities between vision and language, coupled with the zero-shot nature of vision-language models (VLMs). We introduce an innovative method called Bi-LORA that leverages VLMs, combined with low-rank adaptation (LORA) tuning techniques, to enhance the precision of synthetic image detection for unseen model-generated images. The pivotal conceptual shift in our methodology revolves around reframing binary classification as an image captioning task, leveraging the distinctive capabilities of cutting-edge VLM, notably bootstrapping language image pre-training (BLIP2). Rigorous and comprehensive experiments are conducted to validate the effectiveness of our proposed approach, particularly in detecting unseen diffusion-generated images from unknown diffusion-based generative models during training, showcasing robustness to noise, and demonstrating generalization capabilities to GANs. The obtained results showcase an impressive average accuracy of 93.41% in synthetic image detection on unseen generation models. The code and models associated with this research can be publicly accessed at https://github.com/Mamadou-Keita/VLM-DETECT.

Via

Access Paper or Ask Questions

Harnessing the Power of Large Vision Language Models for Synthetic Image Detection

Apr 03, 2024

Mamadou Keita, Wassim Hamidouche, Hassen Bougueffa, Abdenour Hadid, Abdelmalik Taleb-Ahmed

Figure 1 for Harnessing the Power of Large Vision Language Models for Synthetic Image Detection

Figure 2 for Harnessing the Power of Large Vision Language Models for Synthetic Image Detection

Figure 3 for Harnessing the Power of Large Vision Language Models for Synthetic Image Detection

Figure 4 for Harnessing the Power of Large Vision Language Models for Synthetic Image Detection

Abstract:In recent years, the emergence of models capable of generating images from text has attracted considerable interest, offering the possibility of creating realistic images from text descriptions. Yet these advances have also raised concerns about the potential misuse of these images, including the creation of misleading content such as fake news and propaganda. This study investigates the effectiveness of using advanced vision-language models (VLMs) for synthetic image identification. Specifically, the focus is on tuning state-of-the-art image captioning models for synthetic image detection. By harnessing the robust understanding capabilities of large VLMs, the aim is to distinguish authentic images from synthetic images produced by diffusion-based models. This study contributes to the advancement of synthetic image detection by exploiting the capabilities of visual language models such as BLIP-2 and ViTGPT2. By tailoring image captioning models, we address the challenges associated with the potential misuse of synthetic images in real-world applications. Results described in this paper highlight the promising role of VLMs in the field of synthetic image detection, outperforming conventional image-based detection techniques. Code and models can be found at https://github.com/Mamadou-Keita/VLM-DETECT.

* arXiv admin note: substantial text overlap with arXiv:2404.01959

Via

Access Paper or Ask Questions

Generative AI for Immersive Communication: The Next Frontier in Internet-of-Senses Through 6G

Apr 02, 2024

Nassim Sehad, Lina Bariah, Wassim Hamidouche, Hamed Hellaoui, Riku Jäntti, Mérouane Debbah

Figure 1 for Generative AI for Immersive Communication: The Next Frontier in Internet-of-Senses Through 6G

Figure 2 for Generative AI for Immersive Communication: The Next Frontier in Internet-of-Senses Through 6G

Figure 3 for Generative AI for Immersive Communication: The Next Frontier in Internet-of-Senses Through 6G

Figure 4 for Generative AI for Immersive Communication: The Next Frontier in Internet-of-Senses Through 6G

Abstract:Over the past two decades, the Internet-of-Things (IoT) has been a transformative concept, and as we approach 2030, a new paradigm known as the Internet of Senses (IoS) is emerging. Unlike conventional Virtual Reality (VR), IoS seeks to provide multi-sensory experiences, acknowledging that in our physical reality, our perception extends far beyond just sight and sound; it encompasses a range of senses. This article explores existing technologies driving immersive multi-sensory media, delving into their capabilities and potential applications. This exploration includes a comparative analysis between conventional immersive media streaming and a proposed use case that leverages semantic communication empowered by generative Artificial Intelligence (AI). The focal point of this analysis is the substantial reduction in bandwidth consumption by 99.93% in the proposed scheme. Through this comparison, we aim to underscore the practical applications of generative AI for immersive media while addressing the challenges and outlining future trajectories.

Via

Access Paper or Ask Questions

ODVista: An Omnidirectional Video Dataset for super-resolution and Quality Enhancement Tasks

Mar 07, 2024

Ahmed Telili, Ibrahim Farhat, Wassim Hamidouche, Hadi Amirpour

Figure 1 for ODVista: An Omnidirectional Video Dataset for super-resolution and Quality Enhancement Tasks

Figure 2 for ODVista: An Omnidirectional Video Dataset for super-resolution and Quality Enhancement Tasks

Figure 3 for ODVista: An Omnidirectional Video Dataset for super-resolution and Quality Enhancement Tasks

Figure 4 for ODVista: An Omnidirectional Video Dataset for super-resolution and Quality Enhancement Tasks

Abstract:Omnidirectional or 360-degree video is being increasingly deployed, largely due to the latest advancements in immersive virtual reality (VR) and extended reality (XR) technology. However, the adoption of these videos in streaming encounters challenges related to bandwidth and latency, particularly in mobility conditions such as with unmanned aerial vehicles (UAVs). Adaptive resolution and compression aim to preserve quality while maintaining low latency under these constraints, yet downscaling and encoding can still degrade quality and introduce artifacts. Machine learning (ML)-based super-resolution (SR) and quality enhancement techniques offer a promising solution by enhancing detail recovery and reducing compression artifacts. However, current publicly available 360-degree video SR datasets lack compression artifacts, which limit research in this field. To bridge this gap, this paper introduces omnidirectional video streaming dataset (ODVista), which comprises 200 high-resolution and high quality videos downscaled and encoded at four bitrate ranges using the high-efficiency video coding (HEVC)/H.265 standard. Evaluations show that the dataset not only features a wide variety of scenes but also spans different levels of content complexity, which is crucial for robust solutions that perform well in real-world scenarios and generalize across diverse visual environments. Additionally, we evaluate the performance, considering both quality enhancement and runtime, of two handcrafted and two ML-based SR models on the validation and testing sets of ODVista.

Via

Access Paper or Ask Questions

NERV++: An Enhanced Implicit Neural Video Representation

Feb 28, 2024

Ahmed Ghorbel, Wassim Hamidouche, Luce Morin

Figure 1 for NERV++: An Enhanced Implicit Neural Video Representation

Figure 2 for NERV++: An Enhanced Implicit Neural Video Representation

Figure 3 for NERV++: An Enhanced Implicit Neural Video Representation

Figure 4 for NERV++: An Enhanced Implicit Neural Video Representation

Abstract:Neural fields, also known as implicit neural representations (INRs), have shown a remarkable capability of representing, generating, and manipulating various data types, allowing for continuous data reconstruction at a low memory footprint. Though promising, INRs applied to video compression still need to improve their rate-distortion performance by a large margin, and require a huge number of parameters and long training iterations to capture high-frequency details, limiting their wider applicability. Resolving this problem remains a quite challenging task, which would make INRs more accessible in compression tasks. We take a step towards resolving these shortcomings by introducing neural representations for videos NeRV++, an enhanced implicit neural video representation, as more straightforward yet effective enhancement over the original NeRV decoder architecture, featuring separable conv2d residual blocks (SCRBs) that sandwiches the upsampling block (UB), and a bilinear interpolation skip layer for improved feature representation. NeRV++ allows videos to be directly represented as a function approximated by a neural network, and significantly enhance the representation capacity beyond current INR-based video codecs. We evaluate our method on UVG, MCL JVC, and Bunny datasets, achieving competitive results for video compression with INRs. This achievement narrows the gap to autoencoder-based video coding, marking a significant stride in INR-based video compression research.

Via

Access Paper or Ask Questions

UAV Immersive Video Streaming: A Comprehensive Survey, Benchmarking, and Open Challenges

Oct 31, 2023

Mohit K. Sharma, Chen-Feng Liu, Ibrahim Farhat, Nassim Sehad, Wassim Hamidouche, Merouane Debbah

Abstract:Over the past decade, the utilization of UAVs has witnessed significant growth, owing to their agility, rapid deployment, and maneuverability. In particular, the use of UAV-mounted 360-degree cameras to capture omnidirectional videos has enabled truly immersive viewing experiences with up to 6DoF. However, achieving this immersive experience necessitates encoding omnidirectional videos in high resolution, leading to increased bitrates. Consequently, new challenges arise in terms of latency, throughput, perceived quality, and energy consumption for real-time streaming of such content. This paper presents a comprehensive survey of research efforts in UAV-based immersive video streaming, benchmarks popular video encoding schemes, and identifies open research challenges. Initially, we review the literature on 360-degree video coding, packaging, and streaming, with a particular focus on standardization efforts to ensure interoperability of immersive video streaming devices and services. Subsequently, we provide a comprehensive review of research efforts focused on optimizing video streaming for timevarying UAV wireless channels. Additionally, we introduce a high resolution 360-degree video dataset captured from UAVs under different flying conditions. This dataset facilitates the evaluation of complexity and coding efficiency of software and hardware video encoders based on popular video coding standards and formats, including AVC/H.264, HEVC/H.265, VVC/H.266, VP9, and AV1. Our results demonstrate that HEVC achieves the best trade-off between coding efficiency and complexity through its hardware implementation, while AV1 format excels in coding efficiency through its software implementation, specifically using the libsvt-av1 encoder. Furthermore, we present a real testbed showcasing 360-degree video streaming over a UAV, enabling remote control of the drone via a 5G cellular network.

Via

Access Paper or Ask Questions

Bitrate Ladder Prediction Methods for Adaptive Video Streaming: A Review and Benchmark

Oct 30, 2023

Ahmed Telili, Wassim Hamidouche, Hadi Amirpour, Sid Ahmed Fezza, Luce Morin, Christian Timmerer

Abstract:HTTP adaptive streaming (HAS) has emerged as a widely adopted approach for over-the-top (OTT) video streaming services, due to its ability to deliver a seamless streaming experience. A key component of HAS is the bitrate ladder, which provides the encoding parameters (e.g., bitrate-resolution pairs) to encode the source video. The representations in the bitrate ladder allow the client's player to dynamically adjust the quality of the video stream based on network conditions by selecting the most appropriate representation from the bitrate ladder. The most straightforward and lowest complexity approach involves using a fixed bitrate ladder for all videos, consisting of pre-determined bitrate-resolution pairs known as one-size-fits-all. Conversely, the most reliable technique relies on intensively encoding all resolutions over a wide range of bitrates to build the convex hull, thereby optimizing the bitrate ladder for each specific video. Several techniques have been proposed to predict content-based ladders without performing a costly exhaustive search encoding. This paper provides a comprehensive review of various methods, including both conventional and learning-based approaches. Furthermore, we conduct a benchmark study focusing exclusively on various learning-based approaches for predicting content-optimized bitrate ladders across multiple codec settings. The considered methods are evaluated on our proposed large-scale dataset, which includes 300 UHD video shots encoded with software and hardware encoders using three state-of-the-art encoders, including AVC/H.264, HEVC/H.265, and VVC/H.266, at various bitrate points. Our analysis provides baseline methods and insights, which will be valuable for future research in the field of bitrate ladder prediction. The source code of the proposed benchmark and the dataset will be made publicly available upon acceptance of the paper.

Via

Access Paper or Ask Questions

Video Quality Assessment and Coding Complexity of the Versatile Video Coding Standard

Oct 19, 2023

Thomas Amestoy, Naty Sidaty, Wassim Hamidouche, Pierrick Philippe, Daniel Menard

Figure 1 for Video Quality Assessment and Coding Complexity of the Versatile Video Coding Standard

Figure 2 for Video Quality Assessment and Coding Complexity of the Versatile Video Coding Standard

Figure 3 for Video Quality Assessment and Coding Complexity of the Versatile Video Coding Standard

Figure 4 for Video Quality Assessment and Coding Complexity of the Versatile Video Coding Standard

Abstract:In recent years, the proliferation of multimedia applications and formats, such as IPTV, Virtual Reality (VR, 360-degree), and point cloud videos, has presented new challenges to the video compression research community. Simultaneously, there has been a growing demand from users for higher resolutions and improved visual quality. To further enhance coding efficiency, a new video coding standard, Versatile Video Coding (VVC), was introduced in July 2020. This paper conducts a comprehensive analysis of coding performance and complexity for the latest VVC standard in comparison to its predecessor, High Efficiency Video Coding (HEVC). The study employs a diverse set of test sequences, covering both High Definition (HD) and Ultra High Definition (UHD) resolutions, and spans a wide range of bit-rates. These sequences are encoded using the reference software encoders of HEVC (HM) and VVC (VTM). The results consistently demonstrate that VVC outperforms HEVC, achieving bit-rate savings of up to 40% on the subjective quality scale, particularly at realistic bit-rates and quality levels. Objective quality metrics, including PSNR, SSIM, and VMAF, support these findings, revealing bit-rate savings ranging from 31% to 40%, depending on the video content, spatial resolution, and the selected quality metric. However, these improvements in coding efficiency come at the cost of significantly increased computational complexity. On average, our results indicate that the VVC decoding process is 1.5 times more complex, while the encoding process becomes at least eight times more complex than that of the HEVC reference encoder. Our simultaneous profiling of the two standards sheds light on the primary evolutionary differences between them and highlights the specific stages responsible for the observed increase in complexity.

Via

Access Paper or Ask Questions

Efficient Predictive Coding of Intra Prediction Modes

Oct 09, 2023

Kevin Reuzé, Wassim Hamidouche, Pierrick Philippe, Olivier Déforges

Abstract:The high efficiency video coding (HEVC) standard and the joint exploration model (JEM) codec incorporate 35 and 67 intra prediction modes (IPMs) respectively, which are essential for efficient compression of Intra coded blocks. These IPMs are transmitted to the decoder through a coding scheme. In our paper, we present an innovative approach to construct a dedicated coding scheme for IPM based on contextual information. This approach comprises three key steps: prediction, clustering, and coding, each of which has been enhanced by introducing new elements, namely, labels for prediction, tests for clustering, and codes for coding. In this context, we have proposed a method that utilizes a genetic algorithm to minimize the rate cost, aiming to derive the most efficient coding scheme while leveraging the available labels, tests, and codes. The resulting coding scheme, expressed as a binary tree, achieves the highest coding efficiency for a given level of complexity. In our experimental evaluation under the HEVC standard, we observed significant bitrate gains while maintaining coding efficiency under the JEM codec. These results demonstrate the potential of our approach to improve compression efficiency, particularly under the HEVC standard, while preserving the coding efficiency of the JEM codec.

Via

Access Paper or Ask Questions