Vibhaalakshmi Sivaraman

Reparo: Loss-Resilient Generative Codec for Video Conferencing

May 23, 2023

Tianhong Li, Vibhaalakshmi Sivaraman, Lijie Fan, Mohammad Alizadeh, Dina Katabi

Abstract: Packet loss in video conferencing often results in poor video quality and video freezes. Retransmitting lost packets is usually impractical because of the real-time playback requirement. Using Forward Error Correction (FEC) to recover lost packets is challenging because it is difficult to determine the appropriate level of redundancy. In this paper, we propose Reparo, a framework for building loss-resilient video conferencing with generative deep learning models. When a frame or part of a frame is lost, Reparo generates the missing information, conditioned on the data received so far and on the model's knowledge of how people look, dress, and interact in the visual world. Our experiments on publicly available video conferencing datasets show that Reparo outperforms state-of-the-art FEC-based video conferencing in terms of both video quality (measured by PSNR) and video freezes.
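
The abstract says the right FEC redundancy level is hard to choose. The minimal sketch below illustrates why, under two simplifying assumptions that are ours rather than the paper's: an ideal (MDS) erasure code, where any k of the k + r packets recover a block, and independent packet loss at rate p (real networks often see bursty, correlated loss). The function name and the packet counts are hypothetical. This is the FEC baseline, not Reparo's generative method.

```python
from math import comb


def block_recovery_prob(k: int, r: int, p: float) -> float:
    """Probability that a block of k data + r parity packets is decodable.

    Assumes an ideal (MDS) erasure code, so any k received packets suffice,
    and independent packet loss with probability p.
    """
    n = k + r
    # The block is recoverable if at most r of the n packets are lost.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(r + 1))


k = 10  # hypothetical number of data packets per frame
for r in (1, 2, 4):
    probs = ", ".join(
        f"p={p:.2f}: {block_recovery_prob(k, r, p):.3f}" for p in (0.01, 0.05, 0.10)
    )
    print(f"r={r} (overhead {r / k:.0%}) -> {probs}")
```

Parity sized for a 1% loss rate becomes insufficient at 10% loss, while parity sized for 10% loss wastes bandwidth whenever the network is clean. This is the tension that motivates recovering lost content without relying on a fixed redundancy budget.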

Gemino: Practical and Robust Neural Compression for Video Conferencing

Sep 22, 2022

Vibhaalakshmi Sivaraman, Pantea Karimi, Vedantha Venkatapathy, Mehrdad Khani, Sadjad Fouladi, Mohammad Alizadeh, Frédo Durand, Vivienne Sze

Abstract: Video conferencing systems suffer from poor user experience when network conditions deteriorate, because current video codecs cannot operate at extremely low bitrates. Several neural alternatives have recently been proposed that reconstruct talking-head videos at very low bitrates using sparse per-frame representations such as facial landmarks. However, these approaches produce poor reconstructions when there is major movement or occlusion during a call, and they do not scale to higher resolutions. We design Gemino, a new neural compression system for video conferencing based on a novel high-frequency-conditional super-resolution pipeline. Gemino upsamples a very low-resolution version of each target frame while enhancing high-frequency details (e.g., skin texture, hair) based on information extracted from a single high-resolution reference image. We use a multi-scale architecture that runs different components of the model at different resolutions, allowing it to scale to resolutions comparable to 720p, and we personalize the model to learn specific details of each person, achieving much better fidelity at low bitrates. We implement Gemino atop aiortc, an open-source Python implementation of WebRTC, and show that it operates on 1024x1024 videos in real time on an A100 GPU and achieves 2.9x lower bitrate than traditional video codecs for the same perceptual quality.

* 12 pages, 6 appendix
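
To make the idea of "high-frequency-conditional" upsampling concrete, here is a deliberately naive, non-learned sketch in NumPy. It is not Gemino's model. It upsamples the low-resolution target and adds the high-frequency residual of the high-resolution reference, meaning the detail that downsampling removes. The helper names and the downsampling factor f are hypothetical. At f = 8, a 1024x1024 frame shrinks to 128x128, which is 64x fewer pixels to transmit. The toy version assumes the target is pixel-aligned with the reference. That is exactly the assumption a learned model must drop to handle movement and occlusion.

```python
import numpy as np


def downsample(img: np.ndarray, f: int) -> np.ndarray:
    """Box-filter downsampling by an integer factor f (H and W must be divisible by f)."""
    h, w = img.shape[:2]
    assert h % f == 0 and w % f == 0, "dimensions must be divisible by f"
    # Split each spatial axis into (blocks, f) and average within each f x f block.
    return img.reshape(h // f, f, w // f, f, *img.shape[2:]).mean(axis=(1, 3))


def upsample(img: np.ndarray, f: int) -> np.ndarray:
    """Nearest-neighbour upsampling by an integer factor f."""
    return np.repeat(np.repeat(img, f, axis=0), f, axis=1)


def hf_conditional_upsample(lr_target: np.ndarray, hr_reference: np.ndarray, f: int) -> np.ndarray:
    """Toy stand-in: low-frequency content from the target, high-frequency detail from the reference."""
    base = upsample(lr_target, f)                       # blurry estimate of the target
    ref_low = upsample(downsample(hr_reference, f), f)  # reference as seen at low resolution
    ref_high_freq = hr_reference - ref_low              # detail lost by downsampling
    return base + ref_high_freq                         # naive transfer; no motion handling


rng = np.random.default_rng(0)
reference = rng.random((64, 64, 3))      # hypothetical high-resolution reference frame
target = np.roll(reference, 1, axis=1)   # hypothetical target with slight motion
lr = downsample(target, 8)               # only this 8x8x3 array would be sent
recon = hf_conditional_upsample(lr, reference, 8)
print(lr.shape, recon.shape)
```

When the target equals the reference, the sketch reconstructs it exactly. Under even a one-pixel shift, the borrowed detail lands in the wrong place. This illustrates why the abstract stresses robustness to movement.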

Efficient Video Compression via Content-Adaptive Super-Resolution

Apr 06, 2021

Mehrdad Khani, Vibhaalakshmi Sivaraman, Mohammad Alizadeh

Abstract: Video compression is a critical component of Internet video delivery. Recent work has shown that deep learning techniques can rival or outperform human-designed algorithms, but these methods are significantly less compute- and power-efficient than existing codecs. This paper presents a new approach that augments existing codecs with a small, content-adaptive super-resolution model that significantly boosts video quality. Our method, SRVC, encodes video into two bitstreams: (i) a content stream, produced by compressing downsampled low-resolution video with the existing codec, and (ii) a model stream, which encodes periodic updates to a lightweight super-resolution neural network customized for short segments of the video. SRVC decodes the video by passing the decompressed low-resolution frames through the (time-varying) super-resolution model to reconstruct high-resolution frames. Our results show that, to achieve the same PSNR, SRVC requires 16% of the bits-per-pixel of H.265 in slow mode and 2% of the bits-per-pixel of DVC, a recent deep learning-based video compression scheme. SRVC runs at 90 frames per second on an NVIDIA V100 GPU.
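
Because SRVC's comparison is stated in bits-per-pixel, it may help to see how both bitstreams enter that metric. The sketch below is a hypothetical accounting only. It assumes one model update per segment and normalizes by the pixel count of the full-resolution output. The class, field names, and numbers are made up for illustration and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class SRVCStreams:
    """Hypothetical size accounting for SRVC's two bitstreams."""
    content_bits: int       # low-resolution video compressed by the existing codec
    model_update_bits: int  # size of one super-resolution model update (assumed constant)
    num_segments: int       # assumption: one model update per short video segment

    def total_bits(self) -> int:
        return self.content_bits + self.model_update_bits * self.num_segments


def bits_per_pixel(total_bits: float, width: int, height: int, num_frames: int) -> float:
    """Bits per pixel, normalised by the pixel count of the full-resolution output video."""
    return total_bits / (width * height * num_frames)


# Illustrative numbers only: a 10 s, 30 fps, 1920x1080 clip split into 10 segments.
streams = SRVCStreams(content_bits=2_000_000, model_update_bits=50_000, num_segments=10)
bpp = bits_per_pixel(streams.total_bits(), 1920, 1080, 300)
print(f"total bits: {streams.total_bits()}, bpp: {bpp:.5f}")
```

The model stream grows with the number of segments, so shorter segments let the super-resolution model adapt more closely to the content but add overhead that the content stream's savings must outweigh.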
