Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mohammad Alizadeh

Gemino: Practical and Robust Neural Compression for Video Conferencing

Sep 22, 2022

Vibhaalakshmi Sivaraman, Pantea Karimi, Vedantha Venkatapathy, Mehrdad Khani, Sadjad Fouladi, Mohammad Alizadeh, Frédo Durand, Vivienne Sze

Figure 1 for Gemino: Practical and Robust Neural Compression for Video Conferencing

Figure 2 for Gemino: Practical and Robust Neural Compression for Video Conferencing

Figure 3 for Gemino: Practical and Robust Neural Compression for Video Conferencing

Figure 4 for Gemino: Practical and Robust Neural Compression for Video Conferencing

Abstract:Video conferencing systems suffer from poor user experience when network conditions deteriorate because current video codecs simply cannot operate at extremely low bitrates. Recently, several neural alternatives have been proposed that reconstruct talking head videos at very low bitrates using sparse representations of each frame such as facial landmark information. However, these approaches produce poor reconstructions in scenarios with major movement or occlusions over the course of a call, and do not scale to higher resolutions. We design Gemino, a new neural compression system for video conferencing based on a novel high-frequency-conditional super-resolution pipeline. Gemino upsamples a very low-resolution version of each target frame while enhancing high-frequency details (e.g., skin texture, hair, etc.) based on information extracted from a single high-resolution reference image. We use a multi-scale architecture that runs different components of the model at different resolutions, allowing it to scale to resolutions comparable to 720p, and we personalize the model to learn specific details of each person, achieving much better fidelity at low bitrates. We implement Gemino atop aiortc, an open-source Python implementation of WebRTC, and show that it operates on 1024x1024 videos in real-time on a A100 GPU, and achieves 2.9x lower bitrate than traditional video codecs for the same perceptual quality.

* 12 pages, 6 appendix

Via

Access Paper or Ask Questions

Reinforcement Learning in Time-Varying Systems: an Empirical Study

Jan 14, 2022

Pouya Hamadanian, Malte Schwarzkopf, Siddartha Sen, Mohammad Alizadeh

Figure 1 for Reinforcement Learning in Time-Varying Systems: an Empirical Study

Figure 2 for Reinforcement Learning in Time-Varying Systems: an Empirical Study

Figure 3 for Reinforcement Learning in Time-Varying Systems: an Empirical Study

Figure 4 for Reinforcement Learning in Time-Varying Systems: an Empirical Study

Abstract:Recent research has turned to Reinforcement Learning (RL) to solve challenging decision problems, as an alternative to hand-tuned heuristics. RL can learn good policies without the need for modeling the environment's dynamics. Despite this promise, RL remains an impractical solution for many real-world systems problems. A particularly challenging case occurs when the environment changes over time, i.e. it exhibits non-stationarity. In this work, we characterize the challenges introduced by non-stationarity and develop a framework for addressing them to train RL agents in live systems. Such agents must explore and learn new environments, without hurting the system's performance, and remember them over time. To this end, our framework (1) identifies different environments encountered by the live system, (2) explores and trains a separate expert policy for each environment, and (3) employs safeguards to protect the system's performance. We apply our framework to two systems problems: straggler mitigation and adaptive video streaming, and evaluate it against a variety of alternative approaches using real-world and synthetic data. We show that each component of our framework is necessary to cope with non-stationarity.

Via

Access Paper or Ask Questions

CausalSim: Toward a Causal Data-Driven Simulator for Network Protocols

Jan 05, 2022

Abdullah Alomar, Pouya Hamadanian, Arash Nasr-Esfahany, Anish Agarwal, Mohammad Alizadeh, Devavrat Shah

Figure 1 for CausalSim: Toward a Causal Data-Driven Simulator for Network Protocols

Figure 2 for CausalSim: Toward a Causal Data-Driven Simulator for Network Protocols

Figure 3 for CausalSim: Toward a Causal Data-Driven Simulator for Network Protocols

Figure 4 for CausalSim: Toward a Causal Data-Driven Simulator for Network Protocols

Abstract:Evaluating the real-world performance of network protocols is challenging. Randomized control trials (RCT) are expensive and inaccessible to most researchers, while expert-designed simulators fail to capture complex behaviors in real networks. We present CausalSim, a data-driven simulator for network protocols that addresses this challenge. Learning network behavior from observational data is complicated due to the bias introduced by the protocols used during data collection. CausalSim uses traces from an initial RCT under a set of protocols to learn a causal network model, effectively removing the biases present in the data. Using this model, CausalSim can then simulate any protocol over the same traces (i.e., for counterfactual predictions). Key to CausalSim is the novel use of adversarial neural network training that exploits distributional invariances that are present due to the training data coming from an RCT. Our extensive evaluation of CausalSim on both real and synthetic datasets and two use cases, including more than nine months of real data from the Puffer video streaming system, shows that it provides accurate counterfactual predictions, reducing prediction error by 44% and 53% on average compared to expert-designed and standard supervised learning baselines.

Via

Access Paper or Ask Questions

Efficient Strong Scaling Through Burst Parallel Training

Dec 19, 2021

Seo Jin Park, Joshua Fried, Sunghyun Kim, Mohammad Alizadeh, Adam Belay

Figure 1 for Efficient Strong Scaling Through Burst Parallel Training

Figure 2 for Efficient Strong Scaling Through Burst Parallel Training

Figure 3 for Efficient Strong Scaling Through Burst Parallel Training

Figure 4 for Efficient Strong Scaling Through Burst Parallel Training

Abstract:As emerging deep neural network (DNN) models continue to grow in size, using large GPU clusters to train DNNs is becoming an essential requirement to achieving acceptable training times. In this paper, we consider the case where future increases in cluster size will cause the global batch size that can be used to train models to reach a fundamental limit: beyond a certain point, larger global batch sizes cause sample efficiency to degrade, increasing overall time to accuracy. As a result, to achieve further improvements in training performance, we must instead consider "strong scaling" strategies that hold the global batch size constant and allocate smaller batches to each GPU. Unfortunately, this makes it significantly more difficult to use cluster resources efficiently. We present DeepPool, a system that addresses this efficiency challenge through two key ideas. First, burst parallelism allocates large numbers of GPUs to foreground jobs in bursts to exploit the unevenness in parallelism across layers. Second, GPU multiplexing prioritizes throughput for foreground training jobs, while packing in background training jobs to reclaim underutilized GPU resources, thereby improving cluster-wide utilization. Together, these two ideas enable DeepPool to deliver a 2.2 - 2.4x improvement in total cluster throughput over standard data parallelism with a single task when the cluster scale is large.

Via

Access Paper or Ask Questions

Updating Street Maps using Changes Detected in Satellite Imagery

Oct 13, 2021

Favyen Bastani, Songtao He, Satvat Jagwani, Mohammad Alizadeh, Hari Balakrishnan, Sanjay Chawla, Sam Madden, Mohammad Amin Sadeghi

Figure 1 for Updating Street Maps using Changes Detected in Satellite Imagery

Figure 2 for Updating Street Maps using Changes Detected in Satellite Imagery

Figure 3 for Updating Street Maps using Changes Detected in Satellite Imagery

Figure 4 for Updating Street Maps using Changes Detected in Satellite Imagery

Abstract:Accurately maintaining digital street maps is labor-intensive. To address this challenge, much work has studied automatically processing geospatial data sources such as GPS trajectories and satellite images to reduce the cost of maintaining digital maps. An end-to-end map update system would first process geospatial data sources to extract insights, and second leverage those insights to update and improve the map. However, prior work largely focuses on the first step of this pipeline: these map extraction methods infer road networks from scratch given geospatial data sources (in effect creating entirely new maps), but do not address the second step of leveraging this extracted information to update the existing digital map data. In this paper, we first explain why current map extraction techniques yield low accuracy when extended to update existing maps. We then propose a novel method that leverages the progression of satellite imagery over time to substantially improve accuracy. Our approach first compares satellite images captured at different times to identify portions of the physical road network that have visibly changed, and then updates the existing map accordingly. We show that our change-based approach reduces map update error rates four-fold.

* SIGSPATIAL 2021

Via

Access Paper or Ask Questions

Throughput-Fairness Tradeoffs in Mobility Platforms

May 25, 2021

Arjun Balasingam, Karthik Gopalakrishnan, Radhika Mittal, Venkat Arun, Ahmed Saeed, Mohammad Alizadeh, Hamsa Balakrishnan, Hari Balakrishnan

Figure 1 for Throughput-Fairness Tradeoffs in Mobility Platforms

Figure 2 for Throughput-Fairness Tradeoffs in Mobility Platforms

Figure 3 for Throughput-Fairness Tradeoffs in Mobility Platforms

Figure 4 for Throughput-Fairness Tradeoffs in Mobility Platforms

Abstract:This paper studies the problem of allocating tasks from different customers to vehicles in mobility platforms, which are used for applications like food and package delivery, ridesharing, and mobile sensing. A mobility platform should allocate tasks to vehicles and schedule them in order to optimize both throughput and fairness across customers. However, existing approaches to scheduling tasks in mobility platforms ignore fairness. We introduce Mobius, a system that uses guided optimization to achieve both high throughput and fairness across customers. Mobius supports spatiotemporally diverse and dynamic customer demands. It provides a principled method to navigate inherent tradeoffs between fairness and throughput caused by shared mobility. Our evaluation demonstrates these properties, along with the versatility and scalability of Mobius, using traces gathered from ridesharing and aerial sensing applications. Our ridesharing case study shows that Mobius can schedule more than 16,000 tasks across 40 customers and 200 vehicles in an online manner.

* Technical report for paper to appear at ACM MobiSys 2021

Via

Access Paper or Ask Questions

Efficient Video Compression via Content-Adaptive Super-Resolution

Apr 06, 2021

Mehrdad Khani, Vibhaalakshmi Sivaraman, Mohammad Alizadeh

Figure 1 for Efficient Video Compression via Content-Adaptive Super-Resolution

Figure 2 for Efficient Video Compression via Content-Adaptive Super-Resolution

Figure 3 for Efficient Video Compression via Content-Adaptive Super-Resolution

Figure 4 for Efficient Video Compression via Content-Adaptive Super-Resolution

Abstract:Video compression is a critical component of Internet video delivery. Recent work has shown that deep learning techniques can rival or outperform human-designed algorithms, but these methods are significantly less compute and power-efficient than existing codecs. This paper presents a new approach that augments existing codecs with a small, content-adaptive super-resolution model that significantly boosts video quality. Our method, SRVC, encodes video into two bitstreams: (i) a content stream, produced by compressing downsampled low-resolution video with the existing codec, (ii) a model stream, which encodes periodic updates to a lightweight super-resolution neural network customized for short segments of the video. SRVC decodes the video by passing the decompressed low-resolution video frames through the (time-varying) super-resolution model to reconstruct high-resolution video frames. Our results show that to achieve the same PSNR, SRVC requires 16% of the bits-per-pixel of H.265 in slow mode, and 2% of the bits-per-pixel of DVC, a recent deep learning-based video compression scheme. SRVC runs at 90 frames per second on a NVIDIA V100 GPU.

Via

Access Paper or Ask Questions

TagMe: GPS-Assisted Automatic Object Annotation in Videos

Mar 24, 2021

Songtao He, Favyen Bastani, Mohammad Alizadeh, Hari Balakrishnan, Michael Cafarella, Tim Kraska, Sam Madden

Figure 1 for TagMe: GPS-Assisted Automatic Object Annotation in Videos

Figure 2 for TagMe: GPS-Assisted Automatic Object Annotation in Videos

Figure 3 for TagMe: GPS-Assisted Automatic Object Annotation in Videos

Figure 4 for TagMe: GPS-Assisted Automatic Object Annotation in Videos

Abstract:Training high-accuracy object detection models requires large and diverse annotated datasets. However, creating these data-sets is time-consuming and expensive since it relies on human annotators. We design, implement, and evaluate TagMe, a new approach for automatic object annotation in videos that uses GPS data. When the GPS trace of an object is available, TagMe matches the object's motion from GPS trace and the pixels' motions in the video to find the pixels belonging to the object in the video and creates the bounding box annotations of the object. TagMe works using passive data collection and can continuously generate new object annotations from outdoor video streams without any human annotators. We evaluate TagMe on a dataset of 100 video clips. We show TagMe can produce high-quality object annotations in a fully-automatic and low-cost way. Compared with the traditional human-in-the-loop solution, TagMe can produce the same amount of annotations at a much lower cost, e.g., up to 110x.

* https://people.csail.mit.edu/songtao/tagme.html

Via

Access Paper or Ask Questions

Real-world Video Adaptation with Reinforcement Learning

Aug 28, 2020

Hongzi Mao, Shannon Chen, Drew Dimmery, Shaun Singh, Drew Blaisdell, Yuandong Tian, Mohammad Alizadeh, Eytan Bakshy

Figure 1 for Real-world Video Adaptation with Reinforcement Learning

Figure 2 for Real-world Video Adaptation with Reinforcement Learning

Figure 3 for Real-world Video Adaptation with Reinforcement Learning

Figure 4 for Real-world Video Adaptation with Reinforcement Learning

Abstract:Client-side video players employ adaptive bitrate (ABR) algorithms to optimize user quality of experience (QoE). We evaluate recently proposed RL-based ABR methods in Facebook's web-based video streaming platform. Real-world ABR contains several challenges that requires customized designs beyond off-the-shelf RL algorithms -- we implement a scalable neural network architecture that supports videos with arbitrary bitrate encodings; we design a training method to cope with the variance resulting from the stochasticity in network conditions; and we leverage constrained Bayesian optimization for reward shaping in order to optimize the conflicting QoE objectives. In a week-long worldwide deployment with more than 30 million video streaming sessions, our RL approach outperforms the existing human-engineered ABR algorithms.

* Reinforcement Learning for Real Life (RL4RealLife) Workshop in the 36th International Conference on Machine Learning, Long Beach, California, USA, 2019

Via

Access Paper or Ask Questions

Sat2Graph: Road Graph Extraction through Graph-Tensor Encoding

Jul 19, 2020

Songtao He, Favyen Bastani, Satvat Jagwani, Mohammad Alizadeh, Hari Balakrishnan, Sanjay Chawla, Mohamed M. Elshrif, Samuel Madden, Amin Sadeghi

Figure 1 for Sat2Graph: Road Graph Extraction through Graph-Tensor Encoding

Figure 2 for Sat2Graph: Road Graph Extraction through Graph-Tensor Encoding

Figure 3 for Sat2Graph: Road Graph Extraction through Graph-Tensor Encoding

Figure 4 for Sat2Graph: Road Graph Extraction through Graph-Tensor Encoding

Abstract:Inferring road graphs from satellite imagery is a challenging computer vision task. Prior solutions fall into two categories: (1) pixel-wise segmentation-based approaches, which predict whether each pixel is on a road, and (2) graph-based approaches, which predict the road graph iteratively. We find that these two approaches have complementary strengths while suffering from their own inherent limitations. In this paper, we propose a new method, Sat2Graph, which combines the advantages of the two prior categories into a unified framework. The key idea in Sat2Graph is a novel encoding scheme, graph-tensor encoding (GTE), which encodes the road graph into a tensor representation. GTE makes it possible to train a simple, non-recurrent, supervised model to predict a rich set of features that capture the graph structure directly from an image. We evaluate Sat2Graph using two large datasets. We find that Sat2Graph surpasses prior methods on two widely used metrics, TOPO and APLS. Furthermore, whereas prior work only infers planar road graphs, our approach is capable of inferring stacked roads (e.g., overpasses), and does so robustly.

* ECCV 2020

Via

Access Paper or Ask Questions