Apratim Bhattacharyya

Euro-PVI: Pedestrian Vehicle Interactions in Dense Urban Centers

Jun 22, 2021

Apratim Bhattacharyya, Daniel Olmeda Reino, Mario Fritz, Bernt Schiele

Abstract: Accurate prediction of pedestrian and bicyclist paths is integral to the development of reliable autonomous vehicles in dense urban environments. The interactions between vehicle and pedestrian or bicyclist have a significant impact on the trajectories of traffic participants, e.g. stopping or turning to avoid collisions. Although recent datasets and trajectory prediction approaches have fostered the development of autonomous vehicles, the vehicle-pedestrian (bicyclist) interactions they model are sparse. In this work, we propose Euro-PVI, a dataset of pedestrian and bicyclist trajectories. In particular, our dataset caters to more diverse and complex interactions in dense urban scenarios than existing datasets. To address the challenges in predicting future trajectories with dense interactions, we develop a joint inference model that learns an expressive multi-modal shared latent space across agents in the urban scene. This enables our Joint-$\beta$-cVAE approach to better model the distribution of future trajectories. We achieve state-of-the-art results on the nuScenes and Euro-PVI datasets, demonstrating the importance of capturing interactions between ego-vehicle and pedestrians (bicyclists) for accurate predictions.

* To appear at CVPR 2021

Haar Wavelet based Block Autoregressive Flows for Trajectories

Sep 21, 2020

Apratim Bhattacharyya, Christoph-Nikolas Straehle, Mario Fritz, Bernt Schiele

Abstract: Prediction of trajectories such as those of pedestrians is crucial to the performance of autonomous agents. While previous works have leveraged conditional generative models like GANs and VAEs for learning the likely future trajectories, accurately modeling the dependency structure of these multimodal distributions, particularly over long time horizons, remains challenging. Normalizing flow based generative models can model complex distributions while admitting exact inference. These include variants with split coupling invertible transformations that are easier to parallelize compared to their autoregressive counterparts. To this end, we introduce a novel Haar wavelet based block autoregressive model leveraging split couplings, conditioned on coarse trajectories obtained from Haar wavelet based transformations at different levels of granularity. This yields an exact inference method that models trajectories at different spatio-temporal resolutions in a hierarchical manner. We illustrate the advantages of our approach for generating diverse and accurate trajectories on two real-world datasets: Stanford Drone and Intersection Drone.
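
The coarse-to-fine idea in this abstract can be illustrated with a plain Haar decomposition of a 2D trajectory. The following is a minimal sketch, not the authors' implementation: it uses the unnormalized averaging variant of the Haar transform, the function names (`haar_level`, `inverse_haar_level`, `haar_pyramid`) are hypothetical, and the normalizing flow that would be conditioned on the coarse trajectories is not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def haar_level(traj: List[Point]) -> Tuple[List[Point], List[Point]]:
    """One Haar level: pairwise averages (coarse) and half-differences (detail)."""
    if len(traj) % 2 != 0:
        raise ValueError("trajectory length must be even")
    coarse: List[Point] = []
    detail: List[Point] = []
    for i in range(0, len(traj), 2):
        (x0, y0), (x1, y1) = traj[i], traj[i + 1]
        coarse.append(((x0 + x1) / 2, (y0 + y1) / 2))
        detail.append(((x0 - x1) / 2, (y0 - y1) / 2))
    return coarse, detail


def inverse_haar_level(coarse: List[Point], detail: List[Point]) -> List[Point]:
    """Exactly undo haar_level: each (coarse, detail) pair restores two points."""
    traj: List[Point] = []
    for (cx, cy), (dx, dy) in zip(coarse, detail):
        traj.append((cx + dx, cy + dy))
        traj.append((cx - dx, cy - dy))
    return traj


def haar_pyramid(traj: List[Point], levels: int) -> Tuple[List[Point], List[List[Point]]]:
    """Apply haar_level repeatedly; returns the coarsest trajectory and the
    detail coefficients of every level, finest level first."""
    details: List[List[Point]] = []
    coarse = list(traj)
    for _ in range(levels):
        coarse, d = haar_level(coarse)
        details.append(d)
    return coarse, details
```

For the four-point trajectory (0, 0), (2, 2), (4, 2), (6, 4), one level yields the coarse points (1, 1) and (5, 3), and `inverse_haar_level` recovers the original points from the coarse and detail parts. Each level halves the temporal resolution, which is what makes a hierarchical, coarse-first model possible.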

* German Conference on Pattern Recognition, 2020 (oral)

Normalizing Flows with Multi-Scale Autoregressive Priors

Apr 08, 2020

Shweta Mahajan, Apratim Bhattacharyya, Mario Fritz, Bernt Schiele, Stefan Roth

Abstract: Flow-based generative models are an important class of exact inference models that admit efficient inference and sampling for image synthesis. Owing to the efficiency constraints on the design of the flow layers, e.g. split coupling flow layers in which approximately half the pixels do not undergo further transformations, they have limited expressiveness for modeling long-range data dependencies compared to autoregressive models that rely on conditional pixel-wise generation. In this work, we improve the representational power of flow-based models by introducing channel-wise dependencies in their latent space through multi-scale autoregressive priors (mAR). Our mAR prior for models with split coupling flow layers (mAR-SCF) can better capture dependencies in complex multimodal data. The resulting model achieves state-of-the-art density estimation results on MNIST, CIFAR-10, and ImageNet. Furthermore, we show that mAR-SCF allows for improved image generation quality, with gains in FID and Inception scores compared to state-of-the-art flow-based models.
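
To make the "split coupling" constraint concrete, here is a minimal sketch of a generic affine coupling layer on a flat vector. It is an illustration under stated assumptions, not the mAR-SCF model: `toy_net` is a hypothetical stand-in for a learned network, and the multi-scale autoregressive prior from the abstract is not shown. The first half of the input passes through unchanged, which is what keeps the inverse and the log-determinant cheap.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec = List[float]
Net = Callable[[Sequence[float]], Tuple[Vec, Vec]]


def toy_net(x_a: Sequence[float]) -> Tuple[Vec, Vec]:
    """Hypothetical stand-in for a learned network: maps the untouched half
    to a (log_scale, shift) pair for the other half."""
    s = sum(x_a)
    log_scale = [math.tanh(s) for _ in x_a]
    shift = [0.5 * s for _ in x_a]
    return log_scale, shift


def coupling_forward(x: Sequence[float], net: Net = toy_net) -> Tuple[Vec, float]:
    """Affine coupling: keep x_a, transform x_b; returns (z, log|det J|)."""
    if len(x) % 2 != 0:
        raise ValueError("this sketch assumes an even-dimensional input")
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    log_scale, shift = net(x_a)
    z_b = [xb * math.exp(ls) + t for xb, ls, t in zip(x_b, log_scale, shift)]
    # The Jacobian is triangular, so its log-determinant is the sum of log-scales.
    return list(x_a) + z_b, sum(log_scale)


def coupling_inverse(z: Sequence[float], net: Net = toy_net) -> Vec:
    """Exact inverse: z_a equals x_a, so the same scale and shift are recomputed."""
    if len(z) % 2 != 0:
        raise ValueError("this sketch assumes an even-dimensional input")
    half = len(z) // 2
    z_a, z_b = z[:half], z[half:]
    log_scale, shift = net(z_a)
    x_b = [(zb - t) * math.exp(-ls) for zb, ls, t in zip(z_b, log_scale, shift)]
    return list(z_a) + x_b
```

Because half of the dimensions are copied through at every layer, dependencies between them must be captured elsewhere, for example by stacking layers with alternating splits or, as the abstract proposes, by a more expressive prior over the latent space.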

* To appear in CVPR 2020

"Best-of-Many-Samples" Distribution Matching

Sep 27, 2019

Apratim Bhattacharyya, Mario Fritz, Bernt Schiele

Abstract: Generative Adversarial Networks (GANs) can achieve state-of-the-art sample quality in generative modelling tasks but suffer from the mode collapse problem. Variational Autoencoders (VAEs), on the other hand, explicitly maximize a reconstruction-based data log-likelihood, forcing them to cover all modes, but suffer from poorer sample quality. Recent works have proposed hybrid VAE-GAN frameworks which integrate a GAN-based synthetic likelihood into the VAE objective to address both the mode collapse and sample quality issues, with limited success. This is because the VAE objective forces a trade-off between the data log-likelihood and divergence to the latent prior. The synthetic likelihood ratio term also shows instability during training. We propose a novel objective with a "Best-of-Many-Samples" reconstruction cost and a stable direct estimate of the synthetic likelihood. This enables our hybrid VAE-GAN framework to achieve high data log-likelihood and low divergence to the latent prior at the same time, and shows significant improvement over both hybrid VAE-GANs and plain GANs in mode coverage and quality.

Conditional Flow Variational Autoencoders for Structured Sequence Prediction

Aug 24, 2019

Apratim Bhattacharyya, Michael Hanselmann, Mario Fritz, Bernt Schiele, Christoph-Nikolas Straehle

Abstract: Prediction of future states of the environment and interacting agents is a key competence required for autonomous agents to operate successfully in the real world. Prior work for structured sequence prediction based on latent variable models imposes a uni-modal standard Gaussian prior on the latent variables. This induces a strong model bias which makes it challenging to fully capture the multi-modality of the distribution of the future states. In this work, we introduce Conditional Flow Variational Autoencoders, which use our novel conditional normalizing flow based prior. We show that using our novel complex multi-modal conditional prior we can capture complex multi-modal conditional distributions. Furthermore, we study for the first time latent variable collapse with normalizing flows and propose solutions to prevent such failure cases. Our experiments on three multi-modal structured sequence prediction datasets -- MNIST Sequences, Stanford Drone and HighD -- show that the proposed method obtains state-of-the-art results across different evaluation metrics.

Accurate and Diverse Sampling of Sequences based on a "Best of Many" Sample Objective

Oct 15, 2018

Apratim Bhattacharyya, Bernt Schiele, Mario Fritz

Abstract: For autonomous agents to successfully operate in the real world, anticipation of future events and states of their environment is a key competence. This problem has been formalized as a sequence extrapolation problem, where a number of observations are used to predict the sequence into the future. Real-world scenarios demand a model of uncertainty of such predictions, as predictions become increasingly uncertain -- in particular on long time horizons. While impressive results have been shown on point estimates, scenarios that induce multi-modal distributions over future sequences remain challenging. Our work addresses these challenges in a Gaussian Latent Variable model for sequence prediction. Our core contribution is a "Best of Many" sample objective that leads to more accurate and more diverse predictions that better capture the true variations in real-world sequence data. Beyond our analysis of improved model fit, our models also empirically outperform prior work on three diverse tasks ranging from traffic scenes to weather data.
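
The contrast between a "Best of Many" reconstruction term and the usual average over samples can be sketched as follows. This is a simplified illustration, not the paper's objective: latent samples are drawn from a standard Gaussian for simplicity, any regularization term on the latent distribution is omitted, and the names `sample_errors`, `best_of_many`, and `average` are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Decoder = Callable[[Sequence[float]], Sequence[float]]


def squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Sum of squared differences between a predicted and a target sequence."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def sample_errors(decode: Decoder, target: Sequence[float], num_samples: int,
                  latent_dim: int, rng: random.Random) -> List[float]:
    """Decode num_samples latent draws and score each against the target.
    Assumption: latents come from a standard Gaussian."""
    errors = []
    for _ in range(num_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
        errors.append(squared_error(decode(z), target))
    return errors


def best_of_many(errors: Sequence[float]) -> float:
    """Penalize only the closest sample, so other samples stay free to spread out."""
    return min(errors)


def average(errors: Sequence[float]) -> float:
    """Penalize every sample, which pulls all of them toward the same target."""
    return sum(errors) / len(errors)
```

With a toy decoder such as `lambda z: [z[0], 2.0 * z[0]]`, the minimum over samples is never larger than the average, so only one sample needs to explain each observed future; the remaining samples are not pushed toward that same outcome.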

* Added additional references and baselines. (Appeared in CVPR 2018)

Bayesian Prediction of Future Street Scenes using Synthetic Likelihoods

Oct 02, 2018

Apratim Bhattacharyya, Mario Fritz, Bernt Schiele

Abstract: For autonomous agents to successfully operate in the real world, the ability to anticipate future scene states is a key competence. In real-world scenarios, future states become increasingly uncertain and multi-modal, particularly on long time horizons. Dropout based Bayesian inference provides a computationally tractable, theoretically well grounded approach to learn likely hypotheses/models to deal with uncertain futures and make predictions that correspond well to observations, i.e. are well calibrated. However, it turns out that such approaches fall short of capturing complex real-world scenes, even falling behind in accuracy when compared to plain deterministic approaches. This is because the log-likelihood estimate used discourages diversity. In this work, we propose a novel Bayesian formulation for anticipating future scene states which leverages synthetic likelihoods that encourage the learning of diverse models to accurately capture the multi-modal nature of future scene states. We show that our approach achieves accurate state-of-the-art predictions and calibrated probabilities through extensive experiments for scene anticipation on the Cityscapes dataset. Moreover, we show that our approach generalizes across diverse tasks such as digit generation and precipitation forecasting.

* arXiv admin note: substantial text overlap with arXiv:1806.06939

Bayesian Prediction of Future Street Scenes through Importance Sampling based Optimization

Sep 28, 2018

Apratim Bhattacharyya, Mario Fritz, Bernt Schiele

Abstract: For autonomous agents to successfully operate in the real world, anticipation of future events and states of their environment is a key competence. This problem can be formalized as a sequence prediction problem, where a number of observations are used to predict the sequence into the future. However, real-world scenarios demand a model of uncertainty of such predictions, as future states become increasingly uncertain and multi-modal -- in particular on long time horizons. This makes modelling and learning challenging. We cast state-of-the-art semantic segmentation and future prediction models based on deep learning into a Bayesian formulation that in turn allows for a full Bayesian treatment of the prediction problem. We present a new sampling scheme for this model that draws from the success of variational autoencoders by incorporating a recognition network. In the experiments, we show that our model outperforms prior work in accuracy of the predicted segmentation and provides calibrated probabilities that also better capture the multi-modal aspects of possible future states of street scenes.

* The objective in (8) allows for trivial solutions e.g. the prior

Long-Term On-Board Prediction of People in Traffic Scenes under Uncertainty

Jun 20, 2018

Apratim Bhattacharyya, Mario Fritz, Bernt Schiele

Abstract: Progress towards advanced systems for assisted and autonomous driving is leveraging recent advances in recognition and segmentation methods. Yet, we are still facing challenges in bringing reliable driving to inner cities, as those are composed of highly dynamic scenes observed from a moving platform at considerable speeds. Anticipation becomes a key element in order to react in time and prevent accidents. In this paper we argue that it is necessary to predict at least 1 second into the future, and we thus propose a new model that jointly predicts ego motion and people trajectories over such large time horizons. We pay particular attention to modeling the uncertainty of our estimates arising from the non-deterministic nature of natural traffic scenes. Our experimental results show that it is indeed possible to predict people trajectories at the desired time horizons and that our uncertainty estimates are informative of the prediction error. We also show that both sequence modeling of trajectories and our novel method of long term odometry prediction are essential for best performance.

* CVPR 2018

Long-Term Image Boundary Prediction

Nov 23, 2017

Apratim Bhattacharyya, Mateusz Malinowski, Bernt Schiele, Mario Fritz

Abstract: Boundary estimation in images and videos has been a very active topic of research, and organizing visual information into boundaries and segments is believed to be a cornerstone of visual perception. While prior work has focused on estimating boundaries for observed frames, our work aims at predicting boundaries of future unobserved frames. This requires our model to learn about the fate of boundaries and corresponding motion patterns -- including a notion of "intuitive physics". We experiment on natural video sequences along with synthetic sequences with deterministic physics-based and agent-based motions. While it is not our primary goal, we also show that fusion of RGB and boundary prediction leads to improved RGB predictions.

* Accepted in the AAAI Conference for Artificial Intelligence, 2018
