Kratarth Goel

Wayformer: Motion Forecasting via Simple & Efficient Attention Networks

Jul 12, 2022
Nigamaa Nayakanti, Rami Al-Rfou, Aurick Zhou, Kratarth Goel, Khaled S. Refaat, Benjamin Sapp

Motion forecasting for autonomous driving is a challenging task because complex driving scenarios result in a heterogeneous mix of static and dynamic inputs. It is an open problem how best to represent and fuse information about road geometry, lane connectivity, time-varying traffic light state, and the history of a dynamic set of agents and their interactions into an effective encoding. To model this diverse set of input features, many approaches propose designing an equally complex system with a diverse set of modality-specific modules. This results in systems that are difficult to scale, extend, or tune in rigorous ways to trade off quality and efficiency. In this paper, we present Wayformer, a family of attention-based architectures for motion forecasting that are simple and homogeneous. Wayformer offers a compact model description consisting of an attention-based scene encoder and a decoder. In the scene encoder we study the choice of early, late, and hierarchical fusion of the input modalities. For each fusion type we explore strategies to trade off efficiency and quality via factorized attention or latent query attention. We show that early fusion, despite its simplicity of construction, is not only modality agnostic but also achieves state-of-the-art results on both the Waymo Open Motion Dataset (WOMD) and Argoverse leaderboards, demonstrating the effectiveness of our design philosophy.
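As a rough sketch of the early-fusion, latent-query design the abstract describes, the following minimal PyTorch module concatenates all modality tokens into one sequence and lets a small set of learned latent queries summarize it. The layer counts, dimensions, and class names are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class LatentQueryEncoder(nn.Module):
    """Early-fusion scene encoder sketch: one homogeneous attention stack,
    no modality-specific encoders. Sizes are illustrative assumptions."""

    def __init__(self, dim=256, num_latents=64, num_heads=8, depth=2):
        super().__init__()
        # Learned latent queries summarize the fused scene tokens, so
        # self-attention cost no longer grows with the scene size.
        self.latents = nn.Parameter(torch.randn(num_latents, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, num_heads, batch_first=True),
            num_layers=depth,
        )

    def forward(self, roadgraph, agents, lights):
        # Early fusion: concatenate all modality tokens into one sequence.
        tokens = torch.cat([roadgraph, agents, lights], dim=1)
        q = self.latents.unsqueeze(0).expand(tokens.size(0), -1, -1)
        z, _ = self.cross_attn(q, tokens, tokens)  # latents attend to the scene
        return self.blocks(z)  # cheap self-attention over the latents only

# Toy usage: a batch of 4 scenes with already-projected per-modality tokens.
enc = LatentQueryEncoder()
out = enc(torch.randn(4, 512, 256), torch.randn(4, 128, 256), torch.randn(4, 16, 256))
print(out.shape)  # torch.Size([4, 64, 256])
```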

A Recurrent Latent Variable Model for Sequential Data

Apr 06, 2016
Junyoung Chung, Kyle Kastner, Laurent Dinh, Kratarth Goel, Aaron Courville, Yoshua Bengio

In this paper, we explore the inclusion of latent random variables into the dynamic hidden state of a recurrent neural network (RNN) by combining elements of the variational autoencoder. We argue that through the use of high-level latent random variables, the variational RNN (VRNN) can model the kind of variability observed in highly structured sequential data such as natural speech. We empirically evaluate the proposed model against related sequential models on four speech datasets and one handwriting dataset. Our results show the important roles that latent random variables can play in the RNN dynamic hidden state.
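The VRNN's per-step factorization (a prior on the latent conditioned on the previous hidden state, a posterior that also sees the current input, and a recurrence over input and latent) can be sketched as a single PyTorch step. The linear parameterizations, the GRU cell, and all sizes below are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class VRNNCell(nn.Module):
    """One VRNN step: prior p(z_t | h_{t-1}), posterior q(z_t | x_t, h_{t-1}),
    decoder p(x_t | z_t, h_{t-1}), recurrence h_t = f(x_t, z_t, h_{t-1})."""

    def __init__(self, x_dim=28, z_dim=16, h_dim=64):
        super().__init__()
        self.prior = nn.Linear(h_dim, 2 * z_dim)              # -> (mu_p, logvar_p)
        self.posterior = nn.Linear(x_dim + h_dim, 2 * z_dim)  # -> (mu_q, logvar_q)
        self.decoder = nn.Linear(z_dim + h_dim, x_dim)
        self.rnn = nn.GRUCell(x_dim + z_dim, h_dim)

    def forward(self, x_t, h):
        mu_p, logvar_p = self.prior(h).chunk(2, dim=-1)
        mu_q, logvar_q = self.posterior(torch.cat([x_t, h], -1)).chunk(2, dim=-1)
        # Reparameterized sample from the approximate posterior.
        z_t = mu_q + torch.randn_like(mu_q) * (0.5 * logvar_q).exp()
        x_recon = self.decoder(torch.cat([z_t, h], -1))
        h_next = self.rnn(torch.cat([x_t, z_t], -1), h)
        # KL(q || p) between the two diagonal Gaussians (per-step ELBO term).
        kl = 0.5 * (logvar_p - logvar_q - 1
                    + ((mu_q - mu_p) ** 2 + logvar_q.exp()) / logvar_p.exp()).sum(-1)
        return x_recon, h_next, kl

# Toy usage: one step on a batch of 8 frames.
cell = VRNNCell()
x_recon, h, kl = cell(torch.randn(8, 28), torch.zeros(8, 64))
```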

A Novel Feature Selection and Extraction Technique for Classification

Dec 26, 2014
Kratarth Goel, Raunaq Vohra, Ainesh Bakshi

This paper presents a versatile technique for feature selection and extraction: Class Dependent Features (CDFs). We use CDFs to improve classification accuracy while controlling computational expense by tackling the curse of dimensionality. To demonstrate the generality of this technique, we apply it to handwritten digit recognition and text categorization.
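The abstract does not spell out the selection criterion, so the sketch below only illustrates the general idea of class-dependent selection: score features separately for each class and keep the top scorers per class. The normalized mean-difference score used here is a hypothetical stand-in, not the paper's method.

```python
import numpy as np

def class_dependent_features(X, y, k=10):
    """For each class, rank features by how strongly they separate that
    class from the rest (normalized mean difference; an illustrative
    criterion, not necessarily the paper's) and keep the top k."""
    selected = {}
    for c in np.unique(y):
        in_c, out_c = X[y == c], X[y != c]
        score = np.abs(in_c.mean(0) - out_c.mean(0)) / (X.std(0) + 1e-8)
        selected[c] = np.argsort(score)[::-1][:k]  # top-k feature indices
    return selected

# Toy usage: 200 samples, 50 features, 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = rng.integers(0, 3, size=200)
print({c: idx[:3] for c, idx in class_dependent_features(X, y).items()})
```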

* IEEE Xplore, Proceedings of IEEE SMC 2014, pages 4033 - 4034  
* 2 pages, 2 tables, published at IEEE SMC 2014 

Polyphonic Music Generation by Modeling Temporal Dependencies Using a RNN-DBN

Dec 26, 2014
Kratarth Goel, Raunaq Vohra, J. K. Sahoo

In this paper, we propose a generic technique to model temporal dependencies and sequences using a combination of a recurrent neural network (RNN) and a Deep Belief Network (DBN). Our technique, RNN-DBN, combines the memory state of the RNN, which provides temporal information, with a multi-layer DBN, which provides a high-level representation of the data. This makes RNN-DBNs well suited for sequence generation. Further, using a DBN in conjunction with the RNN makes the model capable of significantly more complex data representations than a single restricted Boltzmann machine (RBM). We apply this technique to the task of polyphonic music generation.
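A hedged single-layer sketch of the idea (closer to one RNN-RBM step than the paper's multi-layer DBN): the RNN's hidden state produces time-dependent biases for a Boltzmann-machine layer over each frame. The GRU recurrence and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RNNDBNStep(nn.Module):
    """One step of the RNN-(D)BN idea with a single RBM layer standing in
    for the paper's multi-layer DBN. Sizes are illustrative assumptions."""

    def __init__(self, v_dim=88, h_dim=50, r_dim=100):
        super().__init__()
        self.W = nn.Parameter(torch.randn(v_dim, h_dim) * 0.01)  # RBM weights
        self.to_bv = nn.Linear(r_dim, v_dim)  # visible bias from RNN state
        self.to_bh = nn.Linear(r_dim, h_dim)  # hidden bias from RNN state
        self.rnn = nn.GRUCell(v_dim, r_dim)

    def forward(self, v_t, r):
        bv, bh = self.to_bv(r), self.to_bh(r)
        # One Gibbs step from v_t: sample hidden units, reconstruct visibles.
        h = torch.bernoulli(torch.sigmoid(v_t @ self.W + bh))
        v_sample = torch.bernoulli(torch.sigmoid(h @ self.W.t() + bv))
        r_next = self.rnn(v_t, r)  # advance the temporal state
        return v_sample, r_next

# Toy usage: one generation step over a batch of 4 piano-roll frames.
step = RNNDBNStep()
v, r = step(torch.bernoulli(torch.full((4, 88), 0.1)), torch.zeros(4, 100))
```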

* Lecture Notes in Computer Science Volume 8681, 2014, pp 217-224  
* 8 pages, A4, 1 figure, 1 table, ICANN 2014 oral presentation. arXiv admin note: text overlap with arXiv:1206.6392 by other authors 

Learning Temporal Dependencies in Data Using a DBN-BLSTM

Dec 23, 2014
Kratarth Goel, Raunaq Vohra

Since the advent of deep learning, many different architectures have been used to solve a variety of problems, and applying such deep architectures to auditory data is not uncommon. However, these architectures do not always adequately model the temporal dependencies in data. We therefore propose a new generic architecture, the Deep Belief Network - Bidirectional Long Short-Term Memory (DBN-BLSTM) network, which models sequences by keeping track of temporal information while enabling deep representations of the data. We demonstrate this architecture by applying it to the task of music generation and obtain state-of-the-art results.
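A minimal sketch of the pipeline the abstract describes: per-frame deep features feed a bidirectional LSTM that adds temporal context in both directions. The stacked sigmoid layers below merely stand in for a pretrained DBN, and all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DBNBLSTM(nn.Module):
    """Sketch: a per-frame deep representation (untrained layers standing in
    for a DBN, for brevity) followed by a bidirectional LSTM over time."""

    def __init__(self, in_dim=88, feat_dim=128, hid=64):
        super().__init__()
        self.dbn = nn.Sequential(  # per-frame deep representation
            nn.Linear(in_dim, feat_dim), nn.Sigmoid(),
            nn.Linear(feat_dim, feat_dim), nn.Sigmoid(),
        )
        self.blstm = nn.LSTM(feat_dim, hid, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hid, in_dim)

    def forward(self, x):           # x: (batch, time, in_dim)
        feats = self.dbn(x)
        seq, _ = self.blstm(feats)  # temporal context in both directions
        return self.out(seq)

# Toy usage: 2 sequences of 100 piano-roll frames.
print(DBNBLSTM()(torch.randn(2, 100, 88)).shape)  # torch.Size([2, 100, 88])
```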

* 6 pages, 2 figures, 1 table, ICLR 2015 conference track submission under review 