Karttikeya Mangalam

Big Little Transformer Decoder

Feb 15, 2023
Via arXiv

Reversible Vision Transformers

Feb 09, 2023
Via arXiv

Does unsupervised grammar induction need pixels?

Dec 20, 2022
Via arXiv

Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization

Nov 25, 2022
Via arXiv

Structured Video Tokens @ Ego4D PNR Temporal Localization Challenge 2022

Jun 15, 2022
Via arXiv

Bringing Image Scene Structure to Video via Frame-Clip Consistency of Object Tokens

Jun 15, 2022
Via arXiv

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

Jun 02, 2022
Via arXiv

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition

Jan 20, 2022
Via arXiv

Overcoming Mode Collapse with Adaptive Multi Adversarial Training

Dec 29, 2021
Via arXiv

Improved Multiscale Vision Transformers for Classification and Detection

Dec 02, 2021
Via arXiv