Michael S. Ryoo

VicTR: Video-conditioned Text Representations for Activity Recognition

Apr 05, 2023
Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani, Michael S. Ryoo

Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors

Nov 23, 2022
Ryan Burgert, Kanchana Ranasinghe, Xiang Li, Michael S. Ryoo

Token Turing Machines

Nov 16, 2022
Michael S. Ryoo, Keerthana Gopalakrishnan, Kumara Kahatapitiya, Ted Xiao, Kanishka Rao, Austin Stone, Yao Lu, Julian Ibarz, Anurag Arnab

Grafting Vision Transformers

Oct 28, 2022
Jongwoo Park, Kumara Kahatapitiya, Donghyun Kim, Shivchander Sudalairaj, Quanfu Fan, Michael S. Ryoo

Open-vocabulary Queryable Scene Representations for Real World Planning

Sep 20, 2022
Boyuan Chen, Fei Xia, Brian Ichter, Kanishka Rao, Keerthana Gopalakrishnan, Michael S. Ryoo, Austin Stone, Daniel Kappler

Video Question Answering with Iterative Video-Text Co-Tokenization

Aug 01, 2022
AJ Piergiovanni, Kairo Morton, Weicheng Kuo, Michael S. Ryoo, Anelia Angelova

Video + CLIP Baseline for Ego4D Long-term Action Anticipation

Jul 01, 2022
Srijan Das, Michael S. Ryoo

Learning Viewpoint-Agnostic Visual Representations by Recovering Tokens in 3D Space

Jun 23, 2022
Jinghuan Shang, Srijan Das, Michael S. Ryoo

Does Self-supervised Learning Really Improve Reinforcement Learning from Pixels?

Jun 23, 2022
Xiang Li, Jinghuan Shang, Srijan Das, Michael S. Ryoo

STC-mix: Space, Time, Channel mixing for Self-supervised Video Representation

Dec 07, 2021
Srijan Das, Michael S. Ryoo
