Gedas Bertasius

Siamese Vision Transformers are Scalable Audio-visual Learners

Mar 28, 2024
Yan-Bo Lin, Gedas Bertasius

Via arXiv

Augmented Reality Demonstrations for Scalable Robot Imitation Learning

Mar 20, 2024
Yue Yang, Bryce Ikeda, Gedas Bertasius, Daniel Szafir

Via arXiv

DAM: Dynamic Adapter Merging for Continual Video QA Learning

Mar 13, 2024
Feng Cheng, Ziyang Wang, Yi-Lin Sung, Yan-Bo Lin, Mohit Bansal, Gedas Bertasius

Via arXiv

Video ReCap: Recursive Captioning of Hour-Long Videos

Feb 28, 2024
Md Mohaiminul Islam, Ngan Ho, Xitong Yang, Tushar Nagarajan, Lorenzo Torresani, Gedas Bertasius

Via arXiv

Mementos: A Comprehensive Benchmark for Multimodal Large Language Model Reasoning over Image Sequences

Jan 25, 2024
Xiyao Wang, Yuhang Zhou, Xiaoyu Liu, Hongjin Lu, Yuancheng Xu, Feihong He, Jaehong Yoon, Taixi Lu, Gedas Bertasius, Mohit Bansal, Huaxiu Yao, Furong Huang

Via arXiv

A Simple LLM Framework for Long-Range Video Question-Answering

Dec 28, 2023
Ce Zhang, Taixi Lu, Md Mohaiminul Islam, Ziyang Wang, Shoubin Yu, Mohit Bansal, Gedas Bertasius

Via arXiv

RGNet: A Unified Retrieval and Grounding Network for Long Videos

Dec 11, 2023
Tanveer Hannan, Md Mohaiminul Islam, Thomas Seidl, Gedas Bertasius

Via arXiv

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Nov 30, 2023
Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei Huang, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray

Via arXiv

Unified Coarse-to-Fine Alignment for Video-Text Retrieval

Sep 18, 2023
Ziyang Wang, Yi-Lin Sung, Feng Cheng, Gedas Bertasius, Mohit Bansal

Via arXiv