Lorenzo Torresani

Video ReCap: Recursive Captioning of Hour-Long Videos

Feb 28, 2024
Md Mohaiminul Islam, Ngan Ho, Xitong Yang, Tushar Nagarajan, Lorenzo Torresani, Gedas Bertasius

Via arXiv

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Nov 30, 2023
Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei Huang, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray

Via arXiv

Multiscale Video Pretraining for Long-Term Activity Forecasting

Jul 24, 2023
Reuben Tan, Matthias De Lange, Michael Iuzzolino, Bryan A. Plummer, Kate Saenko, Karl Ridgeway, Lorenzo Torresani

Via arXiv

Learning to Ground Instructional Articles in Videos through Narrations

Jun 06, 2023
Effrosyni Mavroudi, Triantafyllos Afouras, Lorenzo Torresani

Via arXiv

Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision

Mar 09, 2023
Tarun Kalluri, Weiyao Wang, Heng Wang, Manmohan Chandraker, Lorenzo Torresani, Du Tran

Via arXiv

MINOTAUR: Multi-task Video Grounding From Multimodal Queries

Feb 16, 2023
Raghav Goyal, Effrosyni Mavroudi, Xitong Yang, Sainbayar Sukhbaatar, Leonid Sigal, Matt Feiszli, Lorenzo Torresani, Du Tran

Via arXiv

Egocentric Video Task Translation @ Ego4D Challenge 2022

Feb 03, 2023
Zihui Xue, Yale Song, Kristen Grauman, Lorenzo Torresani

Via arXiv

HierVL: Learning Hierarchical Video-Language Embeddings

Jan 05, 2023
Kumar Ashutosh, Rohit Girdhar, Lorenzo Torresani, Kristen Grauman

Via arXiv

What You Say Is What You Show: Visual Narration Detection in Instructional Videos

Jan 05, 2023
Kumar Ashutosh, Rohit Girdhar, Lorenzo Torresani, Kristen Grauman

Via arXiv