Shraman Pramanick

Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model

Dec 19, 2023
Shraman Pramanick, Guangxing Han, Rui Hou, Sayan Nag, Ser-Nam Lim, Nicolas Ballas, Qifan Wang, Rama Chellappa, Amjad Almahairi


Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Nov 30, 2023
Kristen Grauman, Andrew Westbury, Lorenzo Torresani, Kris Kitani, Jitendra Malik, Triantafyllos Afouras, Kumar Ashutosh, Vijay Baiyya, Siddhant Bansal, Bikram Boote, Eugene Byrne, Zach Chavis, Joya Chen, Feng Cheng, Fu-Jen Chu, Sean Crane, Avijit Dasgupta, Jing Dong, Maria Escobar, Cristhian Forigua, Abrham Gebreselasie, Sanjay Haresh, Jing Huang, Md Mohaiminul Islam, Suyog Jain, Rawal Khirodkar, Devansh Kukreja, Kevin J Liang, Jia-Wei Liu, Sagnik Majumder, Yongsen Mao, Miguel Martin, Effrosyni Mavroudi, Tushar Nagarajan, Francesco Ragusa, Santhosh Kumar Ramakrishnan, Luigi Seminara, Arjun Somayazulu, Yale Song, Shan Su, Zihui Xue, Edward Zhang, Jinxu Zhang, Angela Castillo, Changan Chen, Xinzhu Fu, Ryosuke Furuta, Cristina Gonzalez, Prince Gupta, Jiabo Hu, Yifei Huang, Yiming Huang, Weslie Khoo, Anush Kumar, Robert Kuo, Sach Lakhavani, Miao Liu, Mi Luo, Zhengyi Luo, Brighid Meredith, Austin Miller, Oluwatumininu Oguntola, Xiaqing Pan, Penny Peng, Shraman Pramanick, Merey Ramazanova, Fiona Ryan, Wei Shan, Kiran Somasundaram, Chenan Song, Audrey Southerland, Masatoshi Tateno, Huiyu Wang, Yuchen Wang, Takuma Yagi, Mingfei Yan, Xitong Yang, Zecheng Yu, Shengxin Cindy Zha, Chen Zhao, Ziwei Zhao, Zhifan Zhu, Jeff Zhuo, Pablo Arbelaez, Gedas Bertasius, David Crandall, Dima Damen, Jakob Engel, Giovanni Maria Farinella, Antonino Furnari, Bernard Ghanem, Judy Hoffman, C. V. Jawahar, Richard Newcombe, Hyun Soo Park, James M. Rehg, Yoichi Sato, Manolis Savva, Jianbo Shi, Mike Zheng Shou, Michael Wray


UniVTG: Towards Unified Video-Language Temporal Grounding

Aug 18, 2023
Kevin Qinghong Lin, Pengchuan Zhang, Joya Chen, Shraman Pramanick, Difei Gao, Alex Jinpeng Wang, Rui Yan, Mike Zheng Shou


EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone

Jul 11, 2023
Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Zheng Shou, Rama Chellappa, Pengchuan Zhang


VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment

Oct 09, 2022
Shraman Pramanick, Li Jing, Sayan Nag, Jiachen Zhu, Hardik Shah, Yann LeCun, Rama Chellappa


Where in the World is this Image? Transformer-based Geo-localization in the Wild

Apr 29, 2022
Shraman Pramanick, Ewa M. Nowara, Joshua Gleason, Carlos D. Castillo, Rama Chellappa


Multimodal Learning using Optimal Transport for Sarcasm and Humor Detection

Oct 21, 2021
Shraman Pramanick, Aniket Roy, Vishal M. Patel


Detecting Harmful Memes and Their Targets

Sep 24, 2021
Shraman Pramanick, Dimitar Dimitrov, Rituparna Mukherjee, Shivam Sharma, Md. Shad Akhtar, Preslav Nakov, Tanmoy Chakraborty


MOMENTA: A Multimodal Framework for Detecting Harmful Memes and Their Targets

Sep 22, 2021
Shraman Pramanick, Shivam Sharma, Dimitar Dimitrov, Md Shad Akhtar, Preslav Nakov, Tanmoy Chakraborty
