Karan Sikka

A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval

Nov 30, 2023
Matthew Gwilliam, Michael Cogswell, Meng Ye, Karan Sikka, Abhinav Shrivastava, Ajay Divakaran

DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback

Nov 16, 2023
Yangyi Chen, Karan Sikka, Michael Cogswell, Heng Ji, Ajay Divakaran

Demonstrations Are All You Need: Advancing Offensive Content Paraphrasing using In-Context Learning

Oct 16, 2023
Anirudh Som, Karan Sikka, Helen Gent, Ajay Divakaran, Andreas Kathol, Dimitra Vergyri

SayNav: Grounding Large Language Models for Dynamic Planning to Navigation in New Environments

Sep 22, 2023
Abhinav Rajvanshi, Karan Sikka, Xiao Lin, Bhoram Lee, Han-Pang Chiu, Alvaro Velasquez

Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models

Sep 08, 2023
Yangyi Chen, Karan Sikka, Michael Cogswell, Heng Ji, Ajay Divakaran

TIJO: Trigger Inversion with Joint Optimization for Defending Multimodal Backdoored Models

Aug 07, 2023
Indranil Sur, Karan Sikka, Matthew Walmer, Kaushik Koneripalli, Anirban Roy, Xiao Lin, Ajay Divakaran, Susmit Jha

Multilingual Content Moderation: A Case Study on Reddit

Feb 19, 2023
Meng Ye, Karan Sikka, Katherine Atwell, Sabit Hassan, Ajay Divakaran, Malihe Alikhani

Dual-Key Multimodal Backdoors for Visual Question Answering

Dec 14, 2021
Matthew Walmer, Karan Sikka, Indranil Sur, Abhinav Shrivastava, Susmit Jha
