Ranjay Krishna

Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model

Aug 01, 2024, via arXiv

Efficient Inference of Vision Instruction-Following Models with Elastic Cache

Jul 25, 2024, via arXiv

Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps

Jul 09, 2024, via arXiv

Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions

Jul 09, 2024, via arXiv

Manipulate-Anything: Automating Real-World Robots using Vision-Language Models

Jun 27, 2024, via arXiv

Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization

Jun 23, 2024, via arXiv

Task Me Anything

Jun 17, 2024, via arXiv

RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics

Jun 15, 2024, via arXiv

Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models

Jun 13, 2024, via arXiv

The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better

Jun 07, 2024, via arXiv