Leonid Karlinsky

Teaching VLMs to Localize Specific Objects from In-context Examples

Nov 20, 2024

LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content

Oct 15, 2024

GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models

Oct 08, 2024

Scaling Granite Code Models to 128K Context

Jul 18, 2024

DASS: Distilled Audio State Space Models Are Stronger and More Duration-Scalable Learners

Jul 04, 2024

Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning

Jun 21, 2024

Navigating the Labyrinth: Evaluating and Enhancing LLMs' Ability to Reason About Search Problems

Jun 18, 2024

Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

Jun 17, 2024

Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation

Jun 14, 2024

Comparison Visual Instruction Tuning

Jun 13, 2024