Philip Torr

CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents

Jul 01, 2024
Via arXiv

PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

Jun 24, 2024
Via arXiv

Model Merging and Safety Alignment: One Bad Model Spoils the Bunch

Jun 20, 2024
Via arXiv

Localizing Events in Videos with Multimodal Queries

Jun 14, 2024
Via arXiv

Towards Interpretable Deep Local Learning with Successive Gradient Reconciliation

Jun 07, 2024
Via arXiv

Learning Visual Prompts for Guiding the Attention of Vision Transformers

Jun 05, 2024
Via arXiv

HelloFresh: LLM Evaluations on Streams of Real-World Human Editorial Actions across X Community Notes and Wikipedia edits

Jun 05, 2024
Via arXiv

Direct3D: Scalable Image-to-3D Generation via 3D Latent Diffusion Transformer

May 23, 2024
Via arXiv

Select to Perfect: Imitating desired behavior from large multi-agent data

May 06, 2024
Via arXiv

Energy-Latency Manipulation of Multi-modal Large Language Models via Verbose Samples

Apr 25, 2024
Via arXiv