Josef Sivic

Revealing data leakage in protein interaction benchmarks

Apr 16, 2024

POP-3D: Open-Vocabulary 3D Occupancy Prediction from Images

Jan 17, 2024

GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos

Dec 12, 2023

Customizing Motion in Text-to-Video Diffusion Models

Dec 07, 2023

Visually Guided Model Predictive Robot Control via 6D Object Pose Localization and Tracking

Nov 09, 2023

Learning to design protein-protein interactions with enhanced generalization

Oct 27, 2023

VidChapters-7M: Video Chapters at Scale

Sep 25, 2023

Meta-Personalizing Vision-Language Models to Find Named Instances in Video

Jun 16, 2023

Language-Guided Music Recommendation for Video via Prompt Analogies

Jun 15, 2023

Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning

Mar 21, 2023