Yusuf Aytar

Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models

Jun 13, 2024

FlexCap: Generating Rich, Localized, and Flexible Captions in Images

Mar 18, 2024

Genie: Generative Interactive Environments

Feb 23, 2024

Learning from One Continuous Video Stream

Dec 01, 2023

RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation

Aug 31, 2023

RoboCat: A Self-Improving Foundation Agent for Robotic Manipulation

Jun 20, 2023

TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement

Jun 14, 2023

Perception Test: A Diagnostic Benchmark for Multimodal Video Models

May 23, 2023

Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation

Apr 13, 2023

TAP-Vid: A Benchmark for Tracking Any Point in a Video

Nov 07, 2022