Juan-Manuel Perez-Rua

HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming
Dec 24, 2025

Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks
Jan 05, 2024

GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation
Dec 07, 2023

FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing
Oct 09, 2023

Multi-Modal Few-Shot Temporal Action Detection via Vision-Language Meta-Adaptation
Nov 27, 2022

Where is my Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization
Nov 18, 2022

Negative Frames Matter in Egocentric Visual Query 2D Localization
Aug 03, 2022

SAIC_Cambridge-HuPBA-FBK Submission to the EPIC-Kitchens-100 Action Recognition Challenge 2021
Oct 06, 2021

TNT: Text-Conditioned Network with Transductive Inference for Few-Shot Video Classification
Jun 21, 2021

Space-time Mixing Attention for Video Transformer
Jun 11, 2021