Juan-Manuel Perez-Rua

Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks

Jan 05, 2024
Christian Simon, Sen He, Juan-Manuel Perez-Rua, Mengmeng Xu, Amine Benhalloum, Tao Xiang

GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation

Dec 07, 2023
Shoufa Chen, Mengmeng Xu, Jiawei Ren, Yuren Cong, Sen He, Yanping Xie, Animesh Sinha, Ping Luo, Tao Xiang, Juan-Manuel Perez-Rua

FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing

Oct 09, 2023
Yuren Cong, Mengmeng Xu, Christian Simon, Shoufa Chen, Jiawei Ren, Yanping Xie, Juan-Manuel Perez-Rua, Bodo Rosenhahn, Tao Xiang, Sen He

Multi-Modal Few-Shot Temporal Action Detection via Vision-Language Meta-Adaptation

Nov 27, 2022
Sauradip Nag, Mengmeng Xu, Xiatian Zhu, Juan-Manuel Perez-Rua, Bernard Ghanem, Yi-Zhe Song, Tao Xiang

Where is my Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization

Nov 18, 2022
Mengmeng Xu, Yanghao Li, Cheng-Yang Fu, Bernard Ghanem, Tao Xiang, Juan-Manuel Perez-Rua

Negative Frames Matter in Egocentric Visual Query 2D Localization

Aug 03, 2022
Mengmeng Xu, Cheng-Yang Fu, Yanghao Li, Bernard Ghanem, Juan-Manuel Perez-Rua, Tao Xiang

SAIC_Cambridge-HuPBA-FBK Submission to the EPIC-Kitchens-100 Action Recognition Challenge 2021

Oct 06, 2021
Swathikiran Sudhakaran, Adrian Bulat, Juan-Manuel Perez-Rua, Alex Falcon, Sergio Escalera, Oswald Lanz, Brais Martinez, Georgios Tzimiropoulos

TNT: Text-Conditioned Network with Transductive Inference for Few-Shot Video Classification

Jun 21, 2021
Andrés Villa, Juan-Manuel Perez-Rua, Vladimir Araujo, Juan Carlos Niebles, Victor Escorcia, Alvaro Soto

Space-time Mixing Attention for Video Transformer

Jun 11, 2021
Adrian Bulat, Juan-Manuel Perez-Rua, Swathikiran Sudhakaran, Brais Martinez, Georgios Tzimiropoulos
