Spyros Gidaris

Boosting Visual Instruction Tuning with Self-Supervised Guidance

Apr 14, 2026

Representations Before Pixels: Semantics-Guided Hierarchical Video Prediction

Apr 13, 2026

Driving on Registers

Jan 08, 2026

Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning

Jul 18, 2025

Multi-Token Prediction Needs Registers

May 15, 2025

Boosting Generative Image Modeling via Joint Image-Feature Synthesis

Apr 22, 2025

VaViM and VaVAM: Autonomous Driving through Video Generative Modeling

Feb 21, 2025

EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling

Feb 13, 2025

Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers

Jan 14, 2025

DINO-Foresight: Looking into the Future with DINO

Dec 16, 2024