Ying Shan

DiCoDe: Diffusion-Compressed Deep Tokens for Autoregressive Video Generation with Language Models

Dec 05, 2024

Moto: Latent Motion Token as the Bridging Language for Robot Manipulation

Dec 05, 2024

Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation

Dec 05, 2024

NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images

Dec 04, 2024

Taming Scalable Visual Tokenizer for Autoregressive Image Generation

Dec 03, 2024

DOGE: Towards Versatile Visual Document Grounding and Referring

Nov 26, 2024

NovelGS: Consistent Novel-view Denoising via Large Gaussian Reconstruction Model

Nov 25, 2024

mR$^2$AG: Multimodal Retrieval-Reflection-Augmented Generation for Knowledge-Based VQA

Nov 22, 2024

Taming Rectified Flow for Inversion and Editing

Nov 07, 2024

PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance

Nov 05, 2024