Sucheng Ren

What If We Recaption Billions of Web Images with LLaMA-3?

Jun 12, 2024

Autoregressive Pretraining with Mamba in Vision

Jun 11, 2024

Medical Vision Generalist: Unifying Medical Imaging Tasks in Context

Jun 08, 2024

ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning

May 24, 2024

Mamba-R: Vision Mamba ALSO Needs Registers

May 23, 2024

Beyond Finite Data: Towards Data-free Out-of-distribution Generalization via Extrapolation

Mar 11, 2024

Compress & Align: Curating Image-Text Data with Human Knowledge

Dec 13, 2023

Rejuvenating image-GPT as Strong Visual Representation Learners

Dec 04, 2023

SG-Former: Self-guided Transformer with Evolving Token Reallocation

Aug 23, 2023

NPF-200: A Multi-Modal Eye Fixation Dataset and Method for Non-Photorealistic Videos

Aug 23, 2023