Yuanjun Xiong

MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs

Jun 17, 2024

RetinaGS: Scalable Training for Dense Scene Rendering with Billion-Scale 3D Gaussians

Jun 17, 2024

Image and Video Tokenization with Binary Spherical Quantization

Jun 11, 2024

Bootstrap3D: Improving 3D Content Creation with Synthetic Data

May 31, 2024

A Full-duplex Speech Dialogue Scheme Based On Large Language Models

May 29, 2024

RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition

Mar 20, 2024

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want

Dec 13, 2023

Towards Regression-Free Neural Networks for Diverse Compute Platforms

Sep 27, 2022

Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks

Sep 20, 2022

ELODI: Ensemble Logit Difference Inhibition for Positive-Congruent Training

May 13, 2022