Shuo Chen

DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion

Nov 07, 2024
Via arXiv

Novel Object Synthesis via Adaptive Text-Image Harmony

Oct 28, 2024
Via arXiv

BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events

Oct 27, 2024
Via arXiv

ERDDCI: Exact Reversible Diffusion via Dual-Chain Inversion for High-Quality Image Editing

Oct 18, 2024
Via arXiv

Visual Question Decomposition on Multimodal Large Language Models

Sep 28, 2024
Via arXiv

BlinkTrack: Feature Tracking over 100 FPS via Events and Images

Sep 26, 2024
Via arXiv

Can We Count on LLMs? The Fixed-Effect Fallacy and Claims of GPT-4 Capabilities

Sep 11, 2024
Via arXiv

Revisiting Prompt Pretraining of Vision-Language Models

Sep 10, 2024
Via arXiv

Learning High-Frequency Functions Made Easy with Sinusoidal Positional Encoding

Jul 12, 2024
Via arXiv

Robust Learning under Hybrid Noise

Jul 04, 2024
Via arXiv