Ziyong Feng

Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs

Apr 24, 2025
Via arXiv

Decoupled Global-Local Alignment for Improving Compositional Understanding

Apr 23, 2025
Via arXiv

MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space

Mar 19, 2025
Via arXiv

Mocap-2-to-3: Lifting 2D Diffusion-Based Pretrained Models for 3D Motion Capture

Mar 05, 2025
Via arXiv

RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm

Feb 18, 2025
Via arXiv

ORID: Organ-Regional Information Driven Framework for Radiology Report Generation

Nov 20, 2024
Via arXiv

Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension

Oct 18, 2024
Via arXiv

CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination

Aug 18, 2024
Via arXiv

VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling

Aug 02, 2024
Via arXiv

Multi-label Cluster Discrimination for Visual Representation Learning

Jul 24, 2024
Via arXiv