Gang Xiong

Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning

Apr 21, 2025
Via arXiv

Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval

Mar 21, 2025
Via arXiv

MADS: Multi-Attribute Document Supervision for Zero-Shot Image Classification

Mar 10, 2025
Via arXiv

ProAPO: Progressively Automatic Prompt Optimization for Visual Classification

Feb 27, 2025
Via arXiv

Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval

Dec 15, 2024
Via arXiv

Denoise-I2W: Mapping Images to Denoising Words for Accurate Zero-Shot Composed Image Retrieval

Oct 22, 2024
Via arXiv

Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining

Oct 01, 2024
Via arXiv

T2VIndexer: A Generative Video Indexer for Efficient Text-Video Retrieval

Aug 21, 2024
Via arXiv

IIU: Independent Inference Units for Knowledge-based Visual Question Answering

Aug 15, 2024
Via arXiv

Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning

Jul 23, 2024
Via arXiv