Jing Shi

More Than the Final Answer: Improving Visual Extraction and Logical Consistency in Vision-Language Models

Dec 13, 2025
Via arXiv

Relational Visual Similarity

Dec 08, 2025
Via arXiv

Plot'n Polish: Zero-shot Story Visualization and Disentangled Editing with Text-to-Image Diffusion Models

Sep 04, 2025
Via arXiv

Trust-MARL: Trust-Based Multi-Agent Reinforcement Learning Framework for Cooperative On-Ramp Merging Control in Heterogeneous Traffic Flow

Jun 14, 2025
Via arXiv

Give Me FP32 or Give Me Death? Challenges and Solutions for Reproducible Reasoning

Jun 11, 2025
Via arXiv

YoChameleon: Personalized Vision and Language Generation

Apr 29, 2025
Via arXiv

Accelerating Multi-Objective Collaborative Optimization of Doped Thermoelectric Materials via Artificial Intelligence

Apr 11, 2025
Via arXiv

Visual Persona: Foundation Model for Full-Body Human Customization

Mar 19, 2025
Via arXiv

MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling Capabilities

Jan 15, 2025
Via arXiv

Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage

Dec 24, 2024
Via arXiv