Jin Gao

SoLA-Vision: Fine-grained Layer-wise Linear Softmax Hybrid Attention

Jan 16, 2026
Via arXiv

Integrating Diverse Assignment Strategies into DETRs

Jan 14, 2026
Via arXiv

From Idea to Co-Creation: A Planner-Actor-Critic Framework for Agent Augmented 3D Modeling

Jan 08, 2026
Via arXiv

Step-GUI Technical Report

Dec 19, 2025
Via arXiv

Online Segment Any 3D Thing as Instance Tracking

Dec 08, 2025
Via arXiv

PFSD: A Multi-Modal Pedestrian-Focus Scene Dataset for Rich Tasks in Semi-Structured Environments

Feb 24, 2025
Via arXiv

MedForge: Building Medical Foundation Models Like Open Source Software Development

Feb 22, 2025
Via arXiv

HSTrack: Bootstrap End-to-End Multi-Camera 3D Multi-object Tracking with Hybrid Supervision

Nov 11, 2024
Via arXiv

VQ-Map: Bird's-Eye-View Map Layout Estimation in Tokenized Discrete Space via Vector Quantization

Nov 03, 2024
Via arXiv

Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions

Aug 05, 2024
Via arXiv