Yu Su

Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models

Feb 10, 2025

Finer-CAM: Spotting the Difference Reveals Finer Details for Visual Explanation

Jan 20, 2025

Prompt-CAM: A Simpler Interpretable Transformer for Fine-Grained Analysis

Jan 16, 2025

Static Segmentation by Tracking: A Frustratingly Label-Efficient Approach to Fine-Grained Segmentation

Jan 12, 2025

RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics

Nov 25, 2024

Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents

Nov 10, 2024

Integrated Location Sensing and Communication for Ultra-Massive MIMO With Hybrid-Field Beam-Squint Effect

Nov 08, 2024

Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

Oct 07, 2024

ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery

Oct 07, 2024

Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers

Oct 03, 2024