Keze Wang

PTTA: A Pure Text-to-Animation Framework for High-Quality Creation

Dec 21, 2025

Adaptive-VoCo: Complexity-Aware Visual Token Compression for Vision-Language Models

Dec 20, 2025

STORM: Search-Guided Generative World Models for Robotic Manipulation

Dec 20, 2025

Large Language Models as Discounted Bayesian Filters

Dec 20, 2025

GTMA: Dynamic Representation Optimization for OOD Vision-Language Models

Dec 20, 2025

Massive Editing for Large Language Models Based on Dynamic Weight Generation

Dec 17, 2025

Enhancing Visual Programming for Visual Reasoning via Probabilistic Graphs

Dec 16, 2025

HybridToken-VLM: Hybrid Token Compression for Vision-Language Models

Dec 09, 2025

MM-CoT: A Benchmark for Probing Visual Chain-of-Thought Reasoning in Multimodal Models

Dec 09, 2025

Cost-Effective Communication: An Auction-based Method for Language Agent Interaction

Nov 17, 2025