Wentao Liu

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Aug 08, 2025

Cultivating Helpful, Personalized, and Creative AI Tutors: A Framework for Pedagogical Alignment using Reinforcement Learning

Jul 27, 2025

Harmonizing Visual Representations for Unified Multimodal Understanding and Generation

Mar 27, 2025

NADER: Neural Architecture Design via Multi-Agent Collaboration

Dec 26, 2024

ShotVL: Human-Centric Highlight Frame Retrieval via Language Queries

Dec 17, 2024

KptLLM: Unveiling the Power of Large Language Model for Keypoint Comprehension

Nov 04, 2024

PhoCoLens: Photorealistic and Consistent Reconstruction in Lensless Imaging

Sep 26, 2024

CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models

Sep 04, 2024

CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications

Aug 07, 2024

TCFormer: Visual Recognition via Token Clustering Transformer

Jul 16, 2024