Peng Hou

SEA-Vision: A Multilingual Benchmark for Comprehensive Document and Scene Text Understanding in Southeast Asia

Mar 16, 2026
Via arXiv

CCCaption: Dual-Reward Reinforcement Learning for Complete and Correct Image Captioning

Feb 25, 2026
Via arXiv

Towards On-Policy SFT: Distribution Discriminant Theory and its Applications in LLM Training

Feb 12, 2026
Via arXiv

Annotation-Free Curb Detection Leveraging Altitude Difference Image

Sep 30, 2024
Via arXiv

Transformer-empowered Multi-modal Item Embedding for Enhanced Image Search in E-Commerce

Nov 29, 2023
Via arXiv

Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers

Nov 26, 2019
Via arXiv