Kun Yao

AGPO: Asymmetric Group Policy Optimization for Verifiable Reasoning and Search Ads Relevance at JD

May 07, 2026
Via arXiv

Diagnosing Retrieval vs. Utilization Bottlenecks in LLM Agent Memory

Mar 02, 2026
Via arXiv

Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models

Jan 03, 2025
Via arXiv

Add-SD: Rational Generation without Manual Reference

Jul 30, 2024
Via arXiv

OVLW-DETR: Open-Vocabulary Light-Weighted Detection Transformer

Jul 15, 2024
Via arXiv

Skim then Focus: Integrating Contextual and Fine-grained Views for Repetitive Action Counting

Jun 13, 2024
Via arXiv

LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection

Jun 05, 2024
Via arXiv

StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond

Jun 04, 2024
Via arXiv

Towards Unified Multi-granularity Text Detection with Interactive Attention

May 30, 2024
Via arXiv

FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition

Feb 05, 2024
Via arXiv