Yongzhi Li

AutoCut: End-to-end advertisement video editing based on multimodal discretization and controllable generation

Mar 30, 2026

Adaptive Video Distillation: Mitigating Oversaturation and Temporal Collapse in Few-Step Generation

Mar 23, 2026

MuSteerNet: Human Reaction Generation from Videos via Observation-Reaction Mutual Steering

Mar 20, 2026

Generative Recommendation for Large-Scale Advertising

Feb 26, 2026

IMAGINE: Integrating Multi-Agent System into One Model for Complex Reasoning and Planning

Oct 16, 2025

Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce

Apr 06, 2023

Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding

Sep 27, 2022

FORCE: A Framework of Rule-Based Conversational Recommender System

Mar 18, 2022

Integrating Pre-trained Model into Rule-based Dialogue Management

Feb 17, 2021