Pengfei Yan

University of Maryland

XRZero-G0: Pushing the Frontier of Dexterous Robotic Manipulation with Interfaces, Quality and Ratios

Apr 14, 2026

InstructTable: Improving Table Structure Recognition Through Instructions

Apr 03, 2026

PositionOCR: Augmenting Positional Awareness in Multi-Modal Models via Hybrid Specialist Integration

Feb 22, 2026

Empirical Comparison of Encoder-Based Language Models and Feature-Based Supervised Machine Learning Approaches to Automated Scoring of Long Essays

Jan 07, 2026

UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models

Dec 12, 2025

MotionCharacter: Identity-Preserving and Motion Controllable Human Video Generation

Nov 30, 2024

Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis

Sep 10, 2024

AdaFedFR: Federated Face Recognition with Adaptive Inter-Class Representation Learning

May 22, 2024

Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues

Apr 12, 2024

CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation

Mar 01, 2024