Yunzhong He

Audio MultiChallenge: A Multi-Turn Evaluation of Spoken Dialogue Systems on Natural Human Interaction

Dec 16, 2025

PRBench: Large-Scale Expert Rubrics for Evaluating High-Stakes Professional Reasoning

Nov 14, 2025

Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning

Oct 14, 2025

Online Rubrics Elicitation from Pairwise Comparisons

Oct 08, 2025

Auto-GPT for Online Decision Making: Benchmarks and Additional Opinions

Jun 04, 2023

HierCat: Hierarchical Query Categorization from Weakly Supervised Data at Facebook Marketplace

Feb 22, 2023

Que2Engage: Embedding-based Retrieval for Relevant and Engaging Products at Facebook Marketplace

Feb 21, 2023

A Social Search Model for Large Scale Social Networks

May 09, 2020