Jimin Huang

Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments

Mar 24, 2026
Via arXiv

Conv-FinRe: A Conversational and Longitudinal Benchmark for Utility-Grounded Financial Recommendation

Feb 19, 2026
Via arXiv

The CLEF-2026 FinMMEval Lab: Multilingual and Multimodal Evaluation of Financial AI Systems

Feb 11, 2026
Via arXiv

Ebisu: Benchmarking Large Language Models in Japanese Finance

Feb 01, 2026
Via arXiv

All That Glisters Is Not Gold: A Benchmark for Reference-Free Counterfactual Financial Misinformation Detection

Jan 08, 2026
Via arXiv

The Illusion of Specialization: Unveiling the Domain-Invariant "Standing Committee" in Mixture-of-Experts Models

Jan 06, 2026
Via arXiv

FinCriticalED: A Visual Benchmark for Financial Fact-Level OCR Evaluation

Nov 19, 2025
Via arXiv

Memorization in Large Language Models in Medicine: Prevalence, Characteristics, and Implications

Sep 10, 2025
Via arXiv

A Retrieval-Augmented Multi-Agent Framework for Psychiatry Diagnosis

Jun 04, 2025
Via arXiv

MMAFFBen: A Multilingual and Multimodal Affective Analysis Benchmark for Evaluating LLMs and VLMs

May 30, 2025
Via arXiv