Qiuna Tan

We-Math 2.0: A Versatile MathBook System for Incentivizing Visual Mathematical Reasoning

Aug 14, 2025
Via arXiv

Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models

Dec 17, 2024
Via arXiv

We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning?

Jul 01, 2024
Via arXiv

CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery

Jun 12, 2024
Via arXiv