Zhihan Zhang

Stabilizer bootstrapping: A recipe for efficient agnostic tomography and magic estimation

Aug 13, 2024

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Jun 26, 2024

Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning

Jun 17, 2024

Hallucination Mitigation Prompts Long-term Video Understanding

Jun 17, 2024

Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding

Jun 04, 2024

MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions

May 29, 2024

Large Language Models Can Self-Correct with Minimal Effort

May 23, 2024

Quantum-Classical Separations in Shallow-Circuit-Based Learning with and without Noises

May 01, 2024

Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training

Apr 26, 2024

LabelAId: Just-in-time AI Interventions for Improving Human Labeling Quality and Domain Knowledge in Crowdsourcing Systems

Mar 14, 2024