Haonan Li

Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

Jun 28, 2024

Lessons from the Trenches on Reproducible Evaluation of Language Models

May 23, 2024

3D Hand Mesh Recovery from Monocular RGB in Camera Space

May 12, 2024

Against The Achilles' Heel: A Survey on Red Teaming for Generative Models

Mar 31, 2024

EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models

Mar 15, 2024

Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification

Mar 07, 2024

ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic

Feb 20, 2024

A Chinese Dataset for Evaluating the Safeguards in Large Language Models

Feb 19, 2024

Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents

Feb 18, 2024

Location Aware Modular Biencoder for Tourism Question Answering

Jan 04, 2024