Yilun Zhao

Evaluating LLMs at Detecting Errors in LLM Responses

Apr 04, 2024
Ryo Kamoi, Sarkar Snigdha Sarathi Das, Renze Lou, Jihyun Janice Ahn, Yilun Zhao, Xiaoxin Lu, Nan Zhang, Yusen Zhang, Ranran Haoran Zhang, Sujeeth Reddy Vummanthala, Salika Dave, Shaobo Qin, Arman Cohan, Wenpeng Yin, Rui Zhang

Via arXiv

MIMIR: A Streamlined Platform for Personalized Agent Tuning in Domain Expertise

Apr 03, 2024
Chunyuan Deng, Xiangru Tang, Yilun Zhao, Hanming Wang, Haoran Wang, Wangchunshu Zhou, Arman Cohan, Mark Gerstein

Via arXiv

Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science

Feb 07, 2024
Xiangru Tang, Qiao Jin, Kunlun Zhu, Tongxin Yuan, Yichi Zhang, Wangchunshu Zhou, Meng Qu, Yilun Zhao, Jian Tang, Zhuosheng Zhang, Arman Cohan, Zhiyong Lu, Mark Gerstein

Via arXiv

Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models

Feb 05, 2024
Zhiyuan Hu, Chumin Liu, Xidong Feng, Yilun Zhao, See-Kiong Ng, Anh Tuan Luu, Junxian He, Pang Wei Koh, Bryan Hooi

Via arXiv

ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks

Nov 16, 2023
Yuliang Liu, Xiangru Tang, Zefan Cai, Junjie Lu, Yichi Zhang, Yanjun Shao, Zexuan Deng, Helan Hu, Zengxian Yang, Kaikai An, Ruijun Huang, Shuzheng Si, Sheng Chen, Haozhe Zhao, Zhengliang Li, Liang Chen, Yiming Zong, Yan Wang, Tianyu Liu, Zhiwei Jiang, Baobao Chang, Yujia Qin, Wangchunshu Zhou, Yilun Zhao, Arman Cohan, Mark Gerstein

Via arXiv

MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning

Nov 16, 2023
Xiangru Tang, Anni Zou, Zhuosheng Zhang, Yilun Zhao, Xingyao Zhang, Arman Cohan, Mark Gerstein

Via arXiv

DocMath-Eval: Evaluating Numerical Reasoning Capabilities of LLMs in Understanding Long Documents with Tabular Data

Nov 16, 2023
Yilun Zhao, Yitao Long, Hongjun Liu, Linyong Nan, Lyuhao Chen, Ryo Kamoi, Yixin Liu, Xiangru Tang, Rui Zhang, Arman Cohan

Via arXiv

KnowledgeMath: Knowledge-Intensive Math Word Problem Solving in Finance Domains

Nov 16, 2023
Yilun Zhao, Hongjun Liu, Yitao Long, Rui Zhang, Chen Zhao, Arman Cohan

Via arXiv

Investigating Data Contamination in Modern Benchmarks for Large Language Models

Nov 16, 2023
Chunyuan Deng, Yilun Zhao, Xiangru Tang, Mark Gerstein, Arman Cohan

Via arXiv

On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering

Nov 16, 2023
Linyong Nan, Ellen Zhang, Weijin Zou, Yilun Zhao, Wenfei Zhou, Arman Cohan

Via arXiv