Yilun Zhao

Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science

Feb 07, 2024
Xiangru Tang, Qiao Jin, Kunlun Zhu, Tongxin Yuan, Yichi Zhang, Wangchunshu Zhou, Meng Qu, Yilun Zhao, Jian Tang, Zhuosheng Zhang, Arman Cohan, Zhiyong Lu, Mark Gerstein

Via arXiv

Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models

Feb 05, 2024
Zhiyuan Hu, Chumin Liu, Xidong Feng, Yilun Zhao, See-Kiong Ng, Anh Tuan Luu, Junxian He, Pang Wei Koh, Bryan Hooi

Via arXiv

ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks

Nov 16, 2023
Yuliang Liu, Xiangru Tang, Zefan Cai, Junjie Lu, Yichi Zhang, Yanjun Shao, Zexuan Deng, Helan Hu, Zengxian Yang, Kaikai An, Ruijun Huang, Shuzheng Si, Sheng Chen, Haozhe Zhao, Zhengliang Li, Liang Chen, Yiming Zong, Yan Wang, Tianyu Liu, Zhiwei Jiang, Baobao Chang, Yujia Qin, Wangchunshu Zhou, Yilun Zhao, Arman Cohan, Mark Gerstein

Via arXiv

MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning

Nov 16, 2023
Xiangru Tang, Anni Zou, Zhuosheng Zhang, Yilun Zhao, Xingyao Zhang, Arman Cohan, Mark Gerstein

Via arXiv

DocMath-Eval: Evaluating Numerical Reasoning Capabilities of LLMs in Understanding Long Documents with Tabular Data

Nov 16, 2023
Yilun Zhao, Yitao Long, Hongjun Liu, Linyong Nan, Lyuhao Chen, Ryo Kamoi, Yixin Liu, Xiangru Tang, Rui Zhang, Arman Cohan

Via arXiv

KnowledgeMath: Knowledge-Intensive Math Word Problem Solving in Finance Domains

Nov 16, 2023
Yilun Zhao, Hongjun Liu, Yitao Long, Rui Zhang, Chen Zhao, Arman Cohan

Via arXiv

Investigating Data Contamination in Modern Benchmarks for Large Language Models

Nov 16, 2023
Chunyuan Deng, Yilun Zhao, Xiangru Tang, Mark Gerstein, Arman Cohan

Via arXiv

On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering

Nov 16, 2023
Linyong Nan, Ellen Zhang, Weijin Zou, Yilun Zhao, Wenfei Zhou, Arman Cohan

Via arXiv

Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization

Nov 15, 2023
Yixin Liu, Alexander R. Fabbri, Jiawen Chen, Yilun Zhao, Simeng Han, Shafiq Joty, Pengfei Liu, Dragomir Radev, Chien-Sheng Wu, Arman Cohan

Via arXiv

L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models

Oct 02, 2023
Ansong Ni, Pengcheng Yin, Yilun Zhao, Martin Riddell, Troy Feng, Rui Shen, Stephen Yin, Ye Liu, Semih Yavuz, Caiming Xiong, Shafiq Joty, Yingbo Zhou, Dragomir Radev, Arman Cohan

Via arXiv