Huan Sun

A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents

Feb 15, 2024
Via arXiv

eCeLLM: Generalizing Large Language Models for E-commerce from Large-scale, High-quality Instruction Data

Feb 13, 2024
Via arXiv

GPT-4V is a Generalist Web Agent, if Grounded

Jan 03, 2024
Via arXiv

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Nov 27, 2023
Via arXiv

How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities

Nov 15, 2023
Via arXiv

TableLlama: Towards Open Large Generalist Models for Tables

Nov 15, 2023
Via arXiv

MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning

Oct 03, 2023
Via arXiv

AgentBench: Evaluating LLMs as Agents

Aug 07, 2023
Via arXiv

Roll Up Your Sleeves: Working with a Collaborative and Engaging Task-Oriented Dialogue System

Jul 29, 2023
Via arXiv

Biomedical Language Models are Robust to Sub-optimal Tokenization

Jul 10, 2023
Via arXiv