Xiaoxuan Zhu

Agent Group Chat: An Interactive Group Chat Simulacra For Better Eliciting Collective Emergent Behavior

Mar 20, 2024
Via arXiv

Beyond the Obvious: Evaluating the Reasoning Ability In Real-life Scenarios of Language Models on Life Scapes Reasoning Benchmark (LSR-Benchmark)

Jul 11, 2023
Via arXiv

Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation

Jun 15, 2023
Via arXiv

Domain Mastery Benchmark: An Ever-Updating Benchmark for Evaluating Holistic Domain Knowledge of Large Language Model – A Preliminary Release

Apr 23, 2023
Via arXiv