Hongming Zhang

Shammie

LEOPARD: A Vision Language Model For Text-Rich Multi-Image Tasks

Oct 02, 2024

Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots

Sep 16, 2024

DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?

Sep 12, 2024

GeoHard: Towards Measuring Class-wise Hardness through Modelling Class Semantics

Jul 17, 2024

MixGR: Enhancing Retriever Generalization for Scientific Domain through Complementary Granularity

Jul 15, 2024

DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems

Jul 15, 2024

Abstraction-of-Thought Makes Language Models Better Reasoners

Jun 18, 2024

Beyond Relevance: Evaluate and Improve Retrievers on Perspective Awareness

May 04, 2024

NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding

Apr 21, 2024

Conceptual and Unbiased Reasoning in Language Models

Mar 30, 2024