Arman Cohan

Investigating Generalization of One-shot LLM Steering Vectors

Feb 26, 2025

ATEB: Evaluating and Improving Advanced NLP Tasks for Text Embedding Models

Feb 24, 2025

TESS 2: A Large-Scale Generalist Diffusion Language Model

Feb 19, 2025

mFollowIR: a Multilingual Benchmark for Instruction Following in Retrieval

Jan 31, 2025

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Jan 21, 2025

ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning

Jan 11, 2025

Re-evaluating Automatic LLM System Ranking for Alignment with Human Preference

Dec 31, 2024

HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation

Dec 30, 2024

ChemSafetyBench: Benchmarking LLM Safety on Chemistry Domain

Nov 23, 2024

SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers

Nov 08, 2024