Xiaojun Wan

Themis: Towards Flexible and Interpretable NLG Evaluation

Jun 26, 2024

PaCoST: Paired Confidence Significance Testing for Benchmark Contamination Detection in Large Language Models

Jun 26, 2024

MC-MKE: A Fine-Grained Multimodal Knowledge Editing Benchmark Emphasizing Modality Consistency

Jun 19, 2024

ContraSolver: Self-Alignment of Language Models by Resolving Internal Preference Contradictions

Jun 13, 2024

Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling

Jun 12, 2024

Defining and Detecting Vulnerability in Human Evaluation Guidelines: A Preliminary Study Towards Reliable NLG Evaluation

Jun 12, 2024

WaterPool: A Watermark Mitigating Trade-offs among Imperceptibility, Efficacy and Robustness

May 22, 2024

Automated Similarity Metric Generation for Recommendation

Apr 18, 2024

WikiTableEdit: A Benchmark for Table Editing by Natural Language Instruction

Mar 05, 2024

Evaluating and Mitigating Number Hallucinations in Large Vision-Language Models: A Consistency Perspective

Mar 03, 2024