Tongshuang Wu

SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning

Jul 16, 2024

Synthetic Multimodal Question Generation

Jul 02, 2024

WebCanvas: Benchmarking Web Agents in Online Environments

Jun 18, 2024

Beyond Relevance: Evaluate and Improve Retrievers on Perspective Awareness

May 04, 2024

Better Synthetic Data by Retrieving and Transforming Existing Datasets

Apr 26, 2024

Evaluating Mathematical Reasoning Beyond Accuracy

Apr 08, 2024

Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models

Feb 27, 2024

Wikibench: Community-Driven Data Curation for AI Evaluation on Wikipedia

Feb 21, 2024

Measuring Adversarial Datasets

Nov 06, 2023

The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI

Nov 04, 2023