Tongshuang Wu

Evaluating Mathematical Reasoning Beyond Accuracy

Apr 08, 2024
Shijie Xia, Xuefeng Li, Yixin Liu, Tongshuang Wu, Pengfei Liu

Via arXiv

Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models

Feb 27, 2024
Xinran Zhao, Hongming Zhang, Xiaoman Pan, Wenlin Yao, Dong Yu, Tongshuang Wu, Jianshu Chen

Via arXiv

Wikibench: Community-Driven Data Curation for AI Evaluation on Wikipedia

Feb 21, 2024
Tzu-Sheng Kuo, Aaron Halfaker, Zirui Cheng, Jiwoo Kim, Meng-Hsin Wu, Tongshuang Wu, Kenneth Holstein, Haiyi Zhu

Via arXiv

Measuring Adversarial Datasets

Nov 06, 2023
Yuanchen Bai, Raoyi Huang, Vijay Viswanathan, Tzu-Sheng Kuo, Tongshuang Wu

Via arXiv

The Data Provenance Initiative: A Large Scale Audit of Dataset Licensing & Attribution in AI

Nov 04, 2023
Shayne Longpre, Robert Mahari, Anthony Chen, Naana Obeng-Marnu, Damien Sileo, William Brannon, Niklas Muennighoff, Nathan Khazam, Jad Kabbara, Kartik Perisetla, Xinyi Wu, Enrico Shippole, Kurt Bollacker, Tongshuang Wu, Luis Villa, Sandy Pentland, Sara Hooker

Via arXiv

Beyond Testers' Biases: Guiding Model Testing with Knowledge Bases using LLMs

Oct 14, 2023
Chenyang Yang, Rishabh Rustogi, Rachel Brower-Sinning, Grace A. Lewis, Christian Kästner, Tongshuang Wu

Via arXiv

From Nuisance to News Sense: Augmenting the News with Cross-Document Evidence and Context

Oct 06, 2023
Jeremiah Milbauer, Ziqi Ding, Zhijin Wu, Tongshuang Wu

Via arXiv

Selenite: Scaffolding Decision Making with Comprehensive Overviews Elicited from Large Language Models

Oct 03, 2023
Michael Xieyang Liu, Tongshuang Wu, Tianying Chen, Franklin Mingzhe Li, Aniket Kittur, Brad A. Myers

Via arXiv