Saurav Kadavath

Measuring Mathematical Problem Solving With the MATH Dataset

Mar 05, 2021

Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt

Abstract: Many intellectual endeavors require mathematical problem solving, but this skill remains beyond the capabilities of computers. To measure this ability in machine learning models, we introduce MATH, a new dataset of 12,500 challenging competition mathematics problems. Each problem in MATH has a full step-by-step solution which can be used to teach models to generate answer derivations and explanations. To facilitate future research and increase accuracy on MATH, we also contribute a large auxiliary pretraining dataset which helps teach models the fundamentals of mathematics. Even though we are able to increase accuracy on MATH, our results show that accuracy remains relatively low, even with enormous Transformer models. Moreover, we find that simply increasing budgets and model parameter counts will be impractical for achieving strong mathematical reasoning if scaling trends continue. While scaling Transformers is automatically solving most other text-based tasks, scaling is not currently solving MATH. To have more traction on mathematical problem solving, we will likely need new algorithmic advancements from the broader research community.

* Code and the MATH dataset are available at https://github.com/hendrycks/math/

The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization

Jun 29, 2020

Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo (+3 more)

Abstract: We introduce three new robustness benchmarks consisting of naturally occurring distribution changes in image style, geographic location, camera operation, and more. Using our benchmarks, we take stock of previously proposed hypotheses for out-of-distribution robustness and put them to the test. We find that using larger models and synthetic data augmentation can improve robustness on real-world distribution shifts, contrary to claims in prior work. Motivated by this, we introduce a new data augmentation method which advances the state-of-the-art and outperforms models pretrained with 1000x more labeled data. We find that some methods consistently help with distribution shifts in texture and local image statistics, but these methods do not help with some other distribution shifts like geographic changes. We conclude that future research must study multiple distribution shifts simultaneously.

* Datasets, code, and models are available at https://github.com/hendrycks/imagenet-r

Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty

Jun 28, 2019

Dan Hendrycks, Mantas Mazeika, Saurav Kadavath, Dawn Song

Abstract: Self-supervision provides effective representations for downstream tasks without requiring labels. However, existing approaches lag behind fully supervised training and are often not thought beneficial beyond obviating the need for annotations. We find that self-supervision can benefit robustness in a variety of ways, including robustness to adversarial examples, label corruption, and common input corruptions. Additionally, self-supervision greatly benefits out-of-distribution detection on difficult, near-distribution outliers, so much so that it exceeds the performance of fully supervised methods. These results demonstrate the promise of self-supervision for improving robustness and uncertainty estimation and establish these tasks as new axes of evaluation for future self-supervised learning research.

* Code and dataset are available at https://github.com/hendrycks/ss-ood
