Mahmoud Safari

Neural Architecture Search: Insights from 1000 Papers

Jan 25, 2023
Colin White, Mahmoud Safari, Rhea Sukthanker, Binxin Ru, Thomas Elsken, Arber Zela, Debadeepta Dey, Frank Hutter

In the past decade, advances in deep learning have resulted in breakthroughs in a variety of areas, including computer vision, natural language understanding, speech recognition, and reinforcement learning. Specialized, high-performing neural architectures are crucial to the success of deep learning in these areas. Neural architecture search (NAS), the process of automating the design of neural architectures for a given task, is an inevitable next step in automating machine learning and has already outpaced the best human-designed architectures on many tasks. In the past few years, research in NAS has been progressing rapidly, with over 1000 papers released since 2020 (Deng and Lindauer, 2021). In this survey, we provide an organized and comprehensive guide to neural architecture search. We give a taxonomy of search spaces, algorithms, and speedup techniques, and we discuss resources such as benchmarks, best practices, other surveys, and open-source libraries.
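
To make the taxonomy concrete, here is a minimal, self-contained sketch of a NAS loop: a toy cell-style search space, random search as the search algorithm, and a cheap noisy proxy evaluation standing in for the speedup techniques the survey covers. The search space, operation scores, and function names are illustrative assumptions, not code or results from the survey.

```python
# Minimal NAS sketch: search space + search algorithm + cheap evaluation.
import random

# Toy cell-style search space: one operation per edge of a fixed 4-edge cell.
OPS = ["conv3x3", "conv5x5", "maxpool", "skip"]
NUM_EDGES = 4

# Illustrative per-operation scores so the example runs without any training.
OP_SCORE = {"conv3x3": 0.8, "conv5x5": 0.7, "skip": 0.6, "maxpool": 0.5}

def sample_architecture(rng):
    """Sample an architecture: a tuple with one operation per edge."""
    return tuple(rng.choice(OPS) for _ in range(NUM_EDGES))

def proxy_evaluate(arch, rng):
    """Stand-in for 'train briefly and measure validation accuracy' --
    the expensive step that real speedup techniques (low fidelities,
    performance predictors, weight sharing) try to avoid."""
    return sum(OP_SCORE[op] for op in arch) / NUM_EDGES + rng.gauss(0, 0.02)

def random_search(n_samples=50, seed=0):
    """The simplest NAS algorithm: sample, evaluate, keep the best."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(n_samples):
        arch = sample_architecture(rng)
        score = proxy_evaluate(arch, rng)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score

if __name__ == "__main__":
    arch, score = random_search()
    print("best architecture:", arch, "proxy score: %.3f" % score)
```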

NAS-Bench-Suite-Zero: Accelerating Research on Zero Cost Proxies

Oct 06, 2022
Arjun Krishnakumar, Colin White, Arber Zela, Renbo Tu, Mahmoud Safari, Frank Hutter

Zero-cost proxies (ZC proxies) are a recent architecture performance prediction technique aiming to significantly speed up algorithms for neural architecture search (NAS). Recent work has shown that these techniques hold great promise, but certain aspects, such as evaluating and exploiting their complementary strengths, are under-studied. In this work, we create NAS-Bench-Suite-Zero: we evaluate 13 ZC proxies across 28 tasks, creating by far the largest dataset (and unified codebase) for ZC proxies, enabling orders-of-magnitude faster experiments on ZC proxies, while avoiding confounding factors stemming from different implementations. To demonstrate the usefulness of NAS-Bench-Suite-Zero, we run a large-scale analysis of ZC proxies, including a bias analysis, and the first information-theoretic analysis which concludes that ZC proxies capture substantial complementary information. Motivated by these findings, we present a procedure to improve the performance of ZC proxies by reducing biases such as cell size, and we also show that incorporating all 13 ZC proxies into the surrogate models used by NAS algorithms can improve their predictive performance by up to 42%. Our code and datasets are available at https://github.com/automl/naslib/tree/zerocost.

* NeurIPS Datasets and Benchmarks Track 2022 
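
As a concrete illustration (not the NAS-Bench-Suite-Zero code itself), the sketch below implements two widely used zero-cost proxy ideas, parameter count and gradient norm at initialization, and stacks their scores into a feature vector that a surrogate model could consume. The exact proxy formulations and the surrogate used in the paper may differ; the toy model and data here are only to make the sketch runnable.

```python
# Two simple zero-cost proxies, evaluated at initialization on a single batch.
import torch
import torch.nn as nn

def param_count_proxy(model: nn.Module) -> float:
    """Number of trainable parameters: a trivial but often strong baseline proxy."""
    return float(sum(p.numel() for p in model.parameters() if p.requires_grad))

def grad_norm_proxy(model: nn.Module, batch: torch.Tensor, targets: torch.Tensor) -> float:
    """Sum of parameter gradient norms after one forward/backward pass at initialization."""
    model.zero_grad()
    loss = nn.functional.cross_entropy(model(batch), targets)
    loss.backward()
    return float(sum(p.grad.norm().item() for p in model.parameters() if p.grad is not None))

def zc_features(model, batch, targets):
    """Stack several ZC scores into one feature vector for a surrogate predictor."""
    return [param_count_proxy(model), grad_norm_proxy(model, batch, targets)]

if __name__ == "__main__":
    # Toy "architecture" and data, standing in for a candidate sampled by a NAS algorithm.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 32), nn.ReLU(), nn.Linear(32, 10))
    x, y = torch.randn(4, 3, 8, 8), torch.randint(0, 10, (4,))
    print(zc_features(model, x, y))
```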

NAS-Bench-Suite: NAS Evaluation is (Now) Surprisingly Easy

Feb 11, 2022
Yash Mehta, Colin White, Arber Zela, Arjun Krishnakumar, Guri Zabergja, Shakiba Moradian, Mahmoud Safari, Kaicheng Yu, Frank Hutter

The release of tabular benchmarks, such as NAS-Bench-101 and NAS-Bench-201, has significantly lowered the computational overhead for conducting scientific research in neural architecture search (NAS). Although they have been widely adopted and used to tune real-world NAS algorithms, these benchmarks are limited to small search spaces and focus solely on image classification. Recently, several new NAS benchmarks have been introduced that cover significantly larger search spaces over a wide range of tasks, including object detection, speech recognition, and natural language processing. However, substantial differences among these NAS benchmarks have so far prevented their widespread adoption, limiting researchers to using just a few benchmarks. In this work, we present an in-depth analysis of popular NAS algorithms and performance prediction methods across 25 different combinations of search spaces and datasets, finding that many conclusions drawn from a few NAS benchmarks do not generalize to other benchmarks. To help remedy this problem, we introduce NAS-Bench-Suite, a comprehensive and extensible collection of NAS benchmarks, accessible through a unified interface, created with the aim to facilitate reproducible, generalizable, and rapid NAS research. Our code is available at https://github.com/automl/naslib.

* ICLR 2022 
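
The sketch below is a hypothetical illustration of the unified-interface idea: every tabular benchmark exposes the same sample/query calls, so the same search code runs unchanged across benchmarks. The class and method names are assumptions made for illustration, not NASLib's actual API (see https://github.com/automl/naslib for that).

```python
# Hypothetical unified interface over two toy tabular NAS benchmarks.
from abc import ABC, abstractmethod

class TabularBenchmark(ABC):
    @abstractmethod
    def sample_architecture(self) -> str:
        """Return an architecture identifier from this benchmark's search space."""

    @abstractmethod
    def query(self, arch: str) -> float:
        """Return the pre-computed validation accuracy of an architecture."""

class ToyImageBench(TabularBenchmark):
    _table = {"archA": 0.91, "archB": 0.88}
    def sample_architecture(self): return "archA"
    def query(self, arch): return self._table[arch]

class ToySpeechBench(TabularBenchmark):
    _table = {"archA": 0.74, "archB": 0.79}
    def sample_architecture(self): return "archB"
    def query(self, arch): return self._table[arch]

def evaluate_everywhere(benchmarks):
    """The same search code runs on every benchmark because the interface matches."""
    return {type(b).__name__: b.query(b.sample_architecture()) for b in benchmarks}

if __name__ == "__main__":
    print(evaluate_everywhere([ToyImageBench(), ToySpeechBench()]))
```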

SuperCoder: Program Learning Under Noisy Conditions From Superposition of States

Dec 07, 2020
Ali Davody, Mahmoud Safari, Răzvan V. Florian

We propose a new method of program learning in a Domain Specific Language (DSL) which is based on gradient descent with no direct search. The first component of our method is a probabilistic representation of the DSL variables. At each timestep in the program sequence, different DSL functions are applied on the DSL variables with a certain probability, leading to different possible outcomes. The number of these outcomes grows exponentially with each timestep; rather than handling them separately, we collect them into a superposition of variables which captures the information in a single, but fuzzy, state. At the final timestep, this state is compared with the ground-truth output through a loss function. The second component of our method is an attention-based recurrent neural network, which provides an appropriate initialization point for the gradient descent that optimizes the probabilistic representation. The method we have developed surpasses the state-of-the-art for synthesising long programs and is able to learn programs under noise.

* 11 pages, 6 figures 
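
Below is a minimal sketch of the superposition idea, under simplifying assumptions of its own: a DSL variable is kept as a probability distribution over a small value domain, each timestep mixes the outcomes of all DSL functions weighted by learned probabilities, and the final fuzzy state is trained against the ground-truth output by gradient descent. The tiny integer DSL is illustrative rather than the paper's DSL, and the attention-based RNN that initializes the probabilities is omitted.

```python
# Superposition-of-states sketch on a toy integer DSL, optimized end to end.
import torch
import torch.nn.functional as F

DOMAIN = 10          # DSL variables take values in {0, ..., 9}
T = 3                # program length (number of timesteps)

def apply_inc(state):  return torch.roll(state, shifts=1, dims=-1)   # x -> x + 1 (mod 10)
def apply_dec(state):  return torch.roll(state, shifts=-1, dims=-1)  # x -> x - 1 (mod 10)
def apply_noop(state): return state
DSL_FUNCS = [apply_inc, apply_dec, apply_noop]

def run_program(logits, start_value):
    """Run the fuzzy program: at each timestep, superpose all function outcomes."""
    state = F.one_hot(torch.tensor(start_value), DOMAIN).float()
    for t in range(T):
        probs = F.softmax(logits[t], dim=-1)              # probability of each DSL function
        state = sum(p * f(state) for p, f in zip(probs, DSL_FUNCS))
    return state                                          # one fuzzy state, not 3**T branches

# Learn a program mapping 2 -> 5 (i.e. three increments) by gradient descent.
logits = torch.zeros(T, len(DSL_FUNCS), requires_grad=True)
opt = torch.optim.Adam([logits], lr=0.1)
target = torch.tensor(5)
for step in range(200):
    opt.zero_grad()
    final_state = run_program(logits, start_value=2)
    loss = F.nll_loss(torch.log(final_state + 1e-9).unsqueeze(0), target.unsqueeze(0))
    loss.backward()
    opt.step()

print("learned functions per step:",
      [DSL_FUNCS[int(i)].__name__ for i in logits.argmax(dim=-1)])
```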