Maxwell Horton

On the Efficacy of Multi-scale Data Samplers for Vision Applications

Sep 08, 2023
Elvis Nunez, Thomas Merth, Anish Prabhu, Mehrdad Farajtabar, Mohammad Rastegari, Sachin Mehta, Maxwell Horton

Multi-scale resolution training has seen increased adoption across multiple vision tasks, including classification and detection. Training with smaller resolutions enables faster training at the expense of a drop in accuracy. Conversely, training with larger resolutions has been shown to improve performance, but memory constraints often make this infeasible. In this paper, we empirically study the properties of multi-scale training procedures. We focus on variable batch size multi-scale data samplers that randomly sample an input resolution at each training iteration and dynamically adjust their batch size according to the resolution. Such samplers have been shown to improve model accuracy beyond standard training with a fixed batch size and resolution, though it is not clear why this is the case. We explore the properties of these data samplers by performing extensive experiments on ResNet-101 and validate our conclusions across multiple architectures, tasks, and datasets. We show that multi-scale samplers behave as implicit data regularizers and accelerate training speed. Compared to models trained with single-scale samplers, models trained with multi-scale samplers retain or improve accuracy while being better calibrated and more robust to scaling and data distribution shifts. We additionally extend a multi-scale variable batch sampler with a simple curriculum that progressively grows resolutions throughout training, allowing for a compute reduction of more than 30%. We show that the benefits of multi-scale training extend to detection and instance segmentation tasks, where we observe a 37% reduction in training FLOPs along with a 3-4% mAP increase on MS-COCO using a Mask R-CNN model.
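
A minimal sketch of such a sampler, in Python: each iteration draws a resolution at random and scales the batch size inversely with the resolution's area so that per-iteration memory stays roughly constant. The resolution set, base values, and inverse-area rule below are illustrative assumptions, not the paper's exact configuration.

```python
import random

def multiscale_batches(dataset_size, base_res=224, base_batch=128,
                       resolutions=(128, 160, 192, 224, 256, 288),
                       num_iters=1000, seed=0):
    # Hypothetical variable batch-size multi-scale sampler.
    rng = random.Random(seed)
    for _ in range(num_iters):
        res = rng.choice(resolutions)
        # Keep batch * res^2 roughly constant so memory use stays stable.
        batch = max(1, int(base_batch * (base_res / res) ** 2))
        indices = [rng.randrange(dataset_size) for _ in range(batch)]
        yield res, indices  # the loader resizes this batch's images to res x res

# Example: inspect the first few (resolution, batch size) pairs.
for i, (res, idx) in enumerate(multiscale_batches(50_000)):
    if i == 3:
        break
    print(res, len(idx))
```

The curriculum variant could be approximated by restricting `resolutions` to small values early in training and growing the set as training progresses.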

Bytes Are All You Need: Transformers Operating Directly On File Bytes

May 31, 2023
Maxwell Horton, Sachin Mehta, Ali Farhadi, Mohammad Rastegari

Modern deep learning approaches usually transform inputs into a modality-specific form. For example, the most common deep learning approach to image classification involves decoding image file bytes into an RGB tensor which is passed into a neural network. Instead, we investigate performing classification directly on file bytes, without the need to decode files at inference time. Using file bytes as model inputs enables the development of models that can operate on multiple input modalities. Our model, \emph{ByteFormer}, achieves an ImageNet Top-1 classification accuracy of $77.33\%$ when training and testing directly on TIFF file bytes using a transformer backbone with a configuration similar to DeiT-Ti ($72.2\%$ accuracy when operating on RGB images). Without modifications or hyperparameter tuning, ByteFormer achieves $95.42\%$ classification accuracy when operating on WAV files from the Speech Commands v2 dataset (compared to the state-of-the-art accuracy of $98.7\%$). Additionally, we demonstrate that ByteFormer has applications in privacy-preserving inference: it is capable of performing inference on particular obfuscated input representations with no loss of accuracy. We also demonstrate ByteFormer's ability to perform inference with a hypothetical privacy-preserving camera that avoids forming full images by consistently masking $90\%$ of pixel channels, while still achieving $71.35\%$ accuracy on ImageNet. Our code will be made available at https://github.com/apple/ml-cvnets/tree/main/examples/byteformer.
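
As a rough illustration of the core idea, the sketch below embeds raw file bytes (a vocabulary of 256 values) and feeds the token sequence to a standard transformer encoder for classification. The model size, sequence length, and mean-pooling head are illustrative assumptions and do not reproduce ByteFormer's DeiT-Ti configuration.

```python
import torch
import torch.nn as nn

class TinyByteClassifier(nn.Module):
    """Hypothetical byte-level classifier: raw bytes in, class logits out."""
    def __init__(self, num_classes=1000, dim=192, depth=4, heads=3, max_len=4096):
        super().__init__()
        self.embed = nn.Embedding(256, dim)            # one embedding per byte value
        self.pos = nn.Parameter(torch.zeros(1, max_len, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, byte_ids):                       # byte_ids: (B, L) ints in [0, 255]
        x = self.embed(byte_ids) + self.pos[:, :byte_ids.shape[1]]
        return self.head(self.encoder(x).mean(dim=1))  # mean-pool tokens, then classify

# Example: classify a batch of two 1 KiB "files".
logits = TinyByteClassifier(num_classes=10)(torch.randint(0, 256, (2, 1024)))
print(logits.shape)  # torch.Size([2, 10])
```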

RangeAugment: Efficient Online Augmentation with Range Learning

Dec 20, 2022
Sachin Mehta, Saeid Naderiparizi, Fartash Faghri, Maxwell Horton, Lailin Chen, Ali Farhadi, Oncel Tuzel, Mohammad Rastegari

State-of-the-art automatic augmentation methods (e.g., AutoAugment and RandAugment) for visual recognition tasks diversify training data using a large set of augmentation operations. The range of magnitudes of many augmentation operations (e.g., brightness and contrast) is continuous. Therefore, to make search computationally tractable, these methods use fixed and manually-defined magnitude ranges for each operation, which may lead to sub-optimal policies. To answer the open question of how important magnitude ranges are for each augmentation operation, we introduce RangeAugment, which allows us to efficiently learn the range of magnitudes for individual as well as composite augmentation operations. RangeAugment uses an auxiliary loss based on image similarity to control the range of magnitudes of augmentation operations. As a result, RangeAugment has a single scalar search parameter, image similarity, which we optimize via linear search. RangeAugment integrates seamlessly with any model and learns model- and task-specific augmentation policies. With extensive experiments on the ImageNet dataset across different networks, we show that RangeAugment achieves performance competitive with state-of-the-art automatic augmentation methods while using 4-5 times fewer augmentation operations. Experimental results on semantic segmentation, object detection, foundation models, and knowledge distillation further show RangeAugment's effectiveness.

* Technical report (22 pages including references and appendix) 
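
The sketch below illustrates the range-learning mechanism under heavy simplification: a single learnable magnitude range for a differentiable brightness operation, with an auxiliary loss that drives an MSE-based similarity between original and augmented images toward a target value. The similarity proxy, the target value, and the single operation are assumptions standing in for the paper's formulation.

```python
import torch

torch.manual_seed(0)
lo = torch.tensor(0.8, requires_grad=True)   # learnable range endpoints
hi = torch.tensor(1.2, requires_grad=True)   # (no lo <= hi constraint enforced here)
opt = torch.optim.SGD([lo, hi], lr=0.1)

def brightness(img, m):                      # simple differentiable augmentation op
    return (img * m).clamp(0, 1)

target_sim = 0.95                            # the single scalar search parameter
img = torch.rand(8, 3, 32, 32)

for _ in range(100):
    u = torch.rand(())                       # sample a magnitude in [lo, hi]
    m = lo + u * (hi - lo)
    aug = brightness(img, m)
    sim = 1 - (img - aug).pow(2).mean()      # MSE-based similarity proxy
    aux_loss = (sim - target_sim) ** 2       # pull similarity toward the target
    opt.zero_grad(); aux_loss.backward(); opt.step()

print(float(lo), float(hi))                  # the learned magnitude range
```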

SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic Networks

Jul 21, 2022
Chien-Yu Lin, Anish Prabhu, Thomas Merth, Sachin Mehta, Anurag Ranjan, Maxwell Horton, Mohammad Rastegari

Recent isotropic networks, such as ConvMixer and vision transformers, have found significant success across visual recognition tasks, matching or outperforming non-isotropic convolutional neural networks (CNNs). Isotropic architectures are particularly well-suited to cross-layer weight sharing, an effective neural network compression technique. In this paper, we perform an empirical evaluation of methods for sharing parameters in isotropic networks (SPIN). We present a framework to formalize major weight sharing design decisions and perform a comprehensive empirical evaluation of this design space. Guided by our experimental results, we propose a weight sharing strategy that generates a family of models with better overall efficiency (FLOPs and parameters versus accuracy) than traditional scaling methods alone; for example, we compress ConvMixer by 1.9x while improving accuracy on ImageNet. Finally, we perform a qualitative study to further understand the behavior of weight sharing in isotropic architectures. The code is available at https://github.com/apple/ml-spin.

* Accepted at ECCV 2022 
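
As a concrete, hypothetical instance of the sharing patterns studied here, the sketch below builds an isotropic network with 12 block slots that reuse only 4 physical blocks in a round-robin assignment. The toy block and the round-robin pattern are illustrative points in the design space, not the paper's recommended strategy.

```python
import torch
import torch.nn as nn

class SharedIsotropicNet(nn.Module):
    def __init__(self, dim=256, depth=12, num_unique=4):
        super().__init__()
        # Only `num_unique` physical blocks back `depth` block slots.
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim), nn.GELU())
            for _ in range(num_unique)
        )
        # Round-robin assignment: slot i reuses block i % num_unique.
        self.assignment = [i % num_unique for i in range(depth)]

    def forward(self, x):
        for i in self.assignment:
            x = x + self.blocks[i](x)        # residual connection per slot
        return x

net = SharedIsotropicNet()
print(sum(p.numel() for p in net.parameters()))  # parameters of 4 blocks, not 12
out = net(torch.randn(2, 16, 256))               # (batch, tokens, dim)
```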

LCS: Learning Compressible Subspaces for Adaptive Network Compression at Inference Time

Oct 08, 2021
Elvis Nunez, Maxwell Horton, Anish Prabhu, Anurag Ranjan, Ali Farhadi, Mohammad Rastegari

When deploying deep learning models to a device, it is traditionally assumed that available computational resources (compute, memory, and power) remain static. However, real-world computing systems do not always provide stable resource guarantees: computational resources need to be conserved when load from other processes is high or battery power is low. Inspired by recent works on neural network subspaces, we propose a method for training a "compressible subspace" of neural networks that contains a fine-grained spectrum of models ranging from highly efficient to highly accurate. Our models require no retraining; thus, our subspace of models can be deployed entirely on-device to allow adaptive network compression at inference time. We present results for achieving arbitrarily fine-grained accuracy-efficiency trade-offs at inference time for structured and unstructured sparsity. We achieve accuracies on par with standard models when testing our uncompressed models, and maintain high accuracy for sparsity rates above 90% when testing our compressed models. We also demonstrate that our algorithm extends to quantization at variable bit widths, achieving accuracy on par with individually trained networks.
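
A minimal sketch of inference-time selection from such a subspace, under illustrative assumptions: two trained weight endpoints define a line, and each point alpha on the line is paired with a sparsity level applied via magnitude pruning. The alpha-to-sparsity mapping and the pruning rule below are placeholders, not the paper's training procedure.

```python
import torch

torch.manual_seed(0)
w0 = torch.randn(256, 256)   # one endpoint of the trained line (placeholder)
w1 = torch.randn(256, 256)   # the other endpoint (placeholder)

def subspace_weight(alpha, max_sparsity=0.95):
    w = (1 - alpha) * w0 + alpha * w1          # point on the line
    k = int(w.numel() * alpha * max_sparsity)  # higher alpha -> sparser weights
    if k > 0:
        thresh = w.abs().flatten().kthvalue(k).values
        w = torch.where(w.abs() > thresh, w, torch.zeros_like(w))
    return w

# Pick an accuracy-efficiency trade-off at inference time by choosing alpha.
x = torch.randn(4, 256)
for alpha in (0.0, 0.5, 1.0):
    w = subspace_weight(alpha)
    print(alpha, "sparsity:", float((w == 0).float().mean()), (x @ w.t()).shape)
```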

Learning Neural Network Subspaces

Feb 20, 2021
Mitchell Wortsman, Maxwell Horton, Carlos Guestrin, Ali Farhadi, Mohammad Rastegari

Recent observations have advanced our understanding of the neural network optimization landscape, revealing the existence of (1) paths of high accuracy containing diverse solutions and (2) wider minima offering improved performance. Previous methods for finding diverse paths require multiple training runs. In contrast, we aim to leverage both properties (1) and (2) with a single method, in a single training run. At a computational cost similar to training one model, we learn lines, curves, and simplexes of high-accuracy neural networks. These neural network subspaces contain diverse solutions that can be ensembled, approaching the ensemble performance of independently trained networks without the training cost. Moreover, using the subspace midpoint boosts accuracy, calibration, and robustness to label noise, outperforming Stochastic Weight Averaging.
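
The sketch below shows how a line of networks might be trained in a single run: each step samples a point alpha on the line between two endpoint parameter sets and applies the task loss at that point, so both endpoints receive gradients. The toy regression task is a placeholder, and any regularization used to keep the endpoints diverse is omitted.

```python
import torch
import torch.nn as nn

class LineLinear(nn.Module):
    """A linear layer whose weights live on a learnable line between w0 and w1."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.w0 = nn.Parameter(torch.randn(d_out, d_in) * 0.1)
        self.w1 = nn.Parameter(torch.randn(d_out, d_in) * 0.1)

    def forward(self, x, alpha):
        w = (1 - alpha) * self.w0 + alpha * self.w1   # point on the line
        return x @ w.t()

torch.manual_seed(0)
model = LineLinear(16, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.05)
x, y = torch.randn(256, 16), torch.randn(256, 1)

for step in range(200):
    alpha = torch.rand(())                   # sample a point on the line
    loss = (model(x, alpha) - y).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# The subspace midpoint can serve as the final model.
print(float((model(x, 0.5) - y).pow(2).mean()))
```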

Layer-Wise Data-Free CNN Compression

Nov 18, 2020
Maxwell Horton, Yanzi Jin, Ali Farhadi, Mohammad Rastegari

We present an efficient method for compressing a trained neural network without using any data. Our data-free method requires 14x-450x fewer FLOPs than comparable state-of-the-art methods. We break the problem of data-free network compression into a number of independent layer-wise compressions. We show how to efficiently generate layer-wise training data, and how to precondition the network to maintain accuracy during layer-wise compression. We show state-of-the-art performance on MobileNetV1 for data-free low-bit-width quantization. We also show state-of-the-art performance on data-free pruning of EfficientNet B0 when combining our method with end-to-end generative methods.
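
A minimal sketch of one layer-wise step, under simplifying assumptions: synthetic Gaussian inputs are passed through a frozen convolution, and a trainable copy whose weights are fake-quantized (with a straight-through estimator) is fit to match the frozen layer's outputs. The paper's layer-wise data generation and network preconditioning are more involved than this.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quant(w, bits=4):
    # Uniform symmetric quantizer with a straight-through estimator.
    qmax = 2 ** (bits - 1) - 1
    scale = w.detach().abs().max() / qmax
    q = (w / scale).round().clamp(-qmax - 1, qmax) * scale
    return w + (q - w).detach()              # identity gradient through rounding

torch.manual_seed(0)
layer = nn.Conv2d(16, 32, 3, padding=1).requires_grad_(False)  # frozen original

q_layer = nn.Conv2d(16, 32, 3, padding=1)    # trainable copy, quantized on the fly
q_layer.load_state_dict(layer.state_dict())
opt = torch.optim.Adam(q_layer.parameters(), lr=1e-3)

for _ in range(200):
    x = torch.randn(8, 16, 8, 8)             # synthetic layer-wise training data
    with torch.no_grad():
        target = layer(x)
    out = F.conv2d(x, fake_quant(q_layer.weight), q_layer.bias, padding=1)
    loss = (out - target).pow(2).mean()      # match the frozen layer's outputs
    opt.zero_grad(); loss.backward(); opt.step()

print(float(loss))
```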

Label Refinery: Improving ImageNet Classification through Label Progression

May 07, 2018
Hessam Bagherinezhad, Maxwell Horton, Mohammad Rastegari, Ali Farhadi

Among the three main components (data, labels, and models) of any supervised learning system, data and models have been the main subjects of active research. However, studying labels and their properties has received very little attention. Current principles and paradigms of labeling pose several challenges for machine learning algorithms: labels are often incomplete, ambiguous, and redundant. In this paper, we study the effects of various properties of labels and introduce the Label Refinery: an iterative procedure that updates the ground truth labels after examining the entire dataset. We show significant gains using refined labels across a wide range of models. Using a Label Refinery improves the state-of-the-art top-1 accuracy of (1) AlexNet from 59.3 to 67.2, (2) MobileNet from 70.6 to 73.39, (3) MobileNet-0.25 from 50.6 to 55.59, (4) VGG19 from 72.7 to 75.46, and (5) Darknet19 from 72.9 to 74.47.
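
The refinement loop itself is simple to sketch: each generation trains a fresh model against the predictions of the previous one, starting from the original hard labels. The tiny MLP, random data, and soft-target cross-entropy below are illustrative stand-ins for the paper's ImageNet setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

def train(model, x, targets, steps=200):
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for _ in range(steps):
        # cross_entropy accepts probability targets in PyTorch >= 1.10
        loss = F.cross_entropy(model(x), targets)
        opt.zero_grad(); loss.backward(); opt.step()
    return model

x = torch.randn(512, 32)                                       # placeholder dataset
targets = F.one_hot(torch.randint(0, 10, (512,)), 10).float()  # original hard labels

for generation in range(3):
    model = train(nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10)),
                  x, targets)
    with torch.no_grad():
        targets = F.softmax(model(x), dim=1)                   # refined soft labels
print(targets[0])
```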
