Johannes Ballé

Neural Distributed Compressor Discovers Binning

Oct 25, 2023
Ezgi Ozyilkan, Johannes Ballé, Elza Erkip

We consider lossy compression of an information source when the decoder has lossless access to a correlated one. This setup, also known as the Wyner-Ziv problem, is a special case of distributed source coding. To this day, practical approaches for the Wyner-Ziv problem have neither been fully developed nor heavily investigated. We propose a data-driven method based on machine learning that leverages the universal function approximation capability of artificial neural networks. We find that our neural compression scheme, which is based on variational vector quantization, recovers, for exemplary sources, some principles of the optimal theoretical solution to the Wyner-Ziv setup, such as binning in the source space and an optimal combination of the quantization index with the side information. These behaviors emerge even though no structure exploiting knowledge of the source distributions was imposed. Binning is a widely used tool in information-theoretic proofs and methods, and to our knowledge this is the first time it has been explicitly observed to emerge from data-driven learning.

* Draft of a journal version of our previous ISIT 2023 paper (arXiv:2305.04380).
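As a rough illustration (our own sketch, not the authors' released code or architecture), the following PyTorch snippet shows the shape of such a one-shot learned Wyner-Ziv coder: an encoder softly assigns the source sample to one of K indices, the rate is measured against a learned prior over indices, and the decoder combines the (relaxed) index with the side information available losslessly at the decoder. The codebook size, network widths, and the Gumbel-softmax relaxation are assumptions made for this sketch.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

K = 16                                                    # codebook size (assumed)
encoder = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, K))
decoder = nn.Sequential(nn.Linear(K + 1, 64), nn.ReLU(), nn.Linear(64, 1))
log_prior = nn.Parameter(torch.zeros(K))                  # learned marginal over indices

def wyner_ziv_loss(x, y, lam=10.0, tau=0.5):
    """x, y: tensors of shape (batch, 1); y is the decoder's side information."""
    logits = encoder(x)                                   # soft quantization of x
    q = F.gumbel_softmax(logits, tau=tau)                 # relaxed one-hot index
    log_p = F.log_softmax(log_prior, dim=-1)
    rate = -(q * log_p).sum(-1).mean() / math.log(2.0)    # expected bits per sample
    x_hat = decoder(torch.cat([q, y], dim=-1))            # index combined with side info
    return rate + lam * F.mse_loss(x_hat, x)              # rate-distortion trade-off
```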

The Unreasonable Effectiveness of Linear Prediction as a Perceptual Metric

Oct 06, 2023
Daniel Severo, Lucas Theis, Johannes Ballé

We show how perceptual embeddings of the visual system can be constructed at inference time with no training data or deep neural network features. Our perceptual embeddings are solutions to a weighted least squares (WLS) problem defined at the pixel level and solved at inference time, which can capture global and local image characteristics. The distance in embedding space is used to define a perceptual similarity metric which we call LASI: Linear Autoregressive Similarity Index. Experiments on full-reference image quality assessment datasets show that LASI performs competitively with learned deep-feature-based methods such as LPIPS (Zhang et al., 2018) and PIM (Bhardwaj et al., 2020), at a computational cost similar to hand-crafted methods such as MS-SSIM (Wang et al., 2003). We find that increasing the dimensionality of the embedding space consistently reduces the WLS loss and improves performance on perceptual tasks, at the cost of higher computational complexity. LASI is fully differentiable, scales cubically with the number of embedding dimensions, and can be parallelized at the pixel level. A Maximum Differentiation (MAD) competition (Wang & Simoncelli, 2008) between LASI and LPIPS shows that each method can find failure points for the other, suggesting these metrics can be combined.
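The exact LASI construction is defined in the paper; purely as an illustration of the flavor it describes, the toy sketch below fits, at every pixel, linear autoregressive coefficients that predict pixel values from a causal neighborhood via a small weighted least squares problem solved at inference time, then compares two images through these per-pixel coefficients. The context shape, window size, weighting, and final distance are all assumptions made here, and the pure-Python loops are for clarity rather than speed.

```python
import numpy as np

def wls_embeddings(img, ctx=2, win=3, sigma=2.0, eps=1e-3):
    """Per-pixel linear-prediction coefficients from a weighted least squares fit."""
    H, W = img.shape
    offsets = [(di, dj) for di in range(-ctx, 1) for dj in range(-ctx, ctx + 1)
               if (di, dj) < (0, 0)]                      # causal (raster-order) context
    pad = ctx + win
    p = np.pad(img.astype(float), pad, mode="reflect")
    D = len(offsets)
    emb = np.zeros((H, W, D))
    for i in range(H):
        for j in range(W):
            C, t, w = [], [], []
            for qi in range(-win, win + 1):               # pixels in the local window
                for qj in range(-win, win + 1):
                    ci, cj = i + pad + qi, j + pad + qj
                    C.append([p[ci + di, cj + dj] for di, dj in offsets])
                    t.append(p[ci, cj])
                    w.append(np.exp(-(qi * qi + qj * qj) / (2 * sigma ** 2)))
            C, t, w = np.array(C), np.array(t), np.array(w)
            A = C.T @ (C * w[:, None]) + eps * np.eye(D)  # weighted normal equations
            emb[i, j] = np.linalg.solve(A, C.T @ (w * t))
    return emb

def lasi_like_distance(x, y):
    """Mean squared distance between the two images' per-pixel coefficient vectors."""
    return np.mean((wls_embeddings(x) - wls_embeddings(y)) ** 2)
```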

Wasserstein Distortion: Unifying Fidelity and Realism

Oct 05, 2023
Yang Qiu, Aaron B. Wagner, Johannes Ballé, Lucas Theis

We introduce a distortion measure for images, Wasserstein distortion, that simultaneously generalizes pixel-level fidelity on the one hand and realism on the other. We show how Wasserstein distortion reduces mathematically to a pure fidelity constraint or a pure realism constraint under different parameter choices. Pairs of images that are close under Wasserstein distortion illustrate its utility. In particular, we generate random textures that have high fidelity to a reference texture in one location of the image and smoothly transition to an independent realization of the texture as one moves away from this point. Connections between Wasserstein distortion and models of the human visual system are noted.
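As a toy illustration of how a single pooling parameter can interpolate between fidelity and realism (our own construction, not the paper's definition of Wasserstein distortion): summarize each image by local means and variances pooled at a chosen width, and average the squared 2-Wasserstein distance between Gaussian approximations of the two images' local distributions. A very small width reduces to a pixelwise squared error, while a very large width compares only global statistics.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def local_stats(img, width):
    """Gaussian-pooled local mean and variance at every pixel."""
    mean = gaussian_filter(img, width)
    var = gaussian_filter(img ** 2, width) - mean ** 2
    return mean, np.maximum(var, 0.0)

def wasserstein_like_distortion(x, y, width):
    mx, vx = local_stats(x.astype(float), width)
    my, vy = local_stats(y.astype(float), width)
    # squared W2 distance between N(mx, vx) and N(my, vy), averaged over pixels
    return np.mean((mx - my) ** 2 + (np.sqrt(vx) - np.sqrt(vy)) ** 2)
```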

Learned Wyner-Ziv Compressors Recover Binning

May 07, 2023
Ezgi Ozyilkan, Johannes Ballé, Elza Erkip

We consider lossy compression of an information source when the decoder has lossless access to a correlated one. This setup, also known as the Wyner-Ziv problem, is a special case of distributed source coding. To this day, real-world applications of this problem have neither been fully developed nor heavily investigated. We propose a data-driven method based on machine learning that leverages the universal function approximation capability of artificial neural networks. We find that our neural network-based compression scheme re-discovers some principles of the optimum theoretical solution of the Wyner-Ziv setup, such as binning in the source space as well as linear decoder behavior within each quantization index, for the quadratic-Gaussian case. These behaviors emerge although no structure exploiting knowledge of the source distributions was imposed. Binning is a widely used tool in information theoretic proofs and methods, and to our knowledge, this is the first time it has been explicitly observed to emerge from data-driven learning.

* To appear at ISIT 2023
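For reference, the information-theoretic benchmark for the quadratic-Gaussian case is straightforward to compute. The sketch below (our own, assuming the common correlation model X = Y + N with N independent of Y) evaluates the Wyner-Ziv rate-distortion function R_WZ(D) = max(0, 0.5 * log2(sigma_{X|Y}^2 / D)), which in the quadratic-Gaussian case coincides with the conditional rate-distortion function, i.e. there is no rate loss.

```python
import numpy as np

def wyner_ziv_rd_gaussian(distortion, cond_var):
    """Rate in bits per sample for the quadratic-Gaussian Wyner-Ziv problem.

    cond_var is the conditional variance of the source given the side
    information; for X = Y + N with N ~ N(0, sigma_n^2) independent of Y,
    cond_var equals sigma_n^2."""
    return np.maximum(0.0, 0.5 * np.log2(cond_var / distortion))

# Example: side information plus independent noise of variance 0.1.
print(wyner_ziv_rd_gaussian(np.array([0.01, 0.05, 0.1]), cond_var=0.1))
```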

Do Neural Networks Compress Manifolds Optimally?

May 17, 2022
Sourbh Bhadane, Aaron B. Wagner, Johannes Ballé

Artificial neural-network-based (ANN-based) lossy compressors have recently obtained striking results on several sources. Their success may be ascribed to an ability to identify the structure of low-dimensional manifolds in high-dimensional ambient spaces. Indeed, prior work has shown that ANN-based compressors can achieve the optimal entropy-distortion curve for some such sources. In contrast, we determine the optimal entropy-distortion tradeoffs for two low-dimensional manifolds with circular structure and show that state-of-the-art ANN-based compressors fail to optimally compress the sources, especially at high rates.
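As a simple point of reference (our own toy example, not one of the sources analyzed in the paper), the sketch below builds a source supported on a one-dimensional circular manifold in R^2 and quantizes the underlying angle directly, reporting the empirical entropy-distortion pairs of this structure-aware quantizer. Such pairs can serve as a simple baseline when probing whether a learned compressor exploits the manifold structure.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2 * np.pi, size=100_000)
x = np.stack([np.cos(theta), np.sin(theta)], axis=1)      # points on the unit circle

def angle_quantizer(points, num_cells):
    """Uniformly quantize the angle; reconstruct at each cell's central angle."""
    ang = np.arctan2(points[:, 1], points[:, 0]) % (2 * np.pi)
    idx = np.floor(ang / (2 * np.pi) * num_cells).astype(int)
    rec_ang = (idx + 0.5) * (2 * np.pi / num_cells)
    rec = np.stack([np.cos(rec_ang), np.sin(rec_ang)], axis=1)
    probs = np.bincount(idx, minlength=num_cells) / len(idx)
    entropy = -np.sum(probs[probs > 0] * np.log2(probs[probs > 0]))   # bits per sample
    distortion = np.mean(np.sum((points - rec) ** 2, axis=1))         # mean squared error
    return entropy, distortion

for k in (4, 16, 64):
    print(k, angle_quantizer(x, k))
```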

Optimizing the Communication-Accuracy Trade-off in Federated Learning with Rate-Distortion Theory

Jan 07, 2022
Nicole Mitchell, Johannes Ballé, Zachary Charles, Jakub Konečný

A significant bottleneck in federated learning is the network communication cost of sending model updates from client devices to the central server. We propose a method to reduce this cost. Our method encodes quantized updates with an appropriate universal code, taking into account their empirical distribution. Because quantization introduces error, we select quantization levels by optimizing for the desired trade-off between average total bitrate and gradient distortion. We demonstrate empirically that in spite of the non-i.i.d. nature of federated learning, the rate-distortion frontier is consistent across datasets, optimizers, clients and training rounds, and within each setting, distortion reliably predicts model performance. This allows for a remarkably simple compression scheme that is near-optimal in many use cases, and outperforms Top-K, DRIVE, 3LC and QSGD on the Stack Overflow next-word prediction benchmark.
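As a schematic sketch of the kind of rate-distortion sweep described above (not the paper's implementation; the uniform quantizer, the empirical-entropy bitrate estimate, and the Laplacian stand-in for a model update are assumptions), the snippet below quantizes an update vector at several step sizes, estimates the bitrate an entropy code matched to the empirical symbol distribution would need, and measures the resulting distortion.

```python
import numpy as np

def quantize_update(update, step):
    """Uniform scalar quantization plus an empirical-entropy bitrate estimate."""
    symbols = np.round(update / step).astype(np.int64)
    dequantized = symbols * step
    _, counts = np.unique(symbols, return_counts=True)
    probs = counts / counts.sum()
    rate = -np.sum(probs * np.log2(probs))                # bits per parameter
    distortion = np.mean((update - dequantized) ** 2)     # mean squared error
    return rate, distortion

rng = np.random.default_rng(0)
update = rng.laplace(scale=1e-3, size=1_000_000)          # stand-in for a model delta
for step in (1e-4, 5e-4, 2e-3):                           # trace the empirical R-D curve
    print(step, quantize_update(update, step))
```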

Towards Generative Video Compression

Jul 26, 2021
Fabian Mentzer, Eirikur Agustsson, Johannes Ballé, David Minnen, Nick Johnston, George Toderici

We present a neural video compression method based on generative adversarial networks (GANs) that outperforms previous neural video compression methods and is comparable to HEVC in a user study. Motivated by a spectral analysis, we propose a technique based on randomized shifting and un-shifting that mitigates the temporal error accumulation caused by recursive frame compression. We describe the network design choices and their relative importance in detail, and elaborate on the challenges of evaluating video compression methods in user studies.
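A minimal sketch of our reading of the randomized shifting idea (with a trivial stand-in codec; the paper's GAN-based codec and exact shifting scheme are not reproduced here): each frame is spatially shifted by a random offset before compression and shifted back afterwards, so that codec artifacts do not stay aligned across frames of a recursively compressed sequence.

```python
import numpy as np

rng = np.random.default_rng(0)

def lossy_codec(frame):
    """Placeholder for the learned frame codec (a real one would also condition
    on the previously reconstructed frame); here, crude value quantization."""
    return np.round(frame * 32.0) / 32.0

def compress_sequence(frames, max_shift=8):
    reconstructions = []
    for frame in frames:
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        shifted = np.roll(frame, (dy, dx), axis=(0, 1))            # randomized shift
        decoded = lossy_codec(shifted)
        reconstructions.append(np.roll(decoded, (-dy, -dx), axis=(0, 1)))  # un-shift
    return reconstructions
```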

On the relation between statistical learning and perceptual distances

Jun 08, 2021
Alexander Hepburn, Valero Laparra, Raul Santos-Rodriguez, Johannes Ballé, Jesús Malo

It has been demonstrated many times that the behavior of the human visual system is connected to the statistics of natural images. Since machine learning relies on the statistics of training data as well, the above connection has interesting implications when using perceptual distances (which mimic the behavior of the human visual system) as a loss function. In this paper, we aim to unravel the non-trivial relationship between the probability distribution of the data, perceptual distances, and unsupervised machine learning. To this end, we show that perceptual sensitivity is correlated with the probability of an image in its close neighborhood. We also explore the relation between distances induced by autoencoders and the probability distribution of the data used for training them, as well as how these induced distances are correlated with human perception. Finally, we discuss why perceptual distances might not lead to noticeable gains in performance over standard Euclidean distances in common image processing tasks except when data is scarce and the perceptual distance provides regularization.
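As a sketch of the kind of analysis the abstract refers to (entirely a toy setup of ours, not the paper's experiments), the snippet below estimates the sensitivity of a distance as its average response to small perturbations of an image and reports the Spearman correlation between that sensitivity and the image's log-probability under a density model. The stand-in metric and density here are placeholders for a perceptual distance and a model of natural images.

```python
import numpy as np
from scipy.stats import spearmanr

def sensitivity(metric, x, eps=1e-2, trials=16, seed=0):
    """Average response of the metric to small random perturbations of x."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((trials,) + x.shape)
    return np.mean([metric(x, x + eps * n) / eps for n in noise])

def sensitivity_probability_correlation(metric, log_density, samples):
    sens = np.array([sensitivity(metric, x) for x in samples])
    logp = np.array([log_density(x) for x in samples])
    rho, _ = spearmanr(sens, logp)
    return rho

# Stand-ins only: MSE as the distance, a factorized standard Gaussian as the
# density model, and white-noise "images" as the data.
mse = lambda a, b: float(np.mean((a - b) ** 2))
log_gauss = lambda x: float(-0.5 * np.sum(x ** 2))
images = np.random.default_rng(1).standard_normal((64, 8, 8))
print(sensitivity_probability_correlation(mse, log_gauss, images))
```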

3D Scene Compression through Entropy Penalized Neural Representation Functions

Apr 26, 2021
Thomas Bird, Johannes Ballé, Saurabh Singh, Philip A. Chou

Some forms of novel visual media enable the viewer to explore a 3D scene from arbitrary viewpoints by interpolating between a discrete set of original views. Compared to 2D imagery, these types of applications require much larger amounts of storage space, which we seek to reduce. Existing approaches for compressing 3D scenes are based on a separation of compression and rendering: each of the original views is compressed using traditional 2D image formats, and the receiver decompresses the views and then performs the rendering. We unify these steps by directly compressing an implicit representation of the scene: a function that maps spatial coordinates to a radiance vector field, which can then be queried to render arbitrary viewpoints. The function is implemented as a neural network and trained end to end for both reconstruction quality and compressibility, using an entropy penalty on its parameters. Our method significantly outperforms a state-of-the-art conventional approach for scene compression, achieving higher-quality reconstructions at lower bitrates. Furthermore, we show that performance at low bitrates can be improved by jointly representing multiple scenes using a soft form of parameter sharing.

* accepted (in an abridged format) as a contribution to the Learning-based Image Coding special session of the Picture Coding Symposium 2021 
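A simplified sketch of the kind of objective described above (our own; the additive-noise quantization proxy and the fixed Gaussian entropy model are assumptions made for this sketch, not the paper's parameter model): a small MLP maps spatial coordinates to a radiance value, and the training loss combines reconstruction error with an estimate of the bits needed to transmit the network parameters.

```python
import math
import torch
import torch.nn as nn

# Toy implicit scene function: spatial coordinate (x, y, z) -> radiance (RGB).
net = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                    nn.Linear(128, 128), nn.ReLU(),
                    nn.Linear(128, 3))

def parameter_rate_bits(model, step=1e-2, sigma=1e-1):
    """Crude estimate of the bits needed to send the parameters: additive-noise
    quantization proxy plus a fixed zero-mean Gaussian entropy model."""
    bits = torch.zeros(())
    for p in model.parameters():
        noisy = p + (torch.rand_like(p) - 0.5) * step
        nll = 0.5 * (noisy / sigma) ** 2 + math.log(sigma * math.sqrt(2 * math.pi))
        bits = bits + nll.sum() / math.log(2.0)
    return bits

def training_loss(coords, target_radiance, lam=1e-4):
    """Reconstruction error plus the entropy penalty on the network parameters."""
    recon = torch.mean((net(coords) - target_radiance) ** 2)
    return recon + lam * parameter_rate_bits(net)
```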

End-to-end Learning of Compressible Features

Jul 23, 2020
Saurabh Singh, Sami Abu-El-Haija, Nick Johnston, Johannes Ballé, Abhinav Shrivastava, George Toderici

Pre-trained convolutional neural networks (CNNs) are powerful off-the-shelf feature generators and have been shown to perform very well on a variety of tasks. Unfortunately, the generated features are high-dimensional and expensive to store: potentially hundreds of thousands of floats per example when processing videos. Traditional entropy-based lossless compression methods are of little help, as they do not yield the desired level of compression, while general-purpose lossy compression methods based on energy compaction (e.g., PCA followed by quantization and entropy coding) are suboptimal, as they are not tuned to the task-specific objective. We propose a learned method that jointly optimizes for compressibility along with the task objective for learning the features. The plug-in nature of our method makes it straightforward to integrate with any target objective and trade off against compressibility. We present results on multiple benchmarks and demonstrate that our method produces features that are an order of magnitude more compressible, while having a regularization effect that leads to a consistent improvement in accuracy.

* Accepted at ICIP 2020 
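A schematic sketch of the plug-in objective described above (our simplification; the layer sizes, the additive-noise quantization proxy, and the fixed factorized Gaussian entropy model are assumptions, not the paper's choices): a bottleneck feature is trained jointly for the task loss and for a low estimated bitrate.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Sequential(nn.Linear(512, 256), nn.ReLU())   # toy feature extractor
task_head = nn.Linear(256, 10)                             # toy task head (10 classes)

def feature_rate_bits(z, sigma=1.0):
    """Estimated bits per example for the features under additive-noise
    quantization and a fixed factorized Gaussian entropy model."""
    z_noisy = z + (torch.rand_like(z) - 0.5)
    nll = 0.5 * (z_noisy / sigma) ** 2 + math.log(sigma * math.sqrt(2 * math.pi))
    return nll.sum(dim=1).mean() / math.log(2.0)

def joint_loss(x, labels, lam=1e-3):
    z = backbone(x)                                         # compressible features
    task = F.cross_entropy(task_head(z), labels)            # task objective
    return task + lam * feature_rate_bits(z)                # trade-off set by lam
```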