Maximilian Kalcher

A Scaling Law for Bandwidth Under Quantization

Feb 26, 2026

Maximilian Kalcher, Tena Dubcek

Abstract: We derive a scaling law relating ADC bit depth to effective bandwidth for signals with $1/f^α$ power spectra. Quantization introduces a flat noise floor whose intersection with the declining signal spectrum defines an effective cutoff frequency $f_c$. We show that each additional bit extends this cutoff by a factor of $2^{2/α}$, approximately doubling bandwidth per bit for $α = 2$. The law requires that quantization noise be approximately white, a condition whose minimum bit depth $N_{\min}$ we show to be $α$-dependent. Validation on synthetic $1/f^α$ signals for $α \in \{1.5, 2.0, 2.5\}$ yields prediction errors below 3% using the theoretical noise floor $Δ^2/(6f_s)$, and approximately 14% when the noise floor is estimated empirically from the quantized signal's spectrum. We illustrate practical implications on real EEG data.

* 4 pages, 3 figures, submitted to IEEE Signal Processing Letters
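
The per-bit factor in the abstract can be checked numerically. The sketch below is not the authors' code: it assumes an idealized spectrum $S(f) = A/f^α$ and a step size $Δ = R/2^N$, where the amplitude `amplitude` ($A$), full-scale range `full_scale` ($R$), and the function names are all hypothetical. Only the noise floor $Δ^2/(6f_s)$ and the factor $2^{2/α}$ come from the abstract. The cutoff is not clipped at the Nyquist frequency.

```python
import math


def noise_floor(bits, full_scale, fs):
    """Flat quantization noise floor Δ²/(6·fs), with Δ = full_scale / 2**bits.

    The step-size convention is an assumption made for this sketch.
    """
    delta = full_scale / 2 ** bits
    return delta ** 2 / (6.0 * fs)


def cutoff_frequency(bits, alpha, amplitude, full_scale, fs):
    """Frequency at which the assumed spectrum A / f**alpha meets the noise floor."""
    return (amplitude / noise_floor(bits, full_scale, fs)) ** (1.0 / alpha)


def per_bit_factor(alpha):
    """Predicted growth of the cutoff per added bit: 2**(2/alpha)."""
    return 2.0 ** (2.0 / alpha)


if __name__ == "__main__":
    for alpha in (1.5, 2.0, 2.5):
        f10 = cutoff_frequency(10, alpha, amplitude=1.0, full_scale=2.0, fs=1000.0)
        f11 = cutoff_frequency(11, alpha, amplitude=1.0, full_scale=2.0, fs=1000.0)
        print(alpha, f11 / f10, per_bit_factor(alpha))
```

Adding one bit halves $Δ$, which lowers the noise floor by a factor of 4 and therefore moves the intersection with $A/f^α$ outward by $4^{1/α} = 2^{2/α}$, independent of the assumed $A$, $R$, and $f_s$.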

Frequency-Ordered Tokenization for Better Text Compression

Feb 26, 2026

Maximilian Kalcher

Abstract: We present frequency-ordered tokenization, a simple preprocessing technique that improves lossless text compression by exploiting the power-law frequency distribution of natural language tokens (Zipf's law). The method tokenizes text with Byte Pair Encoding (BPE), reorders the vocabulary so that frequent tokens receive small integer identifiers, and encodes the result with variable-length integers before passing it to any standard compressor. On enwik8 (100 MB Wikipedia), this yields improvements of 7.08 percentage points (pp) for zlib, 1.69 pp for LZMA, and 0.76 pp for zstd (all including vocabulary overhead), outperforming the classical Word Replacing Transform. Gains are consistent at 1 GB scale (enwik9) and across Chinese and Arabic text. We further show that preprocessing accelerates compression for computationally expensive algorithms: the total wall-clock time including preprocessing is 3.1x faster than raw zstd-22 and 2.4x faster than raw LZMA, because the preprocessed input is substantially smaller. The method can be implemented in under 50 lines of code.

* 5 pages, 4 figures, 9 tables
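
The pipeline described in the abstract (tokenize, rank tokens by frequency, emit variable-length integers, compress) can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the caller supplies the token list (for example from a plain whitespace split standing in for BPE), the varint layout is a LEB128-style encoding chosen for this sketch, and all function names are hypothetical. Vocabulary storage, which the paper counts as overhead, is left to the caller.

```python
import zlib
from collections import Counter


def encode_varint(n):
    """Encode a non-negative int with 7 payload bits per byte (LEB128-style)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varints(data):
    """Decode a concatenation of varints back into a list of ints."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


def frequency_ordered_encode(tokens):
    """Give frequent tokens small ids, then varint-encode the id stream."""
    vocab = [tok for tok, _ in Counter(tokens).most_common()]
    rank = {tok: i for i, tok in enumerate(vocab)}
    payload = b"".join(encode_varint(rank[tok]) for tok in tokens)
    return vocab, payload


def frequency_ordered_decode(vocab, payload):
    """Invert frequency_ordered_encode given the same vocabulary."""
    return [vocab[i] for i in decode_varints(payload)]


if __name__ == "__main__":
    text = "the cat sat on the mat and the dog sat on the cat"
    tokens = text.split(" ")  # stand-in tokenizer, NOT BPE
    vocab, payload = frequency_ordered_encode(tokens)
    compressed = zlib.compress(payload, 9)  # any standard compressor could follow
    restored = frequency_ordered_decode(vocab, zlib.decompress(compressed))
    print(" ".join(restored) == text)
```

Because the 128 most frequent tokens fit in a single byte each, a Zipf-distributed token stream is dominated by one-byte codes before the general-purpose compressor runs. The toy input here is too small to say anything about compression ratios.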

Image-based Data Representations of Time Series: A Comparative Analysis in EEG Artifact Detection

Dec 21, 2023

Aaron Maiwald, Leon Ackermann, Maximilian Kalcher, Daniel J. Wu

Abstract: Alternative data representations are powerful tools that augment the performance of downstream models. However, there is an abundance of such representations within the machine learning toolbox, and the field lacks a comparative understanding of the suitability of each representation method. In this paper, we propose artifact detection and classification within EEG data as a testbed for profiling image-based data representations of time series data. We then evaluate eleven popular deep learning architectures on each of six commonly used representation methods. We find that, while the choice of representation entails a choice within the tradeoff between bias and variance, certain representations are practically more effective in highlighting features that increase the signal-to-noise ratio of the data. We present our results on EEG data, and open-source our testing framework to enable future comparative analyses in this vein.

* 13 pages, 4 figures
