Get our free extension to see links to code for papers anywhere online!Free extension: code links for papers anywhere!Free add-on: See code for papers anywhere!

Haotian Jiang, Qianxiao Li

We present a theoretical analysis of the approximation properties of convolutional architectures when applied to the modeling of temporal sequences. Specifically, we prove an approximation rate estimate (Jackson-type result) and an inverse approximation theorem (Bernstein-type result), which together provide a comprehensive characterization of the types of sequential relationships that can be efficiently captured by a temporal convolutional architecture. The rate estimate improves upon a previous result via the introduction of a refined complexity measure, whereas the inverse approximation theorem is new.

Via

Haotian Jiang, Qianxiao Li

The transformer is a widely applied architecture in sequence modeling applications, but the theoretical understanding of its working principles is limited. In this work, we investigate the ability of transformers to approximate sequential relationships. We first prove a universal approximation theorem for the transformer hypothesis space. From its derivation, we identify a novel notion of regularity under which we can prove an explicit approximation rate estimate. This estimate reveals key structural properties of the transformer and suggests the types of sequence relationships that the transformer is adapted to approximating. In particular, it allows us to concretely discuss the structural bias between the transformer and classical sequence modeling methods, such as recurrent neural networks. Our findings are supported by numerical experiments.

Via

Yangsibo Huang, Haotian Jiang, Daogao Liu, Mohammad Mahdian, Jieming Mao, Vahab Mirrokni

In this paper, we study the setting in which data owners train machine learning models collaboratively under a privacy notion called joint differential privacy [Kearns et al., 2018]. In this setting, the model trained for each data owner $j$ uses $j$'s data without privacy consideration and other owners' data with differential privacy guarantees. This setting was initiated in [Jain et al., 2021] with a focus on linear regressions. In this paper, we study this setting for stochastic convex optimization (SCO). We present an algorithm that is a variant of DP-SGD [Song et al., 2013; Abadi et al., 2016] and provides theoretical bounds on its population loss. We compare our algorithm to several baselines and discuss for what parameter setups our algorithm is more preferred. We also empirically study joint differential privacy in the multi-class classification problem over two public datasets. Our empirical findings are well-connected to the insights from our theoretical results.

Via

Haotian Jiang, Qianxiao Li, Zhong Li, Shida Wang

We survey current developments in the approximation theory of sequence modelling in machine learning. Particular emphasis is placed on classifying existing results for various model architectures through the lens of classical approximation paradigms, and the insights one can gain from these results. We also outline some future research directions towards building a theory of sequence modelling.

Via

Xiuzhen Guo, Longfei Shangguan, Yuan He, Jia Zhang, Haotian Jiang, Awais Ahmad Siddiqi, Yunhao Liu

Backscatter communication holds potential for ubiquitous and low-cost connectivity among low-power IoT devices. To avoid interference between the carrier signal and the backscatter signal, recent works propose a frequency-shifting technique to separate these two signals in the frequency domain. Such proposals, however, have to occupy the precious wireless spectrum that is already overcrowded, and increase the power, cost, and complexity of the backscatter tag. In this paper, we revisit the classic ON-OFF Keying (OOK) modulation and propose Aloba, a backscatter system that takes the ambient LoRa transmissions as the excitation and piggybacks the in-band OOK modulated signals over the LoRa transmissions. Our design enables the backsactter signal to work in the same frequency band of the carrier signal, meanwhile achieving flexible data rate at different transmission range. The key contributions of Aloba include: (1) the design of a low-power backscatter tag that can pick up the ambient LoRa signals from other signals. (2) a novel decoding algorithm to demodulate both the carrier signal and the backscatter signal from their superposition. We further adopt link coding mechanism and interleave operation to enhance the reliability of backscatter signal decoding. We implement Aloba and conduct head-to-head comparison with the state-of-the-art LoRa backscatter system PLoRa in various settings. The experiment results show Aloba can achieve 199.4 Kbps data rate at various distances, 52.4 times higher than PLoRa.

Via

Xiuzhen Guo, Longfei Shangguan, Yuan He, Nan Jing, Jiacheng Zhang, Haotian Jiang, Yunhao Liu

The radio range of backscatter systems continues growing as new wireless communication primitives are continuously invented. Nevertheless, both the bit error rate and the packet loss rate of backscatter signals increase rapidly with the radio range, thereby necessitating the cooperation between the access point and the backscatter tags through a feedback loop. Unfortunately, the low-power nature of backscatter tags limits their ability to demodulate feedback signals from a remote access point and scales down to such circumstances. This paper presents Saiyan, an ultra-low-power demodulator for long-range LoRa backscatter systems. With Saiyan, a backscatter tag can demodulate feedback signals from a remote access point with moderate power consumption and then perform an immediate packet retransmission in the presence of packet loss. Moreover, Saiyan enables rate adaption and channel hopping-two PHY-layer operations that are important to channel efficiency yet unavailable on long-range backscatter systems. We prototype Saiyan on a two-layer PCB board and evaluate its performance in different environments. Results show that Saiyan achieves 5 gain on the demodulation range, compared with state-of-the-art systems. Our ASIC simulation shows that the power consumption of Saiyan is around 93.2 uW. Code and hardware schematics can be found at: https://github.com/ZangJac/Saiyan.

Via

Sally Dong, Haotian Jiang, Yin Tat Lee, Swati Padmanabhan, Guanghao Ye

Many fundamental problems in machine learning can be formulated by the convex program \[ \min_{\theta\in R^d}\ \sum_{i=1}^{n}f_{i}(\theta), \] where each $f_i$ is a convex, Lipschitz function supported on a subset of $d_i$ coordinates of $\theta$. One common approach to this problem, exemplified by stochastic gradient descent, involves sampling one $f_i$ term at every iteration to make progress. This approach crucially relies on a notion of uniformity across the $f_i$'s, formally captured by their condition number. In this work, we give an algorithm that minimizes the above convex formulation to $\epsilon$-accuracy in $\widetilde{O}(\sum_{i=1}^n d_i \log (1 /\epsilon))$ gradient computations, with no assumptions on the condition number. The previous best algorithm independent of the condition number is the standard cutting plane method, which requires $O(nd \log (1/\epsilon))$ gradient computations. As a corollary, we improve upon the evaluation oracle complexity for decomposable submodular minimization by Axiotis et al. (ICML 2021). Our main technical contribution is an adaptive procedure to select an $f_i$ term at every iteration via a novel combination of cutting-plane and interior-point methods.

Via

Venkat Varada, Mina Ghashami, Jitesh Mehta, Haotian Jiang, Kurtis Voris

Entity Resolution (ER) in voice assistants is a prime component during run time that resolves entities in users request to real world entities. ER involves two major functionalities 1. Relevance generation and 2. Ranking. In this paper we propose a low cost relevance generation framework by generating features using customer implicit and explicit feedback signals. The generated relevance datasets can serve as test sets to measure ER performance. We also introduce a set of metrics that accurately measures the performance of ER systems in various dimensions. They provide great interpretability to deep dive and identifying root cause of ER issues, whether the problem is in relevance generation or ranking.

Via

Haotian Jiang, Zhong Li, Qianxiao Li

We study the approximation properties of convolutional architectures applied to time series modelling, which can be formulated mathematically as a functional approximation problem. In the recurrent setting, recent results reveal an intricate connection between approximation efficiency and memory structures in the data generation process. In this paper, we derive parallel results for convolutional architectures, with WaveNet being a prime example. Our results reveal that in this new setting, approximation efficiency is not only characterised by memory, but also additional fine structures in the target relationship. This leads to a novel definition of spectrum-based regularity that measures the complexity of temporal relationships under the convolutional approximation scheme. These analyses provide a foundation to understand the differences between architectural choices for time series modelling and can give theoretically grounded guidance for practical applications.

Via

Haotian Jiang, Yin Tat Lee, Zhao Song, Sam Chiu-wai Wong

Given a separation oracle for a convex set $K \subset \mathbb{R}^n$ that is contained in a box of radius $R$, the goal is to either compute a point in $K$ or prove that $K$ does not contain a ball of radius $\epsilon$. We propose a new cutting plane algorithm that uses an optimal $O(n \log (\kappa))$ evaluations of the oracle and an additional $O(n^2)$ time per evaluation, where $\kappa = nR/\epsilon$. $\bullet$ This improves upon Vaidya's $O( \text{SO} \cdot n \log (\kappa) + n^{\omega+1} \log (\kappa))$ time algorithm [Vaidya, FOCS 1989a] in terms of polynomial dependence on $n$, where $\omega < 2.373$ is the exponent of matrix multiplication and $\text{SO}$ is the time for oracle evaluation. $\bullet$ This improves upon Lee-Sidford-Wong's $O( \text{SO} \cdot n \log (\kappa) + n^3 \log^{O(1)} (\kappa))$ time algorithm [Lee, Sidford and Wong, FOCS 2015] in terms of dependence on $\kappa$. For many important applications in economics, $\kappa = \Omega(\exp(n))$ and this leads to a significant difference between $\log(\kappa)$ and $\mathrm{poly}(\log (\kappa))$. We also provide evidence that the $n^2$ time per evaluation cannot be improved and thus our running time is optimal. A bottleneck of previous cutting plane methods is to compute leverage scores, a measure of the relative importance of past constraints. Our result is achieved by a novel multi-layered data structure for leverage score maintenance, which is a sophisticated combination of diverse techniques such as random projection, batched low-rank update, inverse maintenance, polynomial interpolation, and fast rectangular matrix multiplication. Interestingly, our method requires a combination of different fast rectangular matrix multiplication algorithms.

Via