Jan Chorowski

Pathway: a fast and flexible unified stream data processing framework for analytical and Machine Learning applications

Jul 12, 2023
Michal Bartoszkiewicz, Jan Chorowski, Adrian Kosowski, Jakub Kowalski, Sergey Kulik, Mateusz Lewandowski, Krzysztof Nowicki, Kamil Piechowiak, Olivier Ruas, Zuzanna Stamirowska, Przemyslaw Uznanski

We present Pathway, a new unified data processing framework that can run workloads on both bounded and unbounded data streams. The framework was created with the original motivation of resolving challenges faced when analyzing and processing data from the physical economy, including streams of data generated by IoT and enterprise systems. These use cases required rapid reaction times while also calling for advanced computation paradigms (machine-learning-powered analytics, contextual analysis, and other elements of complex event processing). Pathway is equipped with a Table API tailored for Python and Python/SQL workflows, and is powered by a distributed incremental dataflow in Rust. We describe the system and present benchmarking results which demonstrate its capabilities in both batch and streaming contexts, where it is able to surpass state-of-the-art industry frameworks in both scenarios. We also discuss streaming use cases handled by Pathway which cannot be easily resolved with state-of-the-art industry frameworks, such as streaming iterative graph algorithms (PageRank, etc.).
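
To make the Table API concrete, here is a minimal sketch of a streaming aggregation written against the publicly documented `pathway` Python package. The schema, file paths, and column names are hypothetical, and the exact API surface may vary between releases.

```python
import pathway as pw

# Hypothetical schema for a stream of IoT sensor readings.
class Reading(pw.Schema):
    sensor_id: str
    value: float

# Watch a directory of CSV files as an unbounded stream; the same
# pipeline runs on bounded data by switching to mode="static".
readings = pw.io.csv.read("./readings/", schema=Reading, mode="streaming")

# Incrementally maintained per-sensor totals, updated as rows arrive.
totals = readings.groupby(pw.this.sensor_id).reduce(
    pw.this.sensor_id,
    total=pw.reducers.sum(pw.this.value),
)

pw.io.csv.write(totals, "./totals.csv")
pw.run()  # launches the underlying incremental dataflow
```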

Efficient Transformers with Dynamic Token Pooling

Nov 17, 2022
Piotr Nawrot, Jan Chorowski, Adrian Łańcucki, Edoardo M. Ponti

Transformers achieve unrivalled performance in modelling language, but remain inefficient in terms of memory and time complexity. A possible remedy is to reduce the sequence length in the intermediate layers by pooling fixed-length segments of tokens. Nevertheless, natural units of meaning, such as words or phrases, display varying sizes. To address this mismatch, we equip language models with a dynamic-pooling mechanism, which predicts segment boundaries in an autoregressive fashion. We compare several methods to infer boundaries, including end-to-end learning through stochastic re-parameterisation, supervised learning (based on segmentations from subword tokenizers or spikes in conditional entropy), as well as linguistically motivated boundaries. We perform character-level evaluation on texts from multiple datasets and morphologically diverse languages. The results demonstrate that dynamic pooling, which jointly segments and models language, is often both faster and more accurate than vanilla Transformers and fixed-length pooling within the same computational budget.
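
As an illustration of the pooling step, the sketch below mean-pools tokens within predicted segments; it is not the authors' implementation, the batch dimension is omitted, and the boundary predictor is assumed to live elsewhere in the model.

```python
import torch

def pool_segments(hidden: torch.Tensor, boundaries: torch.Tensor) -> torch.Tensor:
    """Mean-pool contiguous token segments.

    hidden:     [T, D] token representations from an intermediate layer
    boundaries: [T] binary indicators, 1 where a new segment starts
                (the first token is assumed to open a segment)
    returns:    [S, D], one pooled vector per segment
    """
    seg_id = boundaries.long().cumsum(dim=0) - 1   # segment index of each token
    num_seg = int(seg_id.max()) + 1
    sums = torch.zeros(num_seg, hidden.size(1)).index_add_(0, seg_id, hidden)
    counts = torch.zeros(num_seg).index_add_(0, seg_id, torch.ones(hidden.size(0)))
    return sums / counts.unsqueeze(1)

# Example: 10 tokens grouped into 4 variable-length segments.
h = torch.randn(10, 8)
b = torch.tensor([1, 0, 0, 1, 0, 1, 0, 0, 0, 1])
pooled = pool_segments(h, b)  # shape [4, 8]
```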

Variable-rate hierarchical CPC leads to acoustic unit discovery in speech

Jun 07, 2022
Santiago Cuervo, Adrian Łańcucki, Ricard Marxer, Paweł Rychlikowski, Jan Chorowski

The success of deep learning comes from its ability to capture the hierarchical structure of data by learning high-level representations defined in terms of low-level ones. In this paper we explore self-supervised learning of hierarchical representations of speech by applying multiple levels of Contrastive Predictive Coding (CPC). We observe that simply stacking two CPC models does not yield significant improvements over single-level architectures. Inspired by the fact that speech is often described as a sequence of discrete units unevenly distributed in time, we propose a model in which the output of a low-level CPC module is non-uniformly downsampled to directly minimize the loss of a high-level CPC module. The latter is designed to also enforce a prior of separability and discreteness in its representations by enforcing dissimilarity of successive high-level representations through focused negative sampling, and by quantization of the prediction targets. Accounting for the structure of the speech signal improves upon single-level CPC features and enhances the disentanglement of the learned representations, as measured by downstream speech recognition tasks, while resulting in a meaningful segmentation of the signal that closely resembles phone boundaries.

* Submitted to 36th Conference on Neural Information Processing Systems (NeurIPS 2022) 
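
Both levels of the model are trained with a CPC-style contrastive objective. The snippet below is a generic, minimal InfoNCE loss for a single prediction offset, with negatives drawn from other time steps of the same utterance; it illustrates the objective only and is not the paper's code.

```python
import torch
import torch.nn.functional as F

def info_nce(predictions: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """CPC-style InfoNCE loss for one prediction offset.

    predictions: [T, D] context-network predictions of future latents
    targets:     [T, D] encoder latents at the predicted positions
    """
    logits = predictions @ targets.t()       # [T, T]: each prediction scored against every latent
    labels = torch.arange(logits.size(0))    # the true future latent sits on the diagonal
    return F.cross_entropy(logits, labels)
```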

Contrastive prediction strategies for unsupervised segmentation and categorization of phonemes and words

Oct 29, 2021
Santiago Cuervo, Maciej Grabias, Jan Chorowski, Grzegorz Ciesielski, Adrian Łańcucki, Paweł Rychlikowski, Ricard Marxer

We investigate the performance on phoneme categorization and phoneme and word segmentation of several self-supervised learning (SSL) methods based on Contrastive Predictive Coding (CPC). Our experiments show that with the existing algorithms there is a trade-off between categorization and segmentation performance. We investigate the source of this conflict and conclude that the use of context-building networks, albeit necessary for superior performance on categorization tasks, harms segmentation performance by causing a temporal shift in the learned representations. Aiming to bridge this gap, we take inspiration from the leading approach on segmentation, which simultaneously models the speech signal at the frame and phoneme level, and incorporate multi-level modelling into Aligned CPC (ACPC), a variation of CPC which exhibits the best performance on categorization tasks. Our multi-level ACPC (mACPC) improves on all categorization metrics and achieves state-of-the-art performance in word segmentation.

Information Retrieval for ZeroSpeech 2021: The Submission by University of Wroclaw

Jun 22, 2021
Jan Chorowski, Grzegorz Ciesielski, Jarosław Dzikowski, Adrian Łańcucki, Ricard Marxer, Mateusz Opala, Piotr Pusz, Paweł Rychlikowski, Michał Stypułkowski

We present a number of low-resource approaches to the tasks of the Zero Resource Speech Challenge 2021. We build on the unsupervised representations of speech proposed by the organizers as a baseline, derived from CPC and clustered with the k-means algorithm. We demonstrate that simple methods of refining those representations can narrow the gap, or even improve upon the solutions which use a high computational budget. The results lead to the conclusion that the CPC-derived representations are still too noisy for training language models, but stable enough for simpler forms of pattern matching and retrieval.

* Published in Interspeech 2021 
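
The baseline pipeline that the submission refines (CPC features quantized with k-means into discrete pseudo-units) can be sketched as follows; the feature file and the cluster count are illustrative placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans

# Frame-level CPC features, shape [N_frames, D]; the file name is hypothetical.
features = np.load("cpc_features.npy")

# Quantize frames into discrete pseudo-units.
kmeans = KMeans(n_clusters=50, random_state=0).fit(features)
units = kmeans.predict(features)  # one unit id per frame

# Collapse consecutive repeats into a compact unit sequence for pattern matching.
sequence = [int(u) for i, u in enumerate(units) if i == 0 or u != units[i - 1]]
```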

Aligned Contrastive Predictive Coding

Apr 29, 2021
Jan Chorowski, Grzegorz Ciesielski, Jarosław Dzikowski, Adrian Łańcucki, Ricard Marxer, Mateusz Opala, Piotr Pusz, Paweł Rychlikowski, Michał Stypułkowski

We investigate the possibility of forcing a self-supervised model trained using a contrastive predictive loss to extract slowly varying latent representations. Rather than producing individual predictions for each of the future representations, the model emits a sequence of predictions shorter than that of the upcoming representations to which they will be aligned. In this way, the prediction network solves a simpler task of predicting the next symbols, but not their exact timing, while the encoding network is trained to produce piece-wise constant latent codes. We evaluate the model on a speech coding task and demonstrate that the proposed Aligned Contrastive Predictive Coding (ACPC) leads to higher linear phone prediction accuracy and lower ABX error rates, while being slightly faster to train due to the reduced number of prediction heads.
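
The distinctive step in ACPC is matching a short sequence of predictions to a longer window of upcoming latents. The dynamic program below sketches a monotone one-to-one alignment under an L2 cost; ACPC itself propagates a contrastive loss through the alignment, so this is an illustration of the alignment idea rather than the training objective.

```python
import torch

def monotone_alignment_cost(pred: torch.Tensor, future: torch.Tensor) -> torch.Tensor:
    """Minimum cost of matching K predictions to T >= K frames, in order.

    pred:   [K, D] prediction-head outputs
    future: [T, D] upcoming latent representations
    """
    K, T = pred.size(0), future.size(0)
    cost = torch.cdist(pred, future)                 # [K, T] pairwise L2 distances
    dp = cost[0].clone()                             # prediction 0 matched to frame j
    for i in range(1, K):
        prefix_min = torch.cummin(dp, dim=0).values  # best prior cost using frames <= j
        nxt = torch.full((T,), float("inf"))
        nxt[1:] = cost[i, 1:] + prefix_min[:-1]      # prediction i must land strictly later
        dp = nxt
    return dp.min()
```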

Representing Point Clouds with Generative Conditional Invertible Flow Networks

Oct 07, 2020
Michał Stypułkowski, Kacper Kania, Maciej Zamorski, Maciej Zięba, Tomasz Trzciński, Jan Chorowski

In this paper, we propose a simple yet effective method to represent point clouds as sets of samples drawn from a cloud-specific probability distribution. This interpretation matches intrinsic characteristics of point clouds: the number of points and their ordering within a cloud are not important, as all points are drawn from the proximity of the object boundary. We propose to represent each cloud as a parameterized probability distribution defined by a generative neural network. Once trained, such a model provides a natural framework for point cloud manipulation operations, such as aligning a new cloud into a default spatial orientation. To exploit similarities between same-class objects and to improve model performance, we turn to weight sharing: networks that model densities of points belonging to objects in the same family share all parameters with the exception of a small, object-specific embedding vector. We show that these embedding vectors capture semantic relationships between objects. Our method leverages generative invertible flow networks to learn embeddings as well as to generate point clouds. Thanks to this formulation, and contrary to similar approaches, we are able to train our model in an end-to-end fashion. As a result, our model offers competitive or superior quantitative results on benchmark datasets, while enabling unprecedented capabilities to perform cloud manipulation tasks, such as point cloud registration and regeneration, with a generative network.
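
The weight-sharing scheme can be pictured with a single conditional affine coupling layer: all network parameters are shared across objects, and only a small embedding vector is object-specific. This is a simplified sketch of one invertible block, not the architecture used in the paper.

```python
import torch
import torch.nn as nn

class ConditionalCoupling(nn.Module):
    """One affine coupling layer conditioned on an object embedding."""

    def __init__(self, dim: int = 3, emb_dim: int = 16, hidden: int = 64):
        super().__init__()
        self.half = dim // 2  # for 3-D points: split into 1 + 2 coordinates
        self.net = nn.Sequential(
            nn.Linear(self.half + emb_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, (dim - self.half) * 2),
        )

    def forward(self, x, emb):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(torch.cat([x1, emb], dim=1)).chunk(2, dim=1)
        log_s = torch.tanh(log_s)  # keep scales bounded for stability
        y2 = x2 * log_s.exp() + t
        return torch.cat([x1, y2], dim=1), log_s.sum(dim=1)  # output and log-det

    def inverse(self, y, emb):
        y1, y2 = y[:, :self.half], y[:, self.half:]
        log_s, t = self.net(torch.cat([y1, emb], dim=1)).chunk(2, dim=1)
        log_s = torch.tanh(log_s)
        return torch.cat([y1, (y2 - t) * (-log_s).exp()], dim=1)
```

Sampling a cloud then amounts to drawing Gaussian points and pushing them through a stack of such layers conditioned on the object's embedding.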

A Convolutional Deep Markov Model for Unsupervised Speech Representation Learning

Jun 03, 2020
Sameer Khurana, Antoine Laurent, Wei-Ning Hsu, Jan Chorowski, Adrian Lancucki, Ricard Marxer, James Glass

Probabilistic Latent Variable Models (LVMs) provide an alternative to self-supervised learning approaches for linguistic representation learning from speech. LVMs admit an intuitive probabilistic interpretation where the latent structure shapes the information extracted from the signal. Even though LVMs have recently seen a renewed interest due to the introduction of Variational Autoencoders (VAEs), their use for speech representation learning remains largely unexplored. In this work, we propose the Convolutional Deep Markov Model (ConvDMM), a Gaussian state-space model with non-linear emission and transition functions modelled by deep neural networks. This unsupervised model is trained using black box variational inference. A deep convolutional neural network is used as an inference network for structured variational approximation. When trained on a large scale speech dataset (LibriSpeech), ConvDMM produces features that significantly outperform multiple self-supervised feature extraction methods on linear phone classification and recognition on the Wall Street Journal dataset. Furthermore, we find that ConvDMM complements self-supervised methods like Wav2Vec and PASE, improving on the results achieved with any of the methods alone. Lastly, we find that ConvDMM features enable learning better phone recognizers than any other features in an extreme low-resource regime with few labeled training examples.

* Submitted to Interspeech 2020 
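
A minimal sketch of the generative core: a Gaussian state-space model whose transition mean and variance are produced by a small neural network and sampled with the reparameterization trick. The dimensions and architecture here are placeholders, not those of the paper.

```python
import torch
import torch.nn as nn

class GaussianTransition(nn.Module):
    """Transition p(z_t | z_{t-1}) with NN-parameterized mean and variance."""

    def __init__(self, z_dim: int = 32, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2 * z_dim),
        )

    def forward(self, z_prev):
        mean, log_var = self.net(z_prev).chunk(2, dim=-1)
        return mean, log_var

# Reparameterized ancestral sampling through time.
trans = GaussianTransition()
z = torch.zeros(1, 32)  # initial latent state
for _ in range(10):
    mean, log_var = trans(z)
    z = mean + (0.5 * log_var).exp() * torch.randn_like(mean)  # z_t ~ N(mean, diag(var))
```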

Robust Training of Vector Quantized Bottleneck Models

May 18, 2020
Adrian Łańcucki, Jan Chorowski, Guillaume Sanchez, Ricard Marxer, Nanxin Chen, Hans J. G. A. Dolfing, Sameer Khurana, Tanel Alumäe, Antoine Laurent

In this paper we demonstrate methods for reliable and efficient training of discrete representations using Vector-Quantized Variational Auto-Encoder models (VQ-VAEs). Discrete latent variable models have been shown to learn nontrivial representations of speech, applicable to unsupervised voice conversion and reaching state-of-the-art performance on unit discovery tasks. For unsupervised representation learning, they became viable alternatives to continuous latent variable models such as the Variational Auto-Encoder (VAE). However, training deep discrete variable models is challenging, due to the inherent non-differentiability of the discretization operation. In this paper we focus on VQ-VAE, a state-of-the-art discrete bottleneck model shown to perform on par with its continuous counterparts. It quantizes encoder outputs with on-line $k$-means clustering. We show that the codebook learning can suffer from poor initialization and non-stationarity of clustered encoder outputs. We demonstrate that these can be successfully overcome by increasing the learning rate for the codebook and periodic data-dependent codeword re-initialization. As a result, we achieve more robust training across different tasks, and significantly increase the usage of latent codewords even for large codebooks. This has practical benefits, for instance, in unsupervised representation learning, where large codebooks may lead to disentanglement of latent representations.

* Published at IJCNN 2020 
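
The data-dependent re-initialization can be sketched as follows: codewords whose recent usage falls below a threshold are reset to randomly drawn encoder outputs. The threshold and bookkeeping here are illustrative; the paper's exact schedule may differ. The increased codebook learning rate mentioned above can be realized with a separate optimizer parameter group for the codebook.

```python
import torch

@torch.no_grad()
def reinit_dead_codewords(codebook, usage_counts, encoder_outputs, min_usage=1):
    """Reset rarely used codewords to random recent encoder outputs.

    codebook:        [K, D] tensor of codewords (modified in place)
    usage_counts:    [K] selection counts accumulated since the last reset
    encoder_outputs: [N, D] recent pre-quantization encoder vectors
    """
    dead = usage_counts < min_usage
    n_dead = int(dead.sum())
    if n_dead == 0:
        return
    idx = torch.randint(0, encoder_outputs.size(0), (n_dead,))
    codebook[dead] = encoder_outputs[idx]
```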