Peter Belcak

Fast Feedforward Networks

Aug 28, 2023
Peter Belcak, Roger Wattenhofer

We break the linear link between a layer's size and its inference cost by introducing the fast feedforward (FFF) architecture, a logarithmic-time alternative to feedforward networks. We show that FFFs give comparable performance to feedforward networks at an exponentially small fraction of their inference cost, are quicker to reach a given level of performance than mixture-of-experts networks, and can readily take the place of either in transformers. Pushing FFFs to the absolute limit, we train a vision transformer to perform single-neuron inferences at the cost of only a 5.8% decrease in performance relative to the full-width variant. Our implementation is available as a Python package; just use "pip install fastfeedforward".
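
To make the logarithmic inference cost concrete, below is a minimal NumPy sketch of hard-routed FFF inference, assuming linear node routers and ReLU leaf blocks; the fastfeedforward package exposes its own API, which differs from this illustration.

    import numpy as np

    def fff_inference(x, node_weights, leaf_weights, depth):
        # Descend a balanced binary tree of learned linear routers
        # (heap-indexed: the children of node i are 2i+1 and 2i+2),
        # then evaluate only the one leaf block that is reached.
        node = 0
        for _ in range(depth):
            go_right = float(x @ node_weights[node]) > 0.0
            node = 2 * node + (2 if go_right else 1)
        leaf = node - (2 ** depth - 1)  # index among the 2**depth leaves
        w_in, w_out = leaf_weights[leaf]
        return np.maximum(x @ w_in, 0.0) @ w_out  # small ReLU block

    # With depth d there are 2**d leaves, yet one forward pass costs only
    # d router dot products plus a single leaf block: logarithmic in the
    # total number of neurons available.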

* 12 pages, 6 figures, 4 tables 

Examining the Emergence of Deductive Reasoning in Generative Language Models

May 31, 2023
Peter Belcak, Luca A. Lanzendörfer, Roger Wattenhofer

We conduct a preliminary inquiry into the ability of generative transformer models to reason deductively from provided premises. We observe notable differences in performance between models from different training setups and find that deductive reasoning ability increases with scale. Further, we discover that performance generally does not decrease with the length of the deductive chain needed to reach the conclusion, with the exception of the OpenAI GPT-3 and GPT-3.5 models. Our study considers a wide variety of transformer-decoder models, ranging from 117 million to 175 billion parameters in size.
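
For intuition about what the length of a deductive chain means here, the following toy generator (a hypothetical illustration, not the paper's evaluation harness) builds premises whose conclusion requires n chained applications of modus ponens.

    def deduction_chain(n):
        # Premises A0 -> A1, ..., A(n-1) -> An; concluding A_n from the
        # fact A0 takes n chained steps, i.e. a chain of length n.
        premises = " ".join(f"If A{i} then A{i + 1}." for i in range(n))
        return f"{premises} A0 is true. Is A{n} true?"

    print(deduction_chain(2))
    # -> If A0 then A1. If A1 then A2. A0 is true. Is A2 true?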

* Accepted to the 1st Natural Language Reasoning and Structured Explanations Workshop (NLRSE@ACL'23). 8 pages, 4 figures, 3 tables 

Neural Combinatorial Logic Circuit Synthesis from Input-Output Examples

Oct 29, 2022
Peter Belcak, Roger Wattenhofer

We propose a novel, fully explainable neural approach to the synthesis of combinatorial logic circuits from input-output examples. The chief advantage of our method is that it readily extends to inductive scenarios, where the set of examples is incomplete but still indicative of the desired behaviour. Our method can be employed for a virtually arbitrary choice of atoms, from logic gates to FPGA blocks, as long as they can be formulated in a differentiable fashion, and it consistently yields good results when synthesising practical circuits of increasing size. In particular, we succeed in learning a number of arithmetic, bitwise, and signal-routing operations, and even generalise towards the correct behaviour in inductive scenarios. By attacking a discrete logic synthesis problem with an explainable neural approach, our method hints at wider promise for synthesis and reasoning-related tasks.
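
As one illustration of what a differentiable formulation of atoms can look like, the sketch below softmax-mixes a few candidate gates over probabilistically relaxed wire values, so that gradient descent can select the atom; taking the argmax after training recovers a discrete, inspectable gate. This is generic differentiable-logic machinery under stated assumptions, not necessarily the paper's exact relaxation.

    import numpy as np

    def soft_gate(a, b, logits):
        # a, b in [0, 1] are relaxations of Boolean wire values; each
        # row is one candidate atom's output under that relaxation.
        candidates = np.array([
            a * b,              # AND
            a + b - a * b,      # OR
            a + b - 2 * a * b,  # XOR
            1.0 - a * b,        # NAND
        ])
        probs = np.exp(logits - np.max(logits))
        probs /= probs.sum()
        return probs @ candidates  # differentiable mixture over atoms

    # After training, np.argmax(logits) fixes the gate choice, yielding
    # a discrete circuit that can be read off and verified directly.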

* Accepted to the 2nd Workshop on Math-AI (MATH-AI@NeurIPS'22). 10 pages, 1 figure 

Deterministic Graph-Walking Program Mining

Aug 22, 2022
Peter Belcak, Roger Wattenhofer

Owing to their versatility, graph structures can represent intricate relationships between the separate entities comprising the data. We formalise the notion of a connection between two vertex sets in terms of edge and vertex features by introducing graph-walking programs. We give two algorithms for mining deterministic graph-walking programs, both of which yield programs in order of increasing length. These programs characterise linear long-distance relationships between the two given vertex sets in the context of the whole graph.
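
To make the mining objective concrete, here is a minimal sketch in which a graph-walking program is taken to be just a sequence of edge labels (an assumption for illustration; the paper's program class is richer). Breadth-first enumeration yields candidates in order of increasing length, and a program is kept if it walks the source set into the target set.

    from collections import deque

    def mine_programs(graph, sources, targets, max_len):
        # graph: {vertex: [(edge_label, neighbour), ...]}
        labels = sorted({l for outs in graph.values() for l, _ in outs})

        def step(vertices, label):
            return {w for v in vertices
                    for l, w in graph.get(v, []) if l == label}

        found, queue = [], deque([((), set(sources))])
        while queue:
            prog, ends = queue.popleft()  # FIFO: increasing length
            if ends and ends <= set(targets):
                found.append(prog)
            if len(prog) < max_len:
                for label in labels:
                    nxt = step(ends, label)
                    if nxt:
                        queue.append((prog + (label,), nxt))
        return found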

* Paper accepted for an oral presentation at Advanced Data Mining and Applications (ADMA) 2022. 15 pages, 3 figures 

The LL(finite) strategy for optimal LL(k) parsing

Oct 15, 2020
Peter Belcak

We present the LL(finite) strategy for parsing LL(k) grammars where k need not be known in advance. The strategy parses input in linear time, uses arbitrarily long but always minimal lookahead necessary to disambiguate between the alternatives of nonterminals, and is optimal in the number of lookahead terminal scans performed. We show modifications to the algorithm that allow grammar ambiguities to be resolved by precedence, effectively interpreting the input as a parsing expression grammar, as well as modifications allowing the use of predicates. As a proof of concept, the open-source parser generator Astir employs the LL(finite) strategy in the output it generates.
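
The idea of arbitrary but always minimal lookahead admits a compact illustration: for a single parsing decision, lengthen the lookahead window k until the alternatives' length-k token prefixes become pairwise disjoint. This toy (with assumed finite prefix sets per alternative) shows the flavour of the per-decision computation, not the Astir implementation.

    def minimal_lookahead(alternatives, k_max=16):
        # alternatives: one set of possible token sequences per
        # alternative of the nonterminal being decided.
        for k in range(1, k_max + 1):
            prefixes = [{seq[:k] for seq in alt} for alt in alternatives]
            if all(p.isdisjoint(q)
                   for i, p in enumerate(prefixes)
                   for q in prefixes[i + 1:]):
                return k  # smallest window that disambiguates
        raise ValueError("not LL(k) for any k <= k_max")

    # minimal_lookahead([{("a", "b")}, {("a", "c")}]) -> 2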
