We break the linear link between the layer size and its inference cost by introducing the fast feedforward (FFF) architecture, a logarithmic-time alternative to feedforward networks. We show that FFFs give comparable performance to feedforward networks at an exponentially smaller fraction of their inference cost, are quicker to deliver performance than mixture-of-experts networks, and can readily take the place of either in transformers. Pushing FFFs to the absolute limit, we train a vision transformer to perform single-neuron inferences at the cost of only a 5.8% performance decrease against the full-width variant. Our implementation is available as a Python package; install it with "pip install fastfeedforward".
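As an illustration of the conditional-execution idea only (a minimal sketch; the function and parameter names below are illustrative assumptions, not the API of the fastfeedforward package), inference in an FFF-style layer descends a binary tree of decision neurons and evaluates a single small leaf block, so the number of neurons touched grows logarithmically with the layer width:

    import numpy as np

    def fff_infer(x, node_w, node_b, leaf_W1, leaf_b1, leaf_W2, leaf_b2, depth):
        """Hard-routed inference through a fast-feedforward-style binary tree.

        node_w/node_b hold one decision neuron per internal node (stored as a
        heap-ordered array); leaf_* hold one small feedforward block per leaf.
        Only `depth` decision neurons and a single leaf are evaluated per input.
        """
        node = 0  # heap index of the root
        for _ in range(depth):
            go_right = x @ node_w[node] + node_b[node] > 0.0
            node = 2 * node + (2 if go_right else 1)
        leaf = node - (2 ** depth - 1)  # convert heap index to leaf index
        h = np.maximum(0.0, x @ leaf_W1[leaf] + leaf_b1[leaf])  # ReLU leaf block
        return h @ leaf_W2[leaf] + leaf_b2[leaf]

A training-time variant would typically use a soft, differentiable routing so gradients reach all nodes; the hard routing above reflects only the inference-time behaviour referred to here.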
We conduct a preliminary inquiry into the ability of generative transformer models to reason deductively from provided premises. We observe notable differences in the performance of models from different training setups and find that deductive reasoning ability increases with scale. Further, we discover that performance generally does not decrease with the length of the deductive chain needed to reach the conclusion, with the exception of the OpenAI GPT-3 and GPT-3.5 models. Our study considers a wide variety of transformer-decoder models, ranging from 117 million to 175 billion parameters in size.
We propose a novel, fully explainable neural approach to the synthesis of combinatorial logic circuits from input-output examples. The main advantage of our method is that it readily extends to inductive scenarios, where the set of examples is incomplete but still indicative of the desired behaviour. Our method can be employed for a virtually arbitrary choice of atoms, from logic gates to FPGA blocks, as long as they can be formulated in a differentiable fashion, and it consistently yields good results for the synthesis of practical circuits of increasing size. In particular, we succeed in learning a number of arithmetic, bitwise, and signal-routing operations, and even generalise towards the correct behaviour in inductive scenarios. Our method, attacking a discrete logic synthesis problem with an explainable neural approach, hints at a wider promise for synthesis and reasoning-related tasks.
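To make the differentiability requirement concrete (a minimal sketch under assumed probabilistic gate semantics; the candidate gate set and all names are illustrative, not the paper's formulation), a two-input atom can be relaxed into a softmax-weighted mixture of candidate gates so that both its inputs and the choice of gate receive gradients:

    import numpy as np

    def soft_gate(a, b, logits):
        """Differentiable relaxation of a two-input logic gate.

        `a` and `b` are truth probabilities in [0, 1]; `logits` are learnable
        scores, one per candidate gate. The output is the softmax-weighted
        mixture of the probabilistic outputs of all candidate gates.
        """
        candidates = np.array([
            a * b,              # AND
            a + b - a * b,      # OR
            a + b - 2 * a * b,  # XOR
            1 - a * b,          # NAND
            1 - a,              # NOT applied to the first input
        ])
        weights = np.exp(logits - logits.max())  # softmax over gate choices
        weights /= weights.sum()
        return weights @ candidates

After training, taking the argmax of the logits recovers a discrete gate, which is one way such a relaxation can be read back as an explainable circuit.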
Owing to their versatility, graph structures admit representations of intricate relationships between the separate entities comprising the data. We formalise the notion of a connection between two vertex sets in terms of edge and vertex features by introducing graph-walking programs. We give two algorithms for mining deterministic graph-walking programs, both of which yield programs in order of increasing length. These programs characterise linear long-distance relationships between the two given vertex sets in the context of the whole graph.
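As one illustrative reading of the idea (the step representation and semantics below are assumptions, not the paper's exact definition), a graph-walking program can be viewed as a sequence of steps, each filtering the current vertices by a feature test and then following edges that carry a given label:

    def run_walk_program(graph, start_vertices, program):
        """Execute a simple graph-walking program on a labelled directed graph.

        `graph` maps each vertex to its outgoing (edge_label, target) pairs and
        `program` is a sequence of (vertex_test, edge_label) steps; the program
        deterministically maps the start vertex set to the set of vertices
        reached by satisfying each step in order.
        """
        current = set(start_vertices)
        for vertex_test, edge_label in program:
            reached = set()
            for v in current:
                if not vertex_test(v):
                    continue
                for label, target in graph.get(v, ()):
                    if label == edge_label:
                        reached.add(target)
            current = reached
        return current

Under this reading, mining in order of increasing length would amount to enumerating ever-longer step sequences and keeping those that connect the two given vertex sets.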
We present the LL(finite) parsing strategy for parsing LL(k) grammars where k need not be known in advance. The strategy parses input in linear time and uses arbitrarily long but always minimal lookahead to disambiguate between the alternatives of a nonterminal, making it optimal in the number of lookahead terminal scans performed. We show modifications to the algorithm that allow grammar ambiguities to be resolved by precedence -- effectively interpreting the input as a parsing expression grammar -- as well as the use of predicates. As a proof of concept, the open-source parser generator Astir employs the LL(finite) strategy in the output it generates.
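The core of the strategy, selecting an alternative with the shortest lookahead that still disambiguates, can be sketched as follows (the lookahead-table layout and all names are illustrative assumptions, not Astir's internals):

    def choose_alternative(tokens, pos, alternative_prefixes):
        """Pick a grammar alternative with the minimal disambiguating lookahead.

        `alternative_prefixes[i][k]` is assumed to hold the set of length-(k+1)
        terminal prefixes with which alternative i can start, standing in for
        the lookahead tables a generator would precompute. The lookahead window
        widens one terminal at a time and the choice is made as soon as a
        single alternative remains viable.
        """
        viable = list(range(len(alternative_prefixes)))
        max_k = len(alternative_prefixes[0])  # assume equal table depth
        for k in range(max_k):
            if len(viable) <= 1:
                break
            window = tuple(tokens[pos:pos + k + 1])
            viable = [i for i in viable if window in alternative_prefixes[i][k]]
        if len(viable) != 1:
            raise SyntaxError(f"cannot disambiguate alternatives at position {pos}")
        return viable[0]

For a grammar such as A -> a a b | a a c, the loop would settle the choice on the third lookahead terminal, matching the intuition that the lookahead depth is whatever the alternatives require and no more.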