Over the last several years, there has been significant progress in developing neural solvers for the Schrödinger Bridge (SB) problem and applying them to generative modeling. This new research field is justifiably fruitful, as it is interconnected with practically well-performing diffusion models and with theoretically grounded entropic optimal transport (EOT). Still, the area lacks non-trivial tests that allow a researcher to understand how well the methods solve SB or its equivalent continuous EOT problem. We fill this gap and propose a novel way to create pairs of probability distributions for which the ground-truth OT solution is known by construction. Our methodology is generic and works for a wide range of OT formulations; in particular, it covers EOT, which is equivalent to SB (the main interest of our study). This development allows us to create continuous benchmark distributions with known EOT and SB solutions on high-dimensional spaces such as spaces of images. As an illustration, we use these benchmark pairs to test how well existing neural EOT/SB solvers actually compute the EOT solution. The benchmark is available at https://github.com/ngushchin/EntropicOTBenchmark.
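To make "ground truth known by construction" concrete, here is a discrete toy analogue. It is not the paper's continuous, high-dimensional construction. In discrete EOT the optimal plan has the scaling form pi_ij = u_i exp(-C_ij / eps) v_j, and because the problem is strictly convex, any feasible plan of that form is the unique EOT solution between its own marginals. So one can pick arbitrary positive u and v, build pi, read off its marginals, and use pi as an exact reference answer for a solver (here, plain Sinkhorn iterations). The point clouds, squared cost, eps, and function names are illustrative assumptions.

```python
import math
import random


def gibbs_plan(cost, u, v, eps):
    """Build pi_ij = u_i * exp(-C_ij / eps) * v_j, i.e. a plan in EOT scaling form."""
    return [[u[i] * math.exp(-cost[i][j] / eps) * v[j] for j in range(len(v))]
            for i in range(len(u))]


def marginals(plan):
    """Row sums (first marginal) and column sums (second marginal) of a plan."""
    rows = [sum(r) for r in plan]
    cols = [sum(plan[i][j] for i in range(len(plan))) for j in range(len(plan[0]))]
    return rows, cols


def sinkhorn(a, b, cost, eps, n_iter=2000):
    """Standard Sinkhorn iterations for discrete entropic OT between weights a and b."""
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def build_benchmark(n=5, m=4, eps=0.5, seed=0):
    """Hypothetical helper: a pair (a, b) whose EOT plan is known exactly."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [rng.random() for _ in range(m)]
    cost = [[(x - y) ** 2 for y in ys] for x in xs]
    u = [rng.uniform(0.5, 2.0) for _ in range(n)]
    v = [rng.uniform(0.5, 2.0) for _ in range(m)]
    plan = gibbs_plan(cost, u, v, eps)
    total = sum(map(sum, plan))
    # Dividing by a constant only rescales u, so the plan keeps its scaling form.
    plan = [[p / total for p in row] for row in plan]
    a, b = marginals(plan)
    return a, b, cost, plan


a, b, cost, true_plan = build_benchmark()
est = sinkhorn(a, b, cost, eps=0.5)
err = max(abs(est[i][j] - true_plan[i][j]) for i in range(len(a)) for j in range(len(b)))
print(f"max |pi_sinkhorn - pi_true| = {err:.2e}")
```

Because the reference plan is exact rather than produced by another approximate solver, the printed error measures the solver's accuracy directly, which is the role the benchmark pairs play for neural EOT/SB solvers.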
Generative DNNs are a powerful tool for image synthesis, but they are limited by their computational load. At the same time, given a trained model and a task, e.g., face generation within a range of characteristics, the output image quality is unevenly distributed among images with different characteristics. It follows that we can restrict the model's complexity on some instances while maintaining high quality. We propose a method for reducing computation by adding so-called early-exit branches to the original architecture and dynamically switching the computational path depending on how difficult the output will be to render. We apply our method to two different SOTA models performing generative tasks, generation from a semantic map and cross-reenactment of face expressions, showing that it can output images with custom lower-quality thresholds. For a threshold of LPIPS ≤ 0.1, we reduce their computation by up to half. This is especially relevant for real-time applications such as face synthesis, where quality loss needs to be contained but most inputs need fewer computations than the complex instances.
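The dynamic path switching can be sketched as a simple selection rule, under assumptions not taken from the paper: each early-exit branch has a known relative cost, a lightweight predictor (hypothetical, not shown) estimates per input the LPIPS each exit would incur relative to the full model's output, and the cheapest exit meeting the threshold is taken, with the full path as fallback. The branch names, costs, and predicted scores below are made-up illustration values, not results.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ExitBranch:
    name: str
    relative_cost: float  # hypothetical fraction of the full model's compute


def choose_exit(exits: Sequence[ExitBranch],
                predicted_lpips: Sequence[float],
                threshold: float = 0.1) -> int:
    """Return the index of the cheapest exit whose predicted LPIPS is <= threshold.

    Assumes the last branch is the full model, used as a fallback when no
    exit is predicted to meet the quality threshold.
    """
    for k in sorted(range(len(exits)), key=lambda i: exits[i].relative_cost):
        if predicted_lpips[k] <= threshold:
            return k
    return len(exits) - 1


# Hypothetical setup: two early exits plus the full network.
exits = [ExitBranch("exit_1", 0.35), ExitBranch("exit_2", 0.60), ExitBranch("full", 1.00)]

# Hypothetical per-input predictions (LPIPS vs. full-model output, so the full path scores 0).
predictions: List[List[float]] = [
    [0.08, 0.05, 0.0],  # "easy" input: the first exit is good enough
    [0.15, 0.09, 0.0],  # "medium" input: needs the second exit
    [0.30, 0.20, 0.0],  # "hard" input: falls through to the full model
]

choices = [choose_exit(exits, p, threshold=0.1) for p in predictions]
avg_cost = sum(exits[k].relative_cost for k in choices) / len(choices)
print(choices, round(avg_cost, 3))
```

Sorting by cost rather than by list position keeps the rule correct even if branches are registered out of order; a real deployment would also have to account for the predictor's own overhead when reporting savings.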