Normalization techniques play an important role in supporting efficient and often more effective training of deep neural networks. While conventional methods explicitly normalize the activations, we suggest to add a loss term instead. This new loss term encourages the variance of the activations to be stable and not vary from one random mini-batch to the next. As we prove, this encourages the activations to be distributed around a few distinct modes. We also show that if the inputs are from a mixture of two Gaussians, the new loss would either join the two together, or separate between them optimally in the LDA sense, depending on the prior probabilities. Finally, we are able to link the new regularization term to the batchnorm method, which provides it with a regularization perspective. Our experiments demonstrate an improvement in accuracy over the batchnorm technique for both CNNs and fully connected networks.
Given a single image x from domain A and a set of images from domain B, our task is to generate the analogous of x in B. We argue that this task could be a key AI capability that underlines the ability of cognitive agents to act in the world and present empirical evidence that the existing unsupervised domain translation methods fail on this task. Our method follows a two step process. First, a variational autoencoder for domain B is trained. Then, given the new sample x, we create a variational autoencoder for domain A by adapting the layers that are close to the image in order to directly fit x, and only indirectly adapt the other layers. Our experiments indicate that the new method does as well, when trained on one sample x, as the existing domain transfer methods, when these enjoy a multitude of training samples from domain A. Our code is made publicly available at https://github.com/sagiebenaim/OneShotTranslation
We consider the problem of generating automatic code given sample input-output pairs. We train a neural network to map from the current state and the outputs to the program's next statement. The neural network optimizes multiple tasks concurrently: the next operation out of a set of high level commands, the operands of the next statement, and which variables can be dropped from memory. Using our method we are able to create programs that are more than twice as long as existing state-of-the-art solutions, while improving the success rate for comparable lengths, and cutting the run-time by two orders of magnitude. Our code is publicly available at https://github.com/amitz25/PCCoder
Several methods were recently proposed for the task of translating images between domains without prior knowledge in the form of correspondences. The existing methods apply adversarial learning to ensure that the distribution of the mapped source domain is indistinguishable from the target domain, which suffers from known stability issues. In addition, most methods rely heavily on `cycle' relationships between the domains, which enforce a one-to-one mapping. In this work, we introduce an alternative method: Non-Adversarial Mapping (NAM), which separates the task of target domain generative modeling from the cross-domain mapping task. NAM relies on a pre-trained generative model of the target domain, and aligns each source image with an image synthesized from the target domain, while jointly optimizing the domain mapping function. It has several key advantages: higher quality and resolution image translations, simpler and more stable training and reusable target models. Extensive experiments are presented validating the advantages of our method.
Unsupervised word translation from non-parallel inter-lingual corpora has attracted much research interest. Very recently, neural network methods trained with adversarial loss functions achieved high accuracy on this task. Despite the impressive success of the recent techniques, they suffer from the typical drawbacks of generative adversarial models: sensitivity to hyper-parameters, long training time and lack of interpretability. In this paper, we make the observation that two sufficiently similar distributions can be aligned correctly with iterative matching methods. We present a novel method that first aligns the second moment of the word distributions of the two languages and then iteratively refines the alignment. Extensive experiments on word translation of European and Non-European languages show that our method achieves better performance than recent state-of-the-art deep adversarial approaches and is competitive with the supervised baseline. It is also efficient, easy to parallelize on CPU and interpretable.
In this work, we ask the following question: Can visual analogies, learned in an unsupervised way, be used in order to transfer knowledge between pairs of games and even play one game using an agent trained for another game? We attempt to answer this research question by creating visual analogies between a pair of games: a source game and a target game. For example, given a video frame in the target game, we map it to an analogous state in the source game and then attempt to play using a trained policy learned for the source game. We demonstrate convincing visual mapping between four pairs of games (eight mappings), which are used to evaluate three transfer learning approaches.
The recent empirical success of cross-domain mapping algorithms, between two domains that share common characteristics, is not well-supported by theoretical justifications. This lacuna is especially troubling, given the clear ambiguity in such mappings. We work with the adversarial training method called the Wasserstein GAN. We derive a novel generalization bound, which limits the risk between the learned mapping $h$ and the target mapping $y$, by a sum of two terms: (i) the risk between $h$ and the most distant alternative mapping that was learned by the same cross-domain mapping algorithm, and (ii) the minimal Wasserstein GAN divergence between the target domain and the domain obtained by applying a hypothesis $h^*$ on the samples of the source domain, where $h^*$ is a hypothesis selected by the same algorithm. The bound is directly related to Occam's razor and it encourages the selection of the minimal architecture that supports a small Wasserstein GAN divergence. From the bound, we derive algorithms for hyperparameter selection and early stopping in cross-domain mapping GANs. We also demonstrate a novel capability of estimating confidence in the mapping of every specific sample. Lastly, we show how non-minimal architectures can be effectively trained by an inverted knowledge distillation in which a minimal architecture is used to train a larger one, leading to higher quality outputs.
Having a reliable accuracy score is crucial for real world applications of OCR, since such systems are judged by the number of false readings. Lexicon-based OCR systems, which deal with what is essentially a multi-class classification problem, often employ methods explicitly taking into account the lexicon, in order to improve accuracy. However, in lexicon-free scenarios, filtering errors requires an explicit confidence calculation. In this work we show two explicit confidence measurement techniques, and show that they are able to achieve a significant reduction in misreads on both standard benchmarks and a proprietary dataset.
We present a method for translating music across musical instruments, genres, and styles. This method is based on a multi-domain wavenet autoencoder, with a shared encoder and a disentangled latent space that is trained end-to-end on waveforms. Employing a diverse training dataset and large net capacity, the domain-independent encoder allows us to translate even from musical domains that were not seen during training. The method is unsupervised and does not rely on supervision in the form of matched samples between domains or musical transcriptions. We evaluate our method on NSynth, as well as on a dataset collected from professional musicians, and achieve convincing translations, even when translating from whistling, potentially enabling the creation of instrumental music by untrained humans.
Linking between two data sources is a basic building block in numerous computer vision problems. In this paper, we set to answer a fundamental cognitive question: are prior correspondences necessary for linking between different domains? One of the most popular methods for linking between domains is Canonical Correlation Analysis (CCA). All current CCA algorithms require correspondences between the views. We introduce a new method Unsupervised Correlation Analysis (UCA), which requires no prior correspondences between the two domains. The correlation maximization term in CCA is replaced by a combination of a reconstruction term (similar to autoencoders), full cycle loss, orthogonality and multiple domain confusion terms. Due to lack of supervision, the optimization leads to multiple alternative solutions with similar scores and we therefore introduce a consensus-based mechanism that is often able to recover the desired solution. Remarkably, this suffices in order to link remote domains such as text and images. We also present results on well accepted CCA benchmarks, showing that performance far exceeds other unsupervised baselines, and approaches supervised performance in some cases.