In this article we address the question of whether the differential equations describing the physical properties of a dynamical system subject to non-conservative forces can be learned from observations of one or more of its real-space trajectories alone. We introduce a network that incorporates a difference approximation for the second-order derivative in terms of residual connections between convolutional blocks, whose shared weights represent the coefficients of a second-order ordinary differential equation. We further combine this solver-like architecture with a convolutional network that learns the relation between trajectories of coupled oscillators and therefore allows us to make a stable forecast even if the system is only partially observed. We optimize this map together with the solver network, while sharing their weights, to form a powerful framework capable of learning the complex physical properties of a dissipative dynamical system.
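To make the link between a difference approximation and residual connections concrete, the following is a minimal sketch and not the architecture described above. It assumes a single damped harmonic oscillator, x'' + gamma*x' + omega^2*x = 0, as a stand-in for a dissipative system. The names `rollout`, `gamma`, `omega`, and `h` are illustrative assumptions. In a learned model the two coefficients would be trainable weights shared across all steps; here they are fixed by hand.

```python
def rollout(x0, x1, gamma, omega, h, steps):
    """Roll out x'' + gamma*x' + omega**2*x = 0 with finite differences.

    Hypothetical helper for illustration: the central difference
    (x[n+1] - 2*x[n] + x[n-1]) / h**2 replaces x'', and a backward
    difference (x[n] - x[n-1]) / h replaces x'.
    """
    xs = [x0, x1]
    for _ in range(steps):
        prev, cur = xs[-2], xs[-1]
        # Skip (residual) path: carries the previous states forward unchanged.
        skip = 2.0 * cur - prev
        # Correction path: the same two coefficients are reused at every step,
        # which is the role shared weights would play in a learned solver.
        correction = -h * gamma * (cur - prev) - h * h * omega ** 2 * cur
        xs.append(skip + correction)
    return xs


h = 0.01                  # step size (assumed)
gamma, omega = 0.5, 2.0   # damping and frequency (assumed values)
# x0 == x1 approximates zero initial velocity.
traj = rollout(1.0, 1.0, gamma, omega, h, steps=1000)
```

Each step adds a small correction to a skip path, which is why such an update maps naturally onto residual connections. With a positive `gamma` the oscillation amplitude shrinks over the rollout, as expected for a dissipative system.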
Recent advancements in language representation models such as BERT have led to a rapid improvement in numerous natural language processing tasks. However, language models usually consist of a few hundred million trainable parameters with an embedding space distributed across multiple layers, which makes them challenging to fine-tune for a specific task or to transfer to a new domain. To determine whether there are task-specific neurons that can be exploited for unsupervised transfer learning, we introduce a method for selecting the most important neurons for solving a specific classification task. This algorithm is further extended to multi-source transfer learning by computing the importance of neurons for several single-source transfer learning scenarios between different subsets of data sources. In addition, a task-specific fingerprint for each data source is obtained based on the percentage of the selected neurons in each layer. We perform extensive experiments in unsupervised transfer learning for sentiment analysis, natural language inference and sentence similarity, and compare our results with the existing literature and baselines. Notably, we found that source and target data sources with a higher degree of similarity between their task-specific fingerprints demonstrate better transferability. We conclude that our method can lead to better performance using just a few hundred task-specific and interpretable neurons.
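The fingerprint idea can be illustrated with a small sketch. It assumes that per-neuron importance scores are already available for each layer; how those scores are computed is not specified here, so the toy values below are made up. The function names `select_neurons`, `fingerprint`, and `cosine_similarity` are hypothetical, and cosine similarity is only one possible choice for comparing fingerprints.

```python
import math


def select_neurons(importance, k):
    """Return the (layer, index) pairs of the k highest-scoring neurons.

    `importance` is a list with one list of scores per layer
    (assumed input format, for illustration only).
    """
    scored = [
        (score, layer, idx)
        for layer, scores in enumerate(importance)
        for idx, score in enumerate(scores)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(layer, idx) for _, layer, idx in scored[:k]]


def fingerprint(selected, num_layers):
    """Percentage of the selected neurons that fall in each layer."""
    counts = [0] * num_layers
    for layer, _ in selected:
        counts[layer] += 1
    return [100.0 * c / len(selected) for c in counts]


def cosine_similarity(a, b):
    """Cosine similarity between two fingerprints (an assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Toy importance scores for two data sources, 3 layers of 3 neurons each.
source_a = [[0.9, 0.1, 0.2], [0.8, 0.7, 0.05], [0.3, 0.6, 0.4]]
source_b = [[0.2, 0.1, 0.3], [0.9, 0.8, 0.1], [0.7, 0.6, 0.2]]

fp_a = fingerprint(select_neurons(source_a, 4), num_layers=3)
fp_b = fingerprint(select_neurons(source_b, 4), num_layers=3)
similarity = cosine_similarity(fp_a, fp_b)
```

In this toy setup, `fp_a` places half of the selected neurons in the middle layer, and `similarity` gives a single number for comparing the two sources. A higher value would correspond to the "more similar fingerprints" case discussed above.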