Automated crop-type classification using Sentinel-2 satellite time series is essential to support agriculture monitoring. Recently, deep learning models based on transformer encoders became a promising approach for crop-type classification. Using explainable machine learning to reveal the inner workings of these models is an important step towards improving stakeholders' trust and efficient agriculture monitoring. In this paper, we introduce a novel explainability framework that aims to shed a light on the essential crop disambiguation patterns learned by a state-of-the-art transformer encoder model. More specifically, we process the attention weights of a trained transformer encoder to reveal the critical dates for crop disambiguation and use domain knowledge to uncover the phenological events that support the model performance. We also present a sensitivity analysis approach to understand better the attention capability for revealing crop-specific phenological events. We report compelling results showing that attention patterns strongly relate to key dates, and consequently, to the critical phenological events for crop-type classification. These findings might be relevant for improving stakeholder trust and optimizing agriculture monitoring processes. Additionally, our sensitivity analysis demonstrates the limitation of attention weights for identifying the important events in the crop phenology as we empirically show that the unveiled phenological events depend on the other crops in the data considered during training.
An intuitive control method for the flying trot, which combines offline trajectory planning with real-time balance control, is presented. The motion features of running animals in the vertical direction were analysed using the spring-load-inverted-pendulum (SLIP) model, and the foot trajectory of the robot was planned, so the robot could run similar to an animal capable of vertical flight, according to the given height and speed of the trunk. To improve the robustness of running, a posture control method based on a foot acceleration adjustment is proposed. A novel kinematic based CoM observation method and CoM regulation method is present to enhance the stability of locomotion. To reduce the impact force when the robot interacts with the environment, the virtual model control method is used in the control of the foot trajectory to achieve active compliance. By selecting the proper parameters for the virtual model, the oscillation motion of the virtual model and the planning motion of the support foot are synchronized to avoid the large disturbance caused by the oscillation motion of the virtual model in relation to the robot motion. The simulation and experiment using the quadruped robot Billy are reported. In the experiment, the maximum speed of the robot could reach 4.73 times the body length per second, which verified the feasibility of the control method.
Correlated time series (CTS) forecasting plays an essential role in many cyber-physical systems, where multiple sensors emit time series that capture interconnected processes. Solutions based on deep learning that deliver state-of-the-art CTS forecasting performance employ a variety of spatio-temporal (ST) blocks that are able to model temporal dependencies and spatial correlations among time series. However, two challenges remain. First, ST-blocks are designed manually, which is time consuming and costly. Second, existing forecasting models simply stack the same ST-blocks multiple times, which limits the model potential. To address these challenges, we propose AutoCTS that is able to automatically identify highly competitive ST-blocks as well as forecasting models with heterogeneous ST-blocks connected using diverse topologies, as opposed to the same ST-blocks connected using simple stacking. Specifically, we design both a micro and a macro search space to model possible architectures of ST-blocks and the connections among heterogeneous ST-blocks, and we provide a search strategy that is able to jointly explore the search spaces to identify optimal forecasting models. Extensive experiments on eight commonly used CTS forecasting benchmark datasets justify our design choices and demonstrate that AutoCTS is capable of automatically discovering forecasting models that outperform state-of-the-art human-designed models. This is an extended version of ``AutoCTS: Automated Correlated Time Series Forecasting'', to appear in PVLDB 2022.
Human fixation patterns have been shown to correlate strongly with Transformer-based attention. Those correlation analyses are usually carried out without taking into account individual differences between participants and are mostly done on monolingual datasets making it difficult to generalise findings. In this paper, we analyse eye-tracking data from speakers of 13 different languages reading both in their native language (L1) and in English as language learners (L2). We find considerable differences between languages but also that individual reading behaviour such as skipping rate, total reading time and vocabulary knowledge (LexTALE) influence the alignment between humans and models to an extent that should be considered in future studies.
Confidence intervals are a standard technique for analyzing data. When applied to time series, confidence intervals are computed for each time point separately. Alternatively, we can compute confidence bands, where we are required to find the smallest area enveloping $k$ time series, where $k$ is a user parameter. Confidence bands can be then used to detect abnormal time series, not just individual observations within the time series. We will show that despite being an NP-hard problem it is possible to find optimal confidence band for some $k$. We do this by considering a different problem: discovering regularized bands, where we minimize the envelope area minus the number of included time series weighted by a parameter $\alpha$. Unlike normal confidence bands we can solve the problem exactly by using a minimum cut. By varying $\alpha$ we can obtain solutions for various $k$. If we have a constraint $k$ for which we cannot find appropriate $\alpha$, we demonstrate a simple algorithm that yields $O(\sqrt{n})$ approximation guarantee by connecting the problem to a minimum $k$-union problem. This connection also implies that we cannot approximate the problem better than $O(n^{1/4})$ under some (mild) assumptions. Finally, we consider a variant where instead of minimizing the area we minimize the maximum width. Here, we demonstrate a simple 2-approximation algorithm and show that we cannot achieve better approximation guarantee.
Deep Ensembles, as a type of Bayesian Neural Networks, can be used to estimate uncertainty on the prediction of multiple neural networks by collecting votes from each network and computing the difference in those predictions. In this paper, we introduce a novel method for uncertainty estimation called Layer Ensembles that considers a set of independent categorical distributions for each layer of the network, giving many more possible samples with overlapped layers, than in the regular Deep Ensembles. We further introduce Optimized Layer Ensembles with an inference procedure that reuses common layer outputs, achieving up to 19x speed up and quadratically reducing memory usage. We also show that Layer Ensembles can be further improved by ranking samples, resulting in models that require less memory and time to run while achieving higher uncertainty quality than Deep Ensembles.
Smart Repeaters (SR) can potentially enhance the coverage in Millimeter-wave (mmWave) wireless communications. However, the angular coverage of the existing two-panel SR is too limited to make the SR a truly cost-effective mmWave range extender. This paper proposes the usage of a tri-sectoral Advanced SR (ASR) to extend the angular coverage with respect to conventional SR. We propose a multi-user precoder optimization for ASR in a downlink multi-carrier communication system to maximize the number of served User Equipments (UEs) while guaranteeing constraints on per-UE rate and time-frequency resources. Numerical results show the benefits of the ASR against conventional SR in terms of both cumulative spectral efficiency and number of served UEs (both improved by an average factor 2), varying the system parameters.
Hyperparameter tuning is a fundamental aspect of machine learning research. Setting up the infrastructure for systematic optimization of hyperparameters can take a significant amount of time. Here, we present PyHopper, a black-box optimization platform designed to streamline the hyperparameter tuning workflow of machine learning researchers. PyHopper's goal is to integrate with existing code with minimal effort and run the optimization process with minimal necessary manual oversight. With simplicity as the primary theme, PyHopper is powered by a single robust Markov-chain Monte-Carlo optimization algorithm that scales to millions of dimensions. Compared to existing tuning packages, focusing on a single algorithm frees the user from having to decide between several algorithms and makes PyHopper easily customizable. PyHopper is publicly available under the Apache-2.0 license at https://github.com/PyHopper/PyHopper.
Recently, neural tangent kernel (NTK) has been used to explain the dynamics of learning parameters of neural networks, at the large width limit. Quantitative analyses of NTK give rise to network widths that are often impractical and incur high costs in time and energy in both training and deployment. Using a matrix factorization technique, we show how to obtain similar guarantees to those obtained by a prior analysis while reducing training and inference resource costs. The importance of our result further increases when the input points' data dimension is in the same order as the number of input points. More generally, our work suggests how to analyze large width networks in which dense linear layers are replaced with a low complexity factorization, thus reducing the heavy dependence on the large width.
The performance of a machine learning model degrades when it is applied to data from a similar but different domain than the data it has initially been trained on. To mitigate this domain shift problem, domain adaptation (DA) techniques search for an optimal transformation that converts the (current) input data from a source domain to a target domain to learn a domain-invariant representations that reduces domain discrepancy. This paper proposes a novel supervised domain adaptation based on two steps. First, we search for an optimal class-dependent transformation from the source to the target domain from a few samples. We consider optimal transport methods such as the earth mover distance with Laplacian regularization, Sinkhorn transport and correlation alignment. Second, we use embedding similarity techniques to select the corresponding transformation at inference. We use correlation metrics and maximum mean discrepancy with higher-order moment matching techniques. We conduct an extensive evaluation on time-series datasets with domain shift including simulated and various online handwriting datasets to demonstrate the performance.