Probabilistic load forecasting (PLF) is a key component in the extended tool-chain required for efficient management of smart energy grids. Neural networks are widely considered to achieve improved prediction performance, since they support highly flexible mappings of complex relationships between the target and the set of conditioning variables. However, obtaining comprehensive predictive uncertainties from such black-box models is still a challenging and unsolved problem. In this work, we propose a novel PLF approach based on Bayesian Mixture Density Networks. Both aleatoric and epistemic uncertainty sources are encompassed within the model predictions, which infer general conditional densities, dependent on the input features, within an end-to-end training framework. To achieve reliable and computationally scalable estimators of the posterior distributions, both Mean Field variational inference and deep ensembles are integrated. Experiments have been performed on household short-term load forecasting tasks, showing the capability of the proposed method to achieve robust performance under different operating conditions.
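As a rough illustration of the two ingredients named above, the sketch below evaluates the negative log-likelihood of a one-dimensional Gaussian mixture (the usual training loss of a mixture density network) and splits the predictive variance of an ensemble into aleatoric and epistemic parts via the law of total variance. The function names, the scalar target, and the equally weighted ensemble are assumptions made for illustration; this is not the authors' implementation.

```python
import math


def gaussian_mixture_nll(y, weights, means, stds):
    """Negative log-likelihood of scalar y under a 1-D Gaussian mixture.

    Assumes strictly positive mixture weights that sum to one and
    strictly positive standard deviations.
    """
    log_terms = []
    for w, mu, sigma in zip(weights, means, stds):
        log_pdf = (-0.5 * math.log(2.0 * math.pi)
                   - math.log(sigma)
                   - 0.5 * ((y - mu) / sigma) ** 2)
        log_terms.append(math.log(w) + log_pdf)
    # Log-sum-exp for numerical stability.
    m = max(log_terms)
    return -(m + math.log(sum(math.exp(t - m) for t in log_terms)))


def ensemble_variance_split(member_means, member_variances):
    """Law of total variance over equally weighted ensemble members.

    Returns (aleatoric, epistemic): the mean of the member variances and
    the variance of the member means, respectively.
    """
    n = len(member_means)
    grand_mean = sum(member_means) / n
    aleatoric = sum(member_variances) / n
    epistemic = sum((m - grand_mean) ** 2 for m in member_means) / n
    return aleatoric, epistemic
```

In this reading, the aleatoric term reflects noise that each member attributes to the data, while the epistemic term reflects disagreement between members.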
Skeleton-based human action recognition has attracted great interest in recent years, as skeleton data has been demonstrated to be robust to illumination changes, body scales, dynamic camera views, and complex backgrounds. Nevertheless, an effective encoding of the latent information underlying the 3D skeleton is still an open problem. In this work, we propose a novel Spatial-Temporal Transformer network (ST-TR) which models dependencies between joints using the Transformer self-attention operator. In our ST-TR model, a Spatial Self-Attention module (SSA) is used to understand intra-frame interactions between different body parts, and a Temporal Self-Attention module (TSA) is used to model inter-frame correlations. The two are combined in a two-stream network which outperforms state-of-the-art models using the same input data on both NTU-RGB+D 60 and NTU-RGB+D 120.
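The self-attention operator mentioned above can be sketched as plain scaled dot-product attention, where each row is the feature vector of one joint within a frame. This is a minimal single-head sketch: in a Transformer the queries, keys and values come from learned linear projections, which are omitted here, and the function names are hypothetical rather than taken from ST-TR.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over a set of joints.

    Each argument is a list of per-joint feature vectors. Every output
    row is a weighted average of the value vectors, with weights given
    by the softmax of the scaled query-key dot products.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out
```

Applying the same operator across joints of one frame gives a spatial variant, while applying it across the frames of one joint gives a temporal variant.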
Mesh refinement is a fundamental step for accurate Multi-View Stereo. It modifies the geometry of an initial manifold mesh to minimize the photometric error induced in a set of camera pairs. This initial mesh is usually the output of volumetric 3D reconstruction based on min-cut over Delaunay Triangulations. Such methods produce a significant number of non-manifold vertices and therefore require a vertex split step to explicitly repair them. In this paper, we extend this method to preemptively fix the non-manifold vertices by reasoning directly on the Delaunay Triangulation, avoiding most vertex splits. The main contribution of this paper addresses the problem of choosing the camera pairs adopted by the refinement process. We treat the problem as a mesh labeling process, where each label corresponds to a camera pair. Unlike state-of-the-art methods, which use each camera pair to refine all the visible parts of the mesh, we choose, for each facet, the best pair that enforces both the overall visibility and coverage. The refinement step is then applied to each facet using only the selected camera pair. This facetwise refinement helps the process to be applied as evenly as possible.
This work targets the identification of a class of models for hybrid dynamical systems characterized by nonlinear autoregressive exogenous (NARX) components, with finite-dimensional polynomial expansions, and by a Markovian switching mechanism. The estimation of the model parameters, including submodel coefficients, hidden state values and transition probabilities, is performed under a probabilistic framework via Expectation Maximization. Discrete mode classification and NARX regression tasks are disentangled within the iterations. Soft labels are assigned to latent states on the trajectories by averaging over the state posteriors and are updated using the parametrization obtained from the previous maximization phase. Then, the NARX parameters are repeatedly fitted by solving weighted regression subproblems through a cyclical coordinate descent approach with coordinate-wise minimization. Moreover, we investigate a two-stage selection scheme, based on an l1-norm bridge estimation followed by hard-thresholding, to achieve parsimonious models through selection of the polynomial expansion. The proposed approach is demonstrated on an SMNARX problem composed of three nonlinear sub-models with specific regressors.
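The weighted regression subproblem and the two-stage selection described above can be sketched as follows, assuming the objective 0.5 * sum_i w_i (y_i - x_i' beta)^2 + lam * ||beta||_1, where the weights w_i play the role of the soft labels. The function names, the fixed number of sweeps, and the thresholds are illustrative assumptions, not the authors' implementation.

```python
def soft_threshold(value, lam):
    """Soft-thresholding operator used by the l1 coordinate update."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def weighted_lasso_cd(X, y, w, lam, n_sweeps=200):
    """Cyclical coordinate descent for an l1-penalized weighted regression.

    X is a list of regressor rows, y the targets, w the non-negative
    per-sample weights (e.g. posterior soft labels of one mode).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0  # weighted correlation of column j with the partial residual
            z = 0.0    # weighted squared norm of column j
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += w[i] * X[i][j] * (y[i] - partial)
                z += w[i] * X[i][j] ** 2
            # Exact minimizer of the objective along coordinate j.
            beta[j] = soft_threshold(rho, lam) / z if z > 0.0 else 0.0
    return beta


def hard_threshold(beta, tol):
    """Second stage: zero out coefficients whose magnitude is at most tol."""
    return [b if abs(b) > tol else 0.0 for b in beta]
```

The first stage shrinks coefficients toward zero, and the second stage removes the small ones outright, so that only a subset of polynomial terms remains in each submodel.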
Hybrid system identification is a key tool to achieve reliable models of Cyber-Physical Systems from data. PieceWise Affine (PWA) models guarantee universal approximation, local linearity and equivalence to other classes of hybrid systems. Still, PWA identification is a challenging problem, requiring the concurrent solution of regression and classification tasks. In this work, we focus on the identification of PieceWise Auto Regressive with eXogenous input models with arbitrary regions (NPWARX), thus not restricted to polyhedral domains, and characterized by discontinuous maps. To this end, we propose a method based on a probabilistic mixture model, where the discrete state is represented through a multinomial distribution conditioned on the input regressors. The architecture is conceived following the Mixture of Experts concept, developed within the machine learning field. To achieve nonlinear partitioning, we parametrize the discriminant function using a neural network. Then, the parameters of both the ARX submodels and the classifier are concurrently estimated by maximizing the likelihood of the overall model using Expectation Maximization. The proposed method is demonstrated on a nonlinear piecewise problem with discontinuous maps.
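The expectation step of such a Mixture of Experts model can be sketched as below, assuming linear ARX experts with Gaussian noise of a shared standard deviation. The gate scores are passed in as precomputed numbers, whereas in the approach described above they would come from a neural network evaluated on the regressors; all names here are hypothetical.

```python
import math


def gate_probabilities(scores):
    """Softmax over gate scores: prior probability of each mode."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def responsibilities(x, y, thetas, gate_scores, sigma):
    """Posterior probability of each mode for one sample (E-step).

    x is the regressor vector, y the observed output, thetas the list of
    ARX coefficient vectors, one per mode. Assumes the sample has
    non-negligible likelihood under at least one mode.
    """
    priors = gate_probabilities(gate_scores)
    joint = []
    for prior, theta in zip(priors, thetas):
        pred = sum(t * xi for t, xi in zip(theta, x))
        lik = (math.exp(-0.5 * ((y - pred) / sigma) ** 2)
               / (sigma * math.sqrt(2.0 * math.pi)))
        joint.append(prior * lik)
    total = sum(joint)
    return [j / total for j in joint]
```

In the maximization step, these responsibilities would act as sample weights for refitting each ARX submodel and as soft targets for the classifier.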
The ability of autonomous vehicles to maintain an accurate trajectory within their road lane is crucial for safe operation. This requires detecting the road lines and estimating the car's pose relative to its lane. Lateral lines are usually computed from camera images. Still, most works on line detection are limited to image mask retrieval and do not provide a usable representation in world coordinates. In this paper, we propose a complete perception pipeline able to retrieve, from a single image, all the information required by a vehicle lateral control system: road line equations, centerline, vehicle heading and lateral displacement. We also evaluate our system by acquiring a new dataset with accurate geometric ground truth, and we make it publicly available to act as a benchmark for further research.
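To make the listed outputs concrete, the sketch below derives a centerline, a heading angle and a lateral displacement from two lane lines. It assumes, purely for illustration, that each line has already been expressed in the vehicle frame (x forward, y to the left) as a straight line y = c0 + c1 * x; the line model, the frame convention and the function names are assumptions and are not taken from the paper.

```python
import math


def centerline_from_lane_lines(left, right):
    """Average the (c0, c1) coefficients of the left and right lines."""
    return ((left[0] + right[0]) / 2.0, (left[1] + right[1]) / 2.0)


def heading_and_lateral_offset(centerline):
    """Heading and signed lateral offset relative to the centerline.

    heading: angle of the centerline with respect to the vehicle's
    forward axis, in radians.
    lateral: signed perpendicular distance from the vehicle origin to
    the centerline, positive when the centerline lies to the left.
    """
    c0, c1 = centerline
    heading = math.atan(c1)
    lateral = c0 / math.sqrt(1.0 + c1 ** 2)
    return heading, lateral
```

For example, lane lines at y = 1.5 and y = -2.5 with zero slope place the centerline half a metre to the right of the vehicle, with no heading error.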
Dynamic Vision Sensors (DVSs) asynchronously stream events at pixels subject to brightness changes. Unlike classic vision devices, they produce a sparse representation of the scene. Therefore, to apply standard computer vision algorithms, events need to be integrated into a frame or event-surface. This is usually attained through hand-crafted grids that reconstruct the frame using ad-hoc heuristics. In this paper, we propose Matrix-LSTM, a grid of Long Short-Term Memory (LSTM) cells that learns task-dependent event-surfaces end-to-end. Compared to existing reconstruction approaches, our learned event-surface shows good flexibility and expressiveness, improving on the baselines for optical flow estimation on the MVSEC benchmark and on the state of the art in event-based object classification on the N-Cars dataset.
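For contrast with the learned approach, a hand-crafted event-surface of the kind referred to above can be as simple as a per-pixel event count with one channel per polarity. The sketch below is such a baseline, not Matrix-LSTM itself; the (x, y, timestamp, polarity) tuple layout and the function name are assumptions made for illustration.

```python
def event_count_surface(events, height, width):
    """Integrate a stream of events into a two-channel count frame.

    events: iterable of (x, y, timestamp, polarity) tuples, where a
    positive polarity marks a brightness increase.
    Returns a height x width grid of [negative_count, positive_count].
    """
    surface = [[[0, 0] for _ in range(width)] for _ in range(height)]
    for x, y, _timestamp, polarity in events:
        channel = 1 if polarity > 0 else 0
        surface[y][x][channel] += 1
    return surface
```

This heuristic discards the event timestamps entirely, which is the kind of fixed design decision that a learned, per-pixel recurrent integration can replace.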
While many quality metrics exist to evaluate the quality of a grasp by itself, no clear quantification of the quality of a grasp relative to the task for which the grasp is used has been defined yet. In this paper, we propose a framework to extend the concept of grasp quality metric to task-oriented grasping by defining affordance functions via basic grasp metrics for an open set of task affordances. We evaluate both the effectiveness of the proposed task-oriented metrics and their practical applicability by learning to infer them from vision. Specifically, we assess the validity of our novel framework both in the context of perfect information, i.e., a known object model, and in the partial information context, i.e., inferring task-oriented metrics from vision, underlining advantages and limitations of both situations. In the former, physical metrics of grasp hypotheses on an object are defined and computed in a simulation with a known object model; in the latter, deep models are trained to infer such properties from partial information in the form of synthesized range images.