In the end-of-line test of geared motors, the evaluation of product qual-ity is important. Due to time constraints and the high diversity of variants, acous-tic measurements are more economical than vibration measurements. However, the acoustic data is affected by industrial disturbing noise. Therefore, the aim of this study is to investigate the robustness of features used for anomaly detection in geared motor end-of-line testing. A real-world dataset with typical faults and acoustic disturbances is recorded by an acoustic array. This includes industrial noise from the production and systematically produced disturbances, used to compare the robustness. Overall, it is proposed to apply features extracted from a log-envelope spectrum together with psychoacoustic features. The anomaly de-tection is done by using the isolation forest or the more universal bagging random miner. Most disturbances can be circumvented, while the use of a hammer or air pressure often causes problems. In general, these results are important for condi-tion monitoring tasks that are based on acoustic or vibration measurements. Fur-thermore, a real-world problem description is presented to improve common sig-nal processing and machine learning tasks.
We present a hierarchical framework based on graph search and model predictive control (MPC) for electric autonomous vehicle (EAV) parking maneuvers in a tight environment. At high-level, only static obstacles are considered, and the scenario-based hybrid A* (SHA*), which is faster than the traditional hybrid A*, is designed to provide an initial guess (also known as a global path) for the parking task. To extract the velocity and acceleration profile from an initial guess, an optimal control problem (OCP) is built. At the low level, an NMPC-based strategy is used to avoid dynamic obstacles (also known as local planning). The efficacy of SHA* is evaluated through 148 different simulation schemes and the proposed hierarchical parking framework is demonstrated through a real-time parallel parking simulation.
One of the main challenges in solving time-dependent partial differential equations is to develop computationally efficient solvers that are accurate and stable. Here, we introduce a graph neural network approach to finding efficient PDE solvers through learning using message-passing models. We first introduce domain invariant features for PDE-data inspired by classical PDE solvers for an efficient physical representation. Next, we use graphs to represent PDE-data on an unstructured mesh and show that message passing graph neural networks (MPGNN) can parameterize governing equations, and as a result, efficiently learn accurate solver schemes for linear/nonlinear PDEs. We further show that the solvers are independent of the initial trained geometry, i.e. the trained solver can find PDE solution on different complex domains. Lastly, we show that a recurrent graph neural network approach can find a temporal sequence of solutions to a PDE.
Yahoo Gemini native advertising marketplace serves billions of impressions daily, to hundreds millions of unique users, and reaches a yearly revenue of many hundreds of millions USDs. Powering Gemini native models for predicting advertise (ad) event probabilities, such as conversions and clicks, is OFFSET - a feature enhanced collaborative-filtering (CF) based event prediction algorithm. The predicted probabilities are then used in Gemini native auctions to determine which ads to present for every serving event (impression). Dynamic creative optimization (DCO) is a recent Gemini native product that was launched two years ago and is increasingly gaining more attention from advertisers. The DCO product enables advertisers to issue several assets per each native ad attribute, creating multiple combinations for each DCO ad. Since different combinations may appeal to different crowds, it may be beneficial to present certain combinations more frequently than others to maximize revenue while keeping advertisers and users satisfied. The initial DCO offer was to optimize click-through rates (CTR), however as the marketplace shifts more towards conversion based campaigns, advertisers also ask for a {conversion based solution. To accommodate this request, we present a post-auction solution, where DCO ads combinations are favored according to their predicted conversion rate (CVR). The predictions are provided by an auxiliary OFFSET based combination CVR prediction model, and used to generate the combination distributions for DCO ad rendering during serving time. An online evaluation of this explore-exploit solution, via online bucket A/B testing, serving Gemini native DCO traffic, showed a 53.5% CVR lift, when compared to a control bucket serving all combinations uniformly at random.
In this work, we focus on detecting emergency vehicles using only audio data. Improved and quick detection can help in faster preemption of these vehicles at signalized intersections thereby reducing overall response time in case of emergencies. Important audio features were extracted from raw data and passed into extreme learning machines (ELM) for training. ELMs have been used in this work because of its simplicity and shorter run-time which can therefore be used for online learning. Recently, there have been many studies that focus on sound classification but most of the methods used are complex to train and implement. The results from this paper show that ELM can achieve similar performance with exceptionally shorter training times. The accuracy reported for ELM is about 97% for emergency vehicle detection (EVD).
Self-supervised models trained with a contrastive loss such as CLIP have shown to be very powerful in zero-shot classification settings. However, to be used as a zero-shot classifier these models require the user to provide new captions over a fixed set of labels at test time. In many settings, it is hard or impossible to know if a new query caption is compatible with the source captions used to train the model. We address these limitations by framing the zero-shot classification task as an outlier detection problem and develop a conformal prediction procedure to assess when a given test caption may be reliably used. On a real-world medical example, we show that our proposed conformal procedure improves the reliability of CLIP-style models in the zero-shot classification setting, and we provide an empirical analysis of the factors that may affect its performance.
In recent years, deep learning approaches for partial differential equations have received much attention due to their mesh-freeness and other desirable properties. However, most of the works so far concentrated on time-dependent nonlinear differential equations. In this work, we analyze potential issues with the well-known Physic Informed Neural Network for differential equations that are not time-dependent. This analysis motivates us to introduce a novel technique, namely FinNet, for solving differential equations by incorporating finite difference into deep learning. Even though we use a mesh during the training phase, the prediction phase is mesh-free. We illustrate the effectiveness of our method through experiments on solving various equations.
A proper parametrization of state transition matrices of linear state-space models (SSMs) followed by standard nonlinearities enables them to efficiently learn representations from sequential data, establishing the state-of-the-art on a large series of long-range sequence modeling benchmarks. In this paper, we show that we can improve further when the structural SSM such as S4 is given by a linear liquid time-constant (LTC) state-space model. LTC neural networks are causal continuous-time neural networks with an input-dependent state transition module, which makes them learn to adapt to incoming inputs at inference. We show that by using a diagonal plus low-rank decomposition of the state transition matrix introduced in S4, and a few simplifications, the LTC-based structural state-space model, dubbed Liquid-S4, achieves the new state-of-the-art generalization across sequence modeling tasks with long-term dependencies such as image, text, audio, and medical time-series, with an average performance of 87.32% on the Long-Range Arena benchmark. On the full raw Speech Command recognition, dataset Liquid-S4 achieves 96.78% accuracy with a 30% reduction in parameter counts compared to S4. The additional gain in performance is the direct result of the Liquid-S4's kernel structure that takes into account the similarities of the input sequence samples during training and inference.
Maintaining the identity of multiple objects in real-time video is a challenging task, as it is not always possible to run a detector on every frame. Thus, motion estimation systems are often employed, which either do not scale well with the number of targets or produce features with limited semantic information. To solve the aforementioned problems and allow the tracking of dozens of arbitrary objects in real-time, we propose SiamMOTION. SiamMOTION includes a novel proposal engine that produces quality features through an attention mechanism and a region-of-interest extractor fed by an inertia module and powered by a feature pyramid network. Finally, the extracted tensors enter a comparison head that efficiently matches pairs of exemplars and search areas, generating quality predictions via a pairwise depthwise region proposal network and a multi-object penalization module. SiamMOTION has been validated on five public benchmarks, achieving leading performance against current state-of-the-art trackers.
With climate change predicted to increase the likelihood of landslide events, there is a growing need for rapid landslide detection technologies that help inform emergency responses. Synthetic Aperture Radar (SAR) is a remote sensing technique that can provide measurements of affected areas independent of weather or lighting conditions. Usage of SAR, however, is hindered by domain knowledge that is necessary for the pre-processing steps and its interpretation requires expert knowledge. We provide simplified, pre-processed, machine-learning ready SAR datacubes for four globally located landslide events obtained from several Sentinel-1 satellite passes before and after a landslide triggering event together with segmentation maps of the landslides. From this dataset, using the Hokkaido, Japan datacube, we study the feasibility of SAR-based landslide detection with supervised deep learning (DL). Our results demonstrate that DL models can be used to detect landslides from SAR data, achieving an Area under the Precision-Recall curve exceeding 0.7. We find that additional satellite visits enhance detection performance, but that early detection is possible when SAR data is combined with terrain information from a digital elevation model. This can be especially useful for time-critical emergency interventions. Code is made publicly available at https://github.com/iprapas/landslide-sar-unet.