Large eCommerce players have introduced comparison tables as a new type of recommendation. However, building comparisons at scale without pre-existing training or taxonomy data remains an open challenge, especially within the operational constraints of shops in the long tail. We present preliminary results from building a comparison pipeline designed to scale in a multi-shop scenario: we describe our design choices and run extensive benchmarks on multiple shops to stress-test it. Finally, we run a small user study on property selection and conclude by discussing potential improvements and highlighting the questions that remain to be addressed.
Recently, the European Commission published a draft regulation on uniform procedures and technical specifications for the type-approval of motor vehicles with an automated driving system (ADS). While the draft regulation is welcome progress for an industry ready to deploy life-saving automated vehicle technology, we believe that the requirements can be further improved to enhance the safety and societal acceptance of automated vehicles (AVs). In this paper, we evaluate the draft regulation's performance requirements that would impact the Dynamic Driving Task (DDT). We highlight potential problems that can arise from the currently proposed requirements and propose practical recommendations to improve the regulation.
Machine learning has the potential to fuel further advances in data science, but it is greatly hindered by an ad hoc design process, poor data hygiene, and a lack of statistical rigor in model evaluation. Recently, these issues have begun to attract more attention as they have caused public and embarrassing issues in research and development. Drawing from our experience as machine learning researchers, we follow the machine learning process from algorithm design to data collection to model evaluation, drawing attention to common pitfalls and providing practical recommendations for improvements. At each step, case studies are introduced to highlight how these pitfalls occur in practice, and where things could be improved.
Understanding the geographic distribution of species is a key concern in conservation. By pairing species occurrences with environmental features, researchers can model the relationship between an environment and the species which may be found there. To facilitate research in this area, we present the GeoLifeCLEF 2020 dataset, which consists of 1.9 million species observations paired with high-resolution remote sensing imagery, land cover data, and altitude, in addition to traditional low-resolution climate and soil variables. We also discuss the GeoLifeCLEF 2020 competition, which aims to use this dataset to advance the state-of-the-art in location-based species recommendation.
We consider the problem of decomposing a multiway tensor with binary entries. Such data arise frequently in applications such as neuroimaging, recommendation systems, topic modeling, and sensor network localization. We propose that the observed binary entries follow a Bernoulli model, develop a rank-constrained likelihood-based estimation procedure, and obtain theoretical accuracy guarantees. Specifically, we establish an error bound for the tensor estimation and show that the obtained rate is minimax optimal under the considered model. We demonstrate the efficacy of our approach through both simulations and analyses of multiple real-world datasets on the tasks of tensor completion and clustering.
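As a rough illustration of the kind of objective a likelihood-based procedure of this sort could maximize, the sketch below evaluates the Bernoulli log-likelihood of observed binary entries when the underlying parameter tensor has low rank. Two choices here are assumptions made for illustration and are not stated in the abstract: the logistic link between the parameter tensor and the success probabilities, and the CP (sum of rank-one terms) parameterization of the rank constraint. All function names are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def cp_entry(factors, index):
    """Entry of a rank-R CP tensor.

    factors[m][i][r] is the r-th component of row i of the mode-m factor
    matrix; the entry is the sum over r of the product over modes.
    """
    rank = len(factors[0][0])
    return sum(
        math.prod(factors[m][i][r] for m, i in enumerate(index))
        for r in range(rank)
    )


def bernoulli_log_likelihood(observed, factors):
    """Log-likelihood of binary entries under an assumed logistic link.

    observed maps an index tuple such as (i, j, k) to 0 or 1.
    Each entry contributes y * theta - log(1 + exp(theta)).
    """
    total = 0.0
    for index, y in observed.items():
        theta = cp_entry(factors, index)
        total += y * theta - softplus(theta)
    return total
```

A fitting routine would search over the factor matrices to increase this value; the sketch only evaluates the objective and does not implement the estimation procedure or its guarantees.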
We describe classical analogues to quantum algorithms for principal component analysis and nearest-centroid clustering. Given sampling assumptions, our classical algorithms run in time polylogarithmic in the input size, matching the runtime of the quantum algorithms with only polynomial slowdown. These algorithms are evidence that their corresponding problems do not yield exponential quantum speedups. To build our classical algorithms, we use the same techniques as applied in our previous work dequantizing a quantum recommendation systems algorithm. Thus, we provide further evidence for the strength of classical $\ell^2$-norm sampling assumptions when replacing quantum state preparation assumptions in the machine learning domain.
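To make the notion of $\ell^2$-norm sampling concrete, the sketch below draws an index $i$ of a vector $u$ with probability $u_i^2 / \|u\|^2$ and uses such draws in a standard unbiased estimator of an inner product $\langle u, v \rangle$. This is a generic illustration of the sampling primitive, not the algorithms of the paper; the function names are hypothetical. The sketch recomputes the sampling weights on every draw, so it does not reflect the running time that the sampling assumption is meant to provide.

```python
import random


def l2_norm_sample(vector, rng):
    """Draw an index i with probability vector[i]**2 / ||vector||^2."""
    weights = [x * x for x in vector]
    if sum(weights) == 0:
        raise ValueError("cannot sample from the zero vector")
    return rng.choices(range(len(vector)), weights=weights, k=1)[0]


def estimate_inner_product(u, v, num_samples, rng):
    """Estimate <u, v> from l2-norm samples of u.

    For i drawn with probability u[i]**2 / ||u||^2, the quantity
    ||u||^2 * v[i] / u[i] has expectation sum_i u[i] * v[i].
    """
    norm_sq = sum(x * x for x in u)
    total = 0.0
    for _ in range(num_samples):
        i = l2_norm_sample(u, rng)
        # u[i] is nonzero here: zero entries have zero sampling weight.
        total += v[i] / u[i]
    return norm_sq * total / num_samples
```

For example, calling `estimate_inner_product(u, v, 20000, random.Random(0))` averages 20000 draws; the estimate concentrates around the true inner product as the number of samples grows.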
This paper focuses on a newly developed transparent nADPCMB MLT speech coding algorithm. Our coder first decomposes the narrowband speech signal into subbands; a nonlinear ADPCM scheme is then applied in each subband. The subband decomposition is driven by the equivalent Modulated Lapped Transform (MLT) filter bank. The novelty of this algorithm is its nonlinear approach, based on neural networks, to subband prediction coding. We have evaluated the performance of the nADPCMB MLT coding algorithm in a formal listening session based on the five-grade impairment scale standardized in ITU-T Recommendation P.800.
Graph neural networks have triggered a resurgence of graph-based text classification. We show that even a simple MLP baseline achieves comparable performance on benchmark datasets, questioning the importance of synthetic graph structures. When considering an inductive scenario, i.e., when adding new documents to a corpus, a simple MLP even outperforms the recent graph-based models TextGCN and HeteGCN and is comparable with HyperGAT. We further fine-tune DistilBERT and find that it outperforms all state-of-the-art models. We suggest that future studies use at least an MLP baseline to contextualize their results. We provide recommendations for the design and training of such a baseline.
In this article, comprehensive studies of mosquito neutralization using machine vision and a 1 W laser are considered for the first time. We developed a laser installation based on a Raspberry Pi that changes the direction of the laser with a galvanometer. We also developed a program for tracking mosquitoes in real time. We considered the possibility of using deep neural networks, Haar cascades, and machine learning for mosquito recognition, and examined in detail the problem of classifying mosquitoes in images. A recommendation is given for implementing this device on a microcontroller for subsequent use as part of an unmanned aerial vehicle. Any harmful insects in the fields can serve as targets for control.
Contextual bandit algorithms have become essential in real-world user interaction problems in recent years. However, these algorithms rely on context as an attribute-value representation, which makes them infeasible for real-world domains such as social networks, which are inherently relational. We propose Relational Boosted Bandits (RB2), a contextual bandit algorithm for relational domains based on (relational) boosted trees. RB2 enables us to learn interpretable and explainable models due to the more descriptive nature of the relational representation. We empirically demonstrate the effectiveness and interpretability of RB2 on tasks such as link prediction, relational classification, and recommendations.