This paper evaluates XGboost's performance given different dataset sizes and class distributions, from perfectly balanced to highly imbalanced. XGBoost has been selected for evaluation, as it stands out in several benchmarks due to its detection performance and speed. After introducing the problem of fraud detection, the paper reviews evaluation metrics for detection systems or binary classifiers, and illustrates with examples how different metrics work for balanced and imbalanced datasets. Then, it examines the principles of XGBoost. It proposes a pipeline for data preparation and compares a Vanilla XGBoost against a random search-tuned XGBoost. Random search fine-tuning provides consistent improvement for large datasets of 100 thousand samples, not so for medium and small datasets of 10 and 1 thousand samples, respectively. Besides, as expected, XGBoost recognition performance improves as more data is available, and deteriorates detection performance as the datasets become more imbalanced. Tests on distributions with 50, 45, 25, and 5 percent positive samples show that the largest drop in detection performance occurs for the distribution with only 5 percent positive samples. Sampling to balance the training set does not provide consistent improvement. Therefore, future work will include a systematic study of different techniques to deal with data imbalance and evaluating other approaches, including graphs, autoencoders, and generative adversarial methods, to deal with the lack of labels.
This paper presents a method for time series forecasting with deep learning and its assessment on two datasets. The method starts with data preparation, followed by model training and evaluation. The final step is a visual inspection. Experimental work demonstrates that a single time series can be used to train deep learning networks if time series in a dataset contain patterns that repeat even with a certain variation. However, for less structured time series such as stock market closing prices, the networks perform just like a baseline that repeats the last observed value. The implementation of the method as well as the experiments are open-source.
We evaluated a lightweight Convolutional Neural Network (CNN) called LPRNet  for automatic License Plate Recognition (LPR). We evaluated the algorithm on two datasets, one composed of real license plate images and the other of synthetic license plate images. In addition, we compared its performance against Tesseract , an Optical Character Recognition engine. We measured performance based on recognition accuracy and Levenshtein Distance. LPRNet is an end-to-end framework and demonstrated robust performance on both datasets, delivering 90 and 89 percent recognition accuracy on test sets of 1000 real and synthetic license plate images, respectively. Tesseract was not trained using real license plate images and performed well only on the synthetic dataset after pre-processing steps delivering 93 percent recognition accuracy. Finally, Pareto analysis for frequency analysis of misclassified characters allowed us to find in detail which characters were the most conflicting ones according to the percentage of accumulated error. Depending on the region, license plate images possess particular characteristics. Once properly trained, LPRNet can be used to recognize characters from a specific region and dataset. Future work can focus on applying transfer learning to utilize the features learned by LPRNet and fine-tune it given a smaller, newer dataset of license plates.
What audio embedding approach generalizes best to a wide range of downstream tasks across a variety of everyday domains without fine-tuning? The aim of the HEAR 2021 NeurIPS challenge is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios. HEAR 2021 evaluates audio representations using a benchmark suite across a variety of domains, including speech, environmental sound, and music. In the spirit of shared exchange, each participant submitted an audio embedding model following a common API that is general-purpose, open-source, and freely available to use. Twenty-nine models by thirteen external teams were evaluated on nineteen diverse downstream tasks derived from sixteen datasets. Open evaluation code, submitted models and datasets are key contributions, enabling comprehensive and reproducible evaluation, as well as previously impossible longitudinal studies. It still remains an open question whether one single general-purpose audio representation can perform as holistically as the human ear.