The efficiency of natural language processing has improved dramatically with the advent of machine learning models, particularly neural network-based solutions. However, some tasks are still challenging, especially when considering specific domains. In this paper, we present a cloud-based system that can extract insights from customer reviews using machine learning methods integrated into a pipeline. For topic modeling, our composite model uses transformer-based neural networks designed for natural language processing, vector embedding-based keyword extraction, and clustering. The elements of our model have been integrated and further developed to meet better the requirements of efficient information extraction, topic modeling of the extracted information, and user needs. Furthermore, our system can achieve better results than this task's existing topic modeling and keyword extraction solutions. Our approach is validated and compared with other state-of-the-art methods using publicly available datasets for benchmarking.
Ensemble-based methods are highly popular approaches that increase the accuracy of a decision by aggregating the opinions of individual voters. The common point is to maximize accuracy; however, a natural limitation occurs if incremental costs are also assigned to the individual voters. Consequently, we investigate creating ensembles under an additional constraint on the total cost of the members. This task can be formulated as a knapsack problem, where the energy is the ensemble accuracy formed by some aggregation rules. However, the generally applied aggregation rules lead to a nonseparable energy function, which takes the common solution tools -- such as dynamic programming -- out of action. We introduce a novel stochastic approach that considers the energy as the joint probability function of the member accuracies. This type of knowledge can be efficiently incorporated in a stochastic search process as a stopping rule, since we have the information on the expected accuracy or, alternatively, the probability of finding more accurate ensembles. Experimental analyses of the created ensembles of pattern classifiers and object detectors confirm the efficiency of our approach. Moreover, we propose a novel stochastic search strategy that better fits the energy, compared with general approaches such as simulated annealing.
Ensemble-based approaches are very effective in various fields in raising the accuracy of its individual members, when some voting rule is applied for aggregating the individual decisions. In this paper, we investigate how to find and characterize the ensembles having the highest accuracy if the total cost of the ensemble members is bounded. This question leads to Knapsack problem with non-linear and non-separable objective function in binary and multiclass classification if the majority voting is chosen for the aggregation. As the conventional solving methods cannot be applied for this task, a novel stochastic approach was introduced in the binary case where the energy function is discussed as the joint probability function of the member accuracy. We show some theoretical results with respect to the expected ensemble accuracy and its variance in the multiclass classification problem which can help us to solve the Knapsack problem.