Abstract:Kernel methods are powerful tools for nonlinear learning with well-established theory. The scalability issue has been their long-standing challenge. Despite the existing success, there are two limitations in large-scale kernel methods: (i) The memory overhead is too high for users to afford; (ii) existing efforts mainly focus on kernel ridge regression (KRR), while other models lack study. In this paper, we propose Joker, a joint optimization framework for diverse kernel models, including KRR, logistic regression, and support vector machines. We design a dual block coordinate descent method with trust region (DBCD-TR) and adopt kernel approximation with randomized features, leading to low memory costs and high efficiency in large-scale learning. Experiments show that Joker saves up to 90\% memory but achieves comparable training time and performance (or even better) than the state-of-the-art methods.
Abstract:This paper focuses on a specific family of classifiers called nonparallel support vector classifiers (NPSVCs). Different from typical classifiers, the training of an NPSVC involves the minimization of multiple objectives, resulting in the potential concerns of feature suboptimality and class dependency. Consequently, no effective learning scheme has been established to improve NPSVCs' performance through representation learning, especially deep learning. To break this bottleneck, we develop NPSVC++ based on multi-objective optimization, enabling the end-to-end learning of NPSVC and its features. By pursuing Pareto optimality, NPSVC++ theoretically ensures feature optimality across classes, hence effectively overcoming the two issues above. A general learning procedure via duality optimization is proposed, based on which we provide two applicable instances, K-NPSVC++ and D-NPSVC++. The experiments show their superiority over the existing methods and verify the efficacy of NPSVC++.