Yuichi Takano

Nonlinear Data Integration via Kernel Methods for Data Collaboration Analysis

May 26, 2026

Yamato Suetake, Yuta Kawakami, Shunnosuke Ikeda, Yuichi Takano

Abstract:Collaborative analysis of decentralized confidential datasets is important, but direct sharing of original datasets is often restricted by privacy and institutional constraints. Data collaboration (DC) analysis transforms each dataset into privacy-preserving intermediate representations via party-specific obfuscation functions and integrates them into common collaboration representations using an anchor dataset. However, many existing DC analysis methods rely on linear transformations for data obfuscation and integration, which may increase reconstruction risk. Although nonlinear dimensionality reduction can mitigate this risk, conventional linear integration methods cannot accurately align intermediate representations produced by nonlinear transformations. Moreover, existing integration methods mainly minimize discrepancies among parties and do not explicitly incorporate geometric or target-variable information useful for downstream analysis. To overcome these limitations, we first formulate linear kernel integration (LKI) as a linear integration method and then kernelize it to obtain nonlinear kernel integration (NKI). NKI admits a globally optimal solution via kernel ridge regression and an eigenvalue problem. We also introduce graph regularization and a centering constraint so that the target representation can capture geometric and target-variable information useful for downstream analysis. Experiments on image classification tasks demonstrate that NKI improves classification accuracy over existing linear integration methods under nonlinear dimensionality reduction, with further gains from target-variable-aware graph regularization and centering. The results also show that dimensionality reduction choices substantially affect both classification accuracy and reconstruction risk.

* 50 pages, 7 figures

Interpretable clustering via optimal multiway-split decision trees

Feb 14, 2026

Hayato Suzuki, Shunnosuke Ikeda, Yuichi Takano

Abstract:Clustering serves as a vital tool for uncovering latent data structures, and achieving both high accuracy and interpretability is essential. To this end, existing methods typically construct binary decision trees by solving mixed-integer nonlinear optimization problems, often leading to significant computational costs and suboptimal solutions. Furthermore, binary decision trees frequently result in excessively deep structures, which makes them difficult to interpret. To mitigate these issues, we propose an interpretable clustering method based on optimal multiway-split decision trees, formulated as a 0-1 integer linear optimization problem. This reformulation renders the optimization problem more tractable compared to existing models. A key feature of our method is the integration of a one-dimensional K-means algorithm for the discretization of continuous variables, allowing for flexible and data-driven branching. Extensive numerical experiments on publicly available real-world datasets demonstrate that our method outperforms baseline methods in terms of clustering accuracy and interpretability. Our method yields multiway-split decision trees with concise decision rules while maintaining competitive performance across various evaluation metrics.
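
The discretization step mentioned in the abstract above can be illustrated with a small sketch. This is a plain Lloyd-style one-dimensional K-means written for illustration only; the function name `kmeans_1d`, the quantile-based initialization, and the midpoint thresholds are assumptions, not the authors' implementation.

```python
def kmeans_1d(values, k, n_iters=100):
    """Cluster scalar values into k groups and return (centers, thresholds).

    Illustrative sketch: Lloyd iterations with quantile initialization.
    The thresholds are midpoints between neighbouring centers and could
    serve as candidate multiway split points for a continuous variable.
    """
    xs = sorted(values)
    # Hypothetical initialization: one center per evenly spaced quantile.
    centers = [xs[(2 * i + 1) * len(xs) // (2 * k)] for i in range(k)]
    for _ in range(n_iters):
        clusters = [[] for _ in range(k)]
        for x in xs:
            # Assign each value to its nearest center.
            nearest = min(range(k), key=lambda c: abs(x - centers[c]))
            clusters[nearest].append(x)
        # Recompute each center as the mean of its cluster (keep empty ones).
        updated = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    centers.sort()
    thresholds = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
    return centers, thresholds


centers, thresholds = kmeans_1d([1, 2, 3, 10, 11, 12, 20, 21, 22], k=3)
```

With k centers, a continuous variable is cut into k intervals, so a single tree node can branch k ways instead of requiring a chain of binary splits.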

Subset Selection for Stratified Sampling in Online Controlled Experiments

Sep 19, 2025

Haru Momozu, Yuki Uehara, Naoki Nishimura, Koya Ohashi, Deddy Jobson, Yilin Li, Phuong Dinh, Noriyoshi Sukegawa, Yuichi Takano

Abstract:Online controlled experiments, also known as A/B testing, are the digital equivalent of randomized controlled trials for estimating the impact of marketing campaigns on website visitors. Stratified sampling is a traditional technique for variance reduction to improve the sensitivity (or statistical power) of controlled experiments; this technique first divides the population into strata (homogeneous subgroups) based on stratification variables and then draws samples from each stratum to avoid sampling bias. To enhance the estimation accuracy of stratified sampling, we focus on the problem of selecting a subset of stratification variables that are effective in variance reduction. We design an efficient algorithm that selects stratification variables one by one by simulating a series of stratified sampling processes. We also estimate the computational complexity of our subset selection algorithm. Computational experiments using synthetic and real-world datasets demonstrate that our method can outperform other variance reduction techniques especially when multiple variables have a certain correlation with the outcome variable. Our subset selection method for stratified sampling can improve the sensitivity of online controlled experiments, thus enabling more reliable marketing decisions.

* 14 pages, 15 figures, The 22nd Pacific Rim International Conference on Artificial Intelligence 2025 (PRICAI 2025)
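
To see why the choice of stratification variable matters for the variance reduction described in the abstract above, the following sketch compares the variance of a sample mean under simple random sampling with that under stratified sampling. It assumes sampling with replacement and proportional allocation; the function names and the toy data are hypothetical, and this is not the subset selection algorithm from the paper.

```python
from collections import defaultdict
from statistics import pvariance


def srs_variance(outcomes, n):
    """Variance of the sample mean under simple random sampling (size n)."""
    return pvariance(outcomes) / n


def stratified_variance(outcomes, strata, n):
    """Variance of the stratified sample mean with proportional allocation.

    Only the within-stratum variance remains; the between-stratum part
    of the total variance is removed by stratification.
    """
    groups = defaultdict(list)
    for y, s in zip(outcomes, strata):
        groups[s].append(y)
    total = len(outcomes)
    within = sum(len(g) / total * pvariance(g) for g in groups.values())
    return within / n


# Toy data (assumed): the stratum label explains most of the outcome.
outcomes = [1.0, 1.2, 0.8, 1.0, 5.0, 5.2, 4.8, 5.0]
strata = ["new"] * 4 + ["returning"] * 4

v_srs = srs_variance(outcomes, n=4)
v_strat = stratified_variance(outcomes, strata, n=4)
```

A stratification variable that is strongly related to the outcome leaves little within-stratum variance, which is the property a variable selection procedure would look for.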

DC Algorithm for Estimation of Sparse Gaussian Graphical Models

Aug 08, 2024

Tomokaze Shiratori, Yuichi Takano

Abstract:Sparse estimation for Gaussian graphical models is a crucial technique for making the relationships among numerous observed variables more interpretable and quantifiable. Various methods have been proposed, including graphical lasso, which utilizes the $\ell_1$ norm as a regularization term, as well as methods employing non-convex regularization terms. However, most of these methods approximate the $\ell_0$ norm with convex functions. To estimate more accurate solutions, it is desirable to treat the $\ell_0$ norm directly as a regularization term. In this study, we formulate the sparse estimation problem for Gaussian graphical models using the $\ell_0$ norm and propose a method to solve this problem using the Difference of Convex functions Algorithm (DCA). Specifically, we convert the $\ell_0$ norm constraint into an equivalent largest-$K$ norm constraint, reformulate the constrained problem into a penalized form, and solve it using DCA. Furthermore, we design an algorithm that carries out the computation efficiently using graphical lasso. Experimental results with synthetic data show that our method yields results that are equivalent to or better than those of existing methods. Comparisons of model learning through cross-validation confirm that our method is particularly advantageous in selecting true edges.
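
The equivalence mentioned in the abstract above can be checked on small vectors. A commonly used form is that a vector has at most $K$ nonzero entries exactly when its $\ell_1$ norm equals its largest-$K$ norm (the sum of its $K$ largest absolute values). The sketch below assumes this form; the paper's exact formulation for precision matrices may differ, and the function names are hypothetical.

```python
def largest_k_norm(x, k):
    """Sum of the k largest absolute values of x."""
    return sum(sorted((abs(v) for v in x), reverse=True)[:k])


def cardinality_gap(x, k):
    """l1 norm minus largest-k norm.

    The gap is always nonnegative, and it is zero exactly when x has
    at most k nonzero entries. Both terms are convex in x, so the gap
    is a difference of convex functions.
    """
    return sum(abs(v) for v in x) - largest_k_norm(x, k)


sparse_gap = cardinality_gap([3, 0, -2, 0], k=2)  # two nonzeros, gap is 0
dense_gap = cardinality_gap([3, 1, -2, 0], k=2)   # three nonzeros, gap is 1
```

Because the gap is a difference of two convex functions, moving it into the objective as a penalty gives a problem of the kind that a DC algorithm is designed for.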

Robust personalized pricing under uncertainty of purchase probabilities

Jul 22, 2024

Shunnosuke Ikeda, Naoki Nishimura, Noriyoshi Sukegawa, Yuichi Takano

Abstract:This paper is concerned with personalized pricing models aimed at maximizing the expected revenues or profits for a single item. While it is essential for personalized pricing to predict the purchase probabilities for each consumer, these predicted values are inherently subject to unavoidable errors that can negatively impact the realized revenues and profits. To address this issue, we focus on robust optimization techniques that yield reliable solutions to optimization problems under uncertainty. Specifically, we propose a robust optimization model for personalized pricing that accounts for the uncertainty of predicted purchase probabilities. This model can be formulated as a mixed-integer linear optimization problem, which can be solved exactly using mathematical optimization solvers. We also develop a Lagrangian decomposition algorithm combined with line search to efficiently find high-quality solutions for large-scale optimization problems. Experimental results demonstrate the effectiveness of our robust optimization model and highlight the utility of our Lagrangian decomposition algorithm in terms of both computational efficiency and solution quality.

Strategic Coupon Allocation for Increasing Providers' Sales Experiences in Two-sided Marketplaces

Jul 20, 2024

Koya Ohashi, Sho Sekine, Deddy Jobson, Jie Yang, Naoki Nishimura, Noriyoshi Sukegawa, Yuichi Takano

Abstract:In a two-sided marketplace, network effects are crucial for competitiveness, and platforms need to retain as many users as possible through advanced customer relationship management. Maintaining a stable and active presence of numerous providers on the platform is highly important for enhancing the marketplace's scale and diversity. The strongest motivation for providers to continue using the platform is to realize actual profits through sales. We therefore propose a personalized promotion to increase the number of successful providers with sales experiences on the platform. The main contributions of our research are twofold. First, we introduce a new perspective on provider management that focuses on the distribution of successful sales experiences. Second, we propose a personalized promotion optimization method to maximize the number of providers' sales experiences. By utilizing this approach, we ensure equal opportunities for providers to experience sales, so that sales are not monopolized by a few providers. Through experiments using actual data on coupon distribution, we confirm that our method enables the implementation of coupon allocation strategies that significantly increase the total number of providers having sales experiences.

* 8 pages, 10 figures, KDD 2024 Workshop on Two-sided Marketplace Optimization: Search, Pricing, Matching & Growth

Fast solution to the fair ranking problem using the Sinkhorn algorithm

Jun 11, 2024

Yuki Uehara, Shunnosuke Ikeda, Naoki Nishimura, Koya Ohashi, Yilin Li, Jie Yang, Deddy Jobson, Xingxia Zha, Takeshi Matsumoto, Noriyoshi Sukegawa (+1 more)

Abstract:In two-sided marketplaces such as online flea markets, recommender systems for providing consumers with personalized item rankings play a key role in promoting transactions between providers and consumers. Meanwhile, two-sided marketplaces face the problem of balancing consumer satisfaction and fairness among items to stimulate activity of item providers. Saito and Joachims (2022) devised an impact-based fair ranking method for maximizing the Nash social welfare based on fair division; however, this method, which requires solving a large-scale constrained nonlinear optimization problem, is very difficult to apply to practical-scale recommender systems. We thus propose a fast solution to the impact-based fair ranking problem. We first transform the fair ranking problem into an unconstrained optimization problem and then design a gradient ascent method that repeatedly executes the Sinkhorn algorithm. Experimental results demonstrate that our algorithm provides fair rankings of high quality and is about 1000 times faster than application of commercial optimization software.
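
For readers unfamiliar with the Sinkhorn algorithm named in the abstract above, here is a minimal sketch of the basic iteration: it alternately rescales the rows and columns of a positive kernel matrix until the result has prescribed row and column sums. This is only the generic matrix-scaling routine, not the authors' gradient ascent method; the function name, the `epsilon` temperature, and the toy cost matrix (rows read as items, columns as ranking positions) are assumptions.

```python
import math


def sinkhorn(cost, row_sums, col_sums, epsilon=0.5, n_iters=500):
    """Scale exp(-cost / epsilon) to match the given row and column sums."""
    n, m = len(cost), len(cost[0])
    kernel = [[math.exp(-cost[i][j] / epsilon) for j in range(m)]
              for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows, then columns, to match the target marginals.
        u = [row_sums[i] / sum(kernel[i][j] * v[j] for j in range(m))
             for i in range(n)]
        v = [col_sums[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


# Toy example (assumed): 3 items, 3 positions, cost = distance from diagonal.
cost = [[0.0, 1.0, 2.0],
        [1.0, 0.0, 1.0],
        [2.0, 1.0, 0.0]]
plan = sinkhorn(cost, row_sums=[1.0] * 3, col_sums=[1.0] * 3)
```

The returned matrix is approximately doubly stochastic, so each row can be read as a distribution over positions for one item. Each iteration needs only matrix-vector products, which is what makes the routine attractive at large scale.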

Robust portfolio optimization for recommender systems considering uncertainty of estimated statistics

Jun 09, 2024

Tomoya Yanagi, Shunnosuke Ikeda, Yuichi Takano

Abstract:This paper is concerned with portfolio optimization models for creating high-quality lists of recommended items to balance the accuracy and diversity of recommendations. However, the statistics (i.e., expectation and covariance of ratings) required for mean-variance portfolio optimization are subject to inevitable estimation errors. To remedy this situation, we focus on robust optimization techniques that derive reliable solutions to uncertain optimization problems. Specifically, we propose a robust portfolio optimization model that copes with the uncertainty of estimated statistics using cardinality-based uncertainty sets. This robust portfolio optimization model can be reduced to a mixed-integer linear optimization problem, which can be solved exactly using mathematical optimization solvers. Experimental results using two publicly available rating datasets demonstrate that our method can improve not only the recommendation accuracy but also the diversity of recommendations compared with conventional mean-variance portfolio optimization models. Notably, our method has the potential to improve the recommendation quality of various rating prediction algorithms.

Container pre-marshalling problem minimizing CV@R under uncertainty of ship arrival times

May 27, 2024

Daiki Ikuma, Shunnosuke Ikeda, Noriyoshi Sukegawa, Yuichi Takano

Abstract:This paper is concerned with the container pre-marshalling problem, which involves relocating containers in the storage area so that they can be efficiently loaded onto ships without reshuffles. In reality, however, ship arrival times are affected by various external factors, which can cause the order of container retrieval to be different from the initial plan. To represent such uncertainty, we generate multiple scenarios from a multivariate probability distribution of ship arrival times. We derive a mixed-integer linear optimization model to find an optimal container layout such that the conditional value-at-risk is minimized for the number of misplaced containers responsible for reshuffles. Moreover, we devise an exact algorithm based on the cutting-plane method to handle large-scale problems. Numerical experiments using synthetic datasets demonstrate that our method can produce high-quality container layouts compared with the conventional robust optimization model. Additionally, our algorithm can speed up the computation of solving large-scale problems.
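
The risk measure in the abstract above can be made concrete with a small scenario-based calculation. The sketch below evaluates the conditional value-at-risk of equally likely scenario losses using the minimization formula of Rockafellar and Uryasev, which is the standard way to express the measure with linear constraints. Whether the paper uses exactly this form is an assumption; the function name and the scenario values (numbers of misplaced containers) are hypothetical.

```python
def conditional_value_at_risk(losses, beta):
    """CVaR at level beta (0 <= beta < 1) for equally likely scenario losses.

    Uses CVaR = min over alpha of
        alpha + sum(max(loss - alpha, 0)) / ((1 - beta) * S),
    where S is the number of scenarios. The objective is piecewise
    linear and convex in alpha with breakpoints at the scenario losses,
    so it is enough to evaluate it at those points.
    """
    n_scenarios = len(losses)

    def objective(alpha):
        excess = sum(max(loss - alpha, 0.0) for loss in losses)
        return alpha + excess / ((1.0 - beta) * n_scenarios)

    return min(objective(alpha) for alpha in losses)


# Hypothetical misplaced-container counts for one layout in 10 scenarios.
misplaced = [0, 1, 2, 3, 4, 5, 6, 7, 8, 10]
risk = conditional_value_at_risk(misplaced, beta=0.8)
```

Here the result is the average of the worst 20% of scenarios (8 and 10 misplaced containers). With `beta=0.0` the measure reduces to the plain average over all scenarios.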

Privacy-preserving recommender system using the data collaboration analysis for distributed datasets

May 24, 2024

Tomoya Yanagi, Shunnosuke Ikeda, Noriyoshi Sukegawa, Yuichi Takano

Abstract:In order to provide high-quality recommendations for users, it is desirable to share and integrate multiple datasets held by different parties. However, when sharing such distributed datasets, we need to protect personal and confidential information contained in the datasets. To this end, we establish a framework for privacy-preserving recommender systems using the data collaboration analysis of distributed datasets. Numerical experiments with two public rating datasets demonstrate that our privacy-preserving method for rating prediction can improve the prediction accuracy for distributed datasets. This study opens up new possibilities for privacy-preserving techniques in recommender systems.
