Abstract: The quadratic computational complexity of Multi-Head Self-Attention (MHSA) remains a fundamental bottleneck in scaling Large Language Models (LLMs) for long-context tasks. While sparse and linearized attention mechanisms attempt to mitigate this, they often compromise the representation of global dependencies or fail to capture multi-scale semantic granularity effectively. In this paper, we propose Multiscale Aggregated Hierarchical Attention (MAHA), a novel architectural framework that reformulates the attention mechanism through hierarchical decomposition and mathematically rigorous aggregation. Unlike conventional approaches that treat token interactions at a single resolution, MAHA dynamically partitions the input sequence into hierarchical scales via learnable downsampling operators. The core innovation lies in its aggregation strategy: we model the fusion of scale-specific attention matrices as a resource allocation problem, solved via a convex optimization framework or a Nash equilibrium-based game-theoretic approach. This ensures a theoretically optimal balance between local nuance and global context fidelity. Implemented within a hybrid dilated-convolutional transformer backbone, MAHA utilizes differentiable optimization layers to enable end-to-end training. Experimental evaluations demonstrate that MAHA achieves superior scalability; empirical FLOPs analysis confirms an 81% reduction in computational cost at a sequence length of 4096 compared to standard attention. This work bridges the gap between optimization theory and sequence modeling, offering a scalable solution for next-generation LLMs.
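
To make the multi-scale idea concrete, the minimal sketch below computes attention over keys and values downsampled by strided convolutions (standing in for the learnable downsampling operators) and fuses the per-scale outputs with a softmax over learnable logits. The softmax fusion is a deliberate simplification: MAHA's actual aggregation solves a convex optimization or game-theoretic problem through a differentiable optimization layer, which is not reproduced here. All module and parameter names are illustrative.

```python
import torch
import torch.nn as nn

class MultiScaleAttention(nn.Module):
    """Sketch of multi-scale attention with learned fusion (single head
    for brevity). A softmax over fusion logits stands in for MAHA's
    convex-optimization aggregation layer."""

    def __init__(self, d_model: int, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.qkv = nn.Linear(d_model, 3 * d_model)
        # Strided convolutions act as learnable downsampling operators.
        self.down = nn.ModuleList([
            nn.Conv1d(d_model, d_model, kernel_size=s, stride=s)
            if s > 1 else nn.Identity()
            for s in scales
        ])
        self.fusion_logits = nn.Parameter(torch.zeros(len(scales)))

    def forward(self, x):                       # x: (batch, seq, d_model)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        weights = torch.softmax(self.fusion_logits, dim=0)
        out = 0.0
        for w, down in zip(weights, self.down):
            # Downsample keys/values; queries stay at full resolution,
            # so a scale with stride s costs O(L * L/s) instead of O(L^2).
            k_s = down(k.transpose(1, 2)).transpose(1, 2)
            v_s = down(v.transpose(1, 2)).transpose(1, 2)
            attn = torch.softmax(
                q @ k_s.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
            out = out + w * (attn @ v_s)
        return out

x = torch.randn(2, 1024, 256)
print(MultiScaleAttention(d_model=256)(x).shape)  # torch.Size([2, 1024, 256])
```

Because only the finest scale pays the full quadratic cost while coarser scales shrink the key/value axis, the per-scale saving compounds with sequence length, which is the kind of effect the FLOPs analysis above quantifies.
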
Abstract: We propose Dynamic Rank Reinforcement Learning (DR-RL), a novel framework that adaptively optimizes the low-rank factorization of Multi-Head Self-Attention (MHSA) in Large Language Models (LLMs) through the integration of reinforcement learning and online matrix perturbation theory. While traditional low-rank approximations often rely on static rank assumptions, limiting their flexibility across diverse input contexts, our method dynamically selects ranks based on real-time sequence dynamics, layer-specific sensitivities, and hardware constraints. The core innovation lies in an RL agent that formulates rank selection as a sequential policy optimization problem, where the reward function explicitly balances attention fidelity against computational latency. Crucially, we employ online matrix perturbation bounds to enable incremental rank updates, thereby avoiding the prohibitive cost of full decomposition during inference. Furthermore, the integration of a lightweight Transformer-based policy network and batched Singular Value Decomposition (SVD) operations ensures scalable deployment on modern GPU architectures. Experiments demonstrate that DR-RL maintains downstream accuracy statistically equivalent to full-rank attention while significantly reducing Floating Point Operations (FLOPs), particularly in long-sequence regimes (L > 4096). This work bridges the gap between adaptive efficiency and theoretical rigor in MHSA, offering a principled, mathematically grounded alternative to heuristic rank reduction techniques in resource-constrained deep learning. Source code and experiment logs are available at: https://github.com/canererden/DR_RL_Project
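
A minimal sketch of the fidelity-versus-latency trade-off at the heart of the reward function, assuming a relative Frobenius-norm fidelity term and a rank-proportional cost proxy. Both terms are illustrative: the paper's actual reward, Transformer policy network, and perturbation-based incremental updates are not reproduced here.

```python
import numpy as np

def truncated(A, r):
    """Rank-r approximation via SVD (DR-RL batches this on GPU and
    avoids full recomputation via online perturbation bounds)."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * S[:r]) @ Vt[:r]

def reward(A, r, lam=1e-3):
    """Illustrative reward: negative relative Frobenius error minus a
    rank-proportional latency proxy (a stand-in for measured FLOPs)."""
    err = np.linalg.norm(A - truncated(A, r), "fro") / np.linalg.norm(A, "fro")
    return -err - lam * r

# Score candidate ranks for one synthetic, low-rank-ish attention-like
# matrix; DR-RL instead learns a policy over sequence and layer features.
rng = np.random.default_rng(0)
A = rng.standard_normal((256, 64)) @ rng.standard_normal((64, 256))
scores = {r: reward(A, r) for r in (8, 16, 32, 64, 128)}
print("selected rank:", max(scores, key=scores.get))
```
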
Abstract: The California Bearing Ratio (CBR) is a key geotechnical indicator used to assess the load-bearing capacity of subgrade soils, especially in transportation infrastructure and foundation design. Traditional CBR determination relies on laboratory penetration tests. Despite their accuracy, these tests are often time-consuming, costly, and can be impractical, particularly for large-scale or diverse soil profiles. Recent progress in artificial intelligence, especially machine learning (ML), has enabled data-driven approaches for modeling complex soil behavior with greater speed and precision. This study introduces a comprehensive ML framework for CBR prediction using a dataset of 382 soil samples collected from various geoclimatic regions in Türkiye. The dataset includes physicochemical soil properties relevant to bearing capacity, allowing multidimensional feature representation in a supervised learning context. Twelve ML algorithms were tested: decision tree, random forest, extra trees, gradient boosting, XGBoost, k-nearest neighbors, support vector regression, multi-layer perceptron, AdaBoost, bagging, voting, and stacking regressors. Each model was trained, validated, and evaluated to assess its generalization and robustness. Among them, the random forest regressor performed the best, achieving strong R² scores of 0.95 (training), 0.76 (validation), and 0.83 (test). These outcomes highlight the model's powerful nonlinear mapping ability, making it a promising tool for predictive geotechnical tasks. The study supports the integration of intelligent, data-centric models in geotechnical engineering, offering an effective alternative to traditional methods and promoting digital transformation in infrastructure analysis and design.
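
A minimal sketch of the best-performing setup reported above: a random forest regressor evaluated on train/validation/test splits with scikit-learn. The file path, target column name, split ratios, and hyperparameters are assumptions; the study's 382-sample dataset and exact features are not reproduced.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("cbr_soil_samples.csv")          # hypothetical path
X, y = df.drop(columns=["CBR"]), df["CBR"]        # hypothetical target column

# 70/15/15 split into train, validation, and test sets.
X_tr, X_tmp, y_tr, y_tmp = train_test_split(X, y, test_size=0.3, random_state=42)
X_val, X_te, y_val, y_te = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

model = RandomForestRegressor(n_estimators=300, random_state=42)
model.fit(X_tr, y_tr)

for name, Xs, ys in [("train", X_tr, y_tr), ("val", X_val, y_val), ("test", X_te, y_te)]:
    print(f"{name} R2: {r2_score(ys, model.predict(Xs)):.2f}")
```
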
Abstract: Soil compaction is critical in construction engineering to ensure the stability of structures like road embankments and earth dams. Traditional methods for determining optimum moisture content (OMC) and maximum dry density (MDD) involve labor-intensive laboratory experiments, and empirical regression models have limited applicability and accuracy across diverse soil types. In recent years, artificial intelligence (AI) and machine learning (ML) techniques have emerged as alternatives for predicting these compaction parameters. However, ML models often struggle with prediction accuracy and generalizability, particularly with heterogeneous datasets representing various soil types. This study proposes an automated machine learning (AutoML) approach to predict OMC and MDD. AutoML automates algorithm selection and hyperparameter optimization, potentially improving accuracy and scalability. Through extensive experimentation, the study found that the Extreme Gradient Boosting (XGBoost) algorithm provided the best performance, achieving R-squared values of 80.4% for MDD and 89.1% for OMC on a separate dataset. These results demonstrate the effectiveness of AutoML in predicting compaction parameters across different soil types. The study also highlights the importance of heterogeneous datasets in improving the generalization and performance of ML models. Ultimately, this research contributes to more efficient and reliable construction practices by enhancing the prediction of soil compaction parameters.
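
As a hedged illustration of that workflow, the snippet below hand-rolls a small hyperparameter search over the winning algorithm (XGBoost) for each target; a full AutoML framework would additionally search across algorithm families. The file path, column names (OMC, MDD), split, and parameter grid are assumptions.

```python
import pandas as pd
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBRegressor

df = pd.read_csv("compaction_data.csv")           # hypothetical path
features = df.drop(columns=["OMC", "MDD"])        # hypothetical targets

# Fit a separate model per compaction parameter.
for target in ["OMC", "MDD"]:
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, df[target], test_size=0.2, random_state=0)
    search = GridSearchCV(
        XGBRegressor(random_state=0),
        param_grid={"n_estimators": [200, 500],
                    "max_depth": [3, 6],
                    "learning_rate": [0.05, 0.1]},
        scoring="r2", cv=5)
    search.fit(X_tr, y_tr)
    print(target, "test R2:", r2_score(y_te, search.predict(X_te)))
```
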
Abstract: One of the most critical issues in machine learning is the selection of appropriate hyperparameters for training models. With hyperparameter optimization (HPO) techniques, machine learning models can reach better training performance and improve their ability to generalize. HPO has recently become a popular focus of artificial intelligence research and continues to attract increasing interest. Traditional methods developed for HPO include exhaustive search, grid search, random search, and Bayesian optimization; meta-heuristic algorithms are employed as more advanced alternatives. Meta-heuristic algorithms search the solution space, converging toward the best combination for a specific problem: they test various scenarios and evaluate the results to select the best-performing configurations. In this study, classical methods, such as grid search, random search, and Bayesian optimization, and population-based algorithms, such as genetic algorithms and particle swarm optimization, are discussed in terms of HPO. The use of the related search algorithms is explained together with Python code developed on packages such as Scikit-learn, Sklearn Genetic, and Optuna. The performance of the search algorithms is compared on a sample dataset, and according to the results, the particle swarm optimization algorithm outperformed the other algorithms.
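
A minimal example in the spirit of that comparison, using Optuna's default TPE (Bayesian) sampler to tune a random forest on a built-in sample dataset. The model, search ranges, and dataset are illustrative; the study's grid/random-search, genetic-algorithm, and PSO baselines would optimize the same cross-validated objective with a different search driver.

```python
import optuna
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)

def objective(trial):
    # Each trial samples one hyperparameter combination to evaluate.
    model = RandomForestRegressor(
        n_estimators=trial.suggest_int("n_estimators", 50, 500),
        max_depth=trial.suggest_int("max_depth", 2, 16),
        min_samples_leaf=trial.suggest_int("min_samples_leaf", 1, 10),
        random_state=0)
    return cross_val_score(model, X, y, cv=5, scoring="r2").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print("best R2:", study.best_value, "params:", study.best_params)
```
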