Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Julian Allagan

Robustness, Cost, and Attack-Surface Concentration in Phishing Detection

Mar 19, 2026

Julian Allagan, Mohamed Elbakary, Zohreh Safari, Weizheng Gao, Gabrielle Morgan, Essence Morgan, Vladimir Deriglazov

Abstract:Phishing detectors built on engineered website features attain near-perfect accuracy under i.i.d.\ evaluation, yet deployment security depends on robustness to post-deployment feature manipulation. We study this gap through a cost-aware evasion framework that models discrete, monotone feature edits under explicit attacker budgets. Three diagnostics are introduced: minimal evasion cost (MEC), the evasion survival rate $S(B)$, and the robustness concentration index (RCI). On the UCI Phishing Websites benchmark (11\,055 instances, 30 ternary features), Logistic Regression, Random Forests, Gradient Boosted Trees, and XGBoost all achieve $\mathrm{AUC}\ge 0.979$ under static evaluation. Under budgeted sanitization-style evasion, robustness converges across architectures: the median MEC equals 2 with full features, and over 80\% of successful minimal-cost evasions concentrate on three low-cost surface features. Feature restriction improves robustness only when it removes all dominant low-cost transitions. Under strict cost schedules, infrastructure-leaning feature sets exhibit 17-19\% infeasible mass for ensemble models, while the median MEC among evadable instances remains unchanged. We formalize this convergence: if a positive fraction of correctly detected phishing instances admit evasion through a single feature transition of minimal cost $c_{\min}$, no classifier can raise the corresponding MEC quantile above $c_{\min}$ without modifying the feature representation or cost model. Adversarial robustness in phishing detection is governed by feature economics rather than model complexity.

* 14 pages, 4 figures, 9 tables

Via

Access Paper or Ask Questions

Statistical and Machine Learning Analysis of Traffic Accidents on US 158 in Currituck County: A Comparison with HSM Predictions

Dec 26, 2025

Jennifer Sawyer, Julian Allagan

Abstract:This study extends previous hotspot and Chi-Square analysis by Sawyer \cite{sawyer2025hotspot} by integrating advanced statistical analysis, machine learning, and spatial modeling techniques to analyze five years (2019--2023) of traffic accident data from an 8.4-mile stretch of US 158 in Currituck County, NC. Building upon foundational statistical work, we apply Kernel Density Estimation (KDE), Negative Binomial Regression, Random Forest classification, and Highway Safety Manual (HSM) Safety Performance Function (SPF) comparisons to identify comprehensive temporal and spatial crash patterns. A Random Forest classifier predicts injury severity with 67\% accuracy, outperforming HSM SPF. Spatial clustering is confirmed via Moran's I test ($I = 0.32$, $p < 0.001$), and KDE analysis reveals hotspots near major intersections, validating and extending earlier hotspot identification methods. These results support targeted interventions to improve traffic safety on this vital transportation corridor. Our objective is to provide actionable insights for improving safety on US 158 while contributing to the broader understanding of rural highway safety analysis through methodological advancement beyond basic statistical techniques.

* 9 pages,7 figures, 7 tables

Via

Access Paper or Ask Questions