John Mern

Autonomous Attack Mitigation for Industrial Control Systems

Nov 03, 2021
John Mern, Kyle Hatch, Ryan Silva, Cameron Hickert, Tamim Sookoor, Mykel J. Kochenderfer

Defending computer networks from cyber attack requires timely responses to alerts and threat intelligence. Decisions about how to respond involve coordinating actions across multiple nodes based on imperfect indicators of compromise while minimizing disruptions to network operations. Currently, playbooks are used to automate portions of a response process, but often leave complex decision-making to a human analyst. In this work, we present a deep reinforcement learning approach to autonomous response and recovery in large industrial control networks. We propose an attention-based neural architecture that is flexible with respect to the size of the network under protection. To train and evaluate the autonomous defender agent, we present an industrial control network simulation environment suitable for reinforcement learning. Experiments show that the learned agent can effectively mitigate advanced attacks that progress with few observable signals over several months before execution. The proposed deep reinforcement learning approach outperforms a fully automated playbook method in simulation, taking less disruptive actions while also defending more nodes on the network. The learned policy is also more robust to changes in attacker behavior than playbook approaches.

* 11 pages 
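
The abstract does not include implementation details, but a minimal sketch of a size-flexible, attention-based defender policy might look like the following; the module name, dimensions, and per-node action head are all assumptions:

```python
import torch
import torch.nn as nn

class NodeAttentionPolicy(nn.Module):
    """Hypothetical size-flexible defender policy: self-attention over
    per-node observation embeddings yields per-node action logits, so the
    same weights apply to networks with any number of nodes."""

    def __init__(self, obs_dim, embed_dim=64, n_heads=4, n_node_actions=3):
        super().__init__()
        self.embed = nn.Linear(obs_dim, embed_dim)
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.head = nn.Linear(embed_dim, n_node_actions)

    def forward(self, node_obs):               # (batch, n_nodes, obs_dim)
        h = torch.relu(self.embed(node_obs))   # per-node embeddings
        h, _ = self.attn(h, h, h)              # nodes attend to each other
        return self.head(h)                    # (batch, n_nodes, n_node_actions)
```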

Interpretable Local Tree Surrogate Policies

Sep 16, 2021
John Mern, Sidhart Krishnan, Anil Yildiz, Kyle Hatch, Mykel J. Kochenderfer

High-dimensional policies, such as those represented by neural networks, cannot be reasonably interpreted by humans. This lack of interpretability reduces the trust users have in policy behavior, limiting their use to low-impact tasks such as video games. Unfortunately, many methods rely on neural network representations for effective learning. In this work, we propose a method to build predictable policy trees as surrogates for policies such as neural networks. The policy trees are easily human interpretable and provide quantitative predictions of future behavior. We demonstrate the performance of this approach on several simulated tasks.

* pre-print, submitted to AAAI 2022 Conference, 7 pages 
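
As a rough illustration of the idea, here is a generic local-surrogate sketch in which a small decision tree is fit to imitate a policy near a query state; the sampling scheme and hyperparameters are assumptions, not the paper's algorithm:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def local_tree_surrogate(policy, state, n_samples=500, noise=0.1, max_depth=3):
    """Fit a small, human-readable decision tree that imitates `policy` in a
    neighborhood of `state` (generic stand-in, not the paper's algorithm)."""
    X = state + noise * np.random.randn(n_samples, state.shape[0])
    y = np.array([policy(x) for x in X])       # actions the policy would take
    return DecisionTreeClassifier(max_depth=max_depth).fit(X, y)

# Usage: print(export_text(local_tree_surrogate(my_policy, s0)))
```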

Reinforcement Learning for Industrial Control Network Cyber Security Orchestration

Jun 09, 2021
John Mern, Kyle Hatch, Ryan Silva, Jeff Brush, Mykel J. Kochenderfer

Defending computer networks from cyber attack requires coordinating actions across multiple nodes based on imperfect indicators of compromise while minimizing disruptions to network operations. Advanced attacks can progress with few observable signals over several months before execution. The resulting sequential decision problem has large observation and action spaces and a long time-horizon, making it difficult to solve with existing methods. In this work, we present techniques to scale deep reinforcement learning to solve the cyber security orchestration problem for large industrial control networks. We propose a novel attention-based neural architecture whose size complexity is invariant to the size of the network under protection. A pre-training curriculum is presented to overcome early exploration difficulty. Experiments show that the proposed approaches greatly improve both the learning sample complexity and converged policy performance over baseline methods in simulation.

* 12 pages, submitted to NeurIPS 2021 
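
A pre-training curriculum of the kind mentioned could, for instance, train on progressively larger networks and longer horizons before the full task. The sketch below assumes a generic `agent.train` interface and illustrative stage settings:

```python
def curriculum_pretrain(agent, make_env, stages):
    """Generic curriculum sketch: train on progressively harder environment
    configurations before the full task (the `agent.train` interface and the
    stage parameters below are assumptions, not the paper's schedule)."""
    for cfg in stages:
        env = make_env(n_nodes=cfg["n_nodes"], horizon=cfg["horizon"])
        agent.train(env, steps=cfg["steps"])
    return agent

stages = [
    {"n_nodes": 8,   "horizon": 50,   "steps": 50_000},   # easy warm-up
    {"n_nodes": 32,  "horizon": 200,  "steps": 100_000},
    {"n_nodes": 128, "horizon": 1000, "steps": 200_000},  # full-scale task
]
```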

Measurable Monte Carlo Search Error Bounds

Jun 08, 2021
John Mern, Mykel J. Kochenderfer

Monte Carlo planners can often return sub-optimal actions, even if they are guaranteed to converge in the limit of infinite samples. Known asymptotic regret bounds do not provide any way to measure confidence in a recommended action at the conclusion of search. In this work, we prove bounds on the sub-optimality of Monte Carlo estimates for non-stationary bandits and Markov decision processes. These bounds can be directly computed at the conclusion of the search and do not require knowledge of the true action-value. The presented bounds hold for general Monte Carlo solvers meeting mild convergence conditions. We empirically test the tightness of the bounds through experiments on a multi-armed bandit and a discrete Markov decision process for both a simple solver and Monte Carlo tree search.

* 9 pages, submitted to NeurIPS 2021 
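
For intuition only, a Hoeffding-style computation over root statistics shows what a bound "computable at the conclusion of search" can look like; this i.i.d. stand-in is not the paper's bound, which covers the non-stationary case:

```python
import math

def hoeffding_gap_bound(q_hat, counts, value_range, delta=0.05):
    """Illustrative Hoeffding-style bound (NOT the paper's result): with
    probability >= 1 - delta, the recommended action is no more than the
    returned amount worse than the best root action, assuming i.i.d. returns
    in [0, value_range]. q_hat and counts map actions to root statistics."""
    eps = {a: value_range * math.sqrt(math.log(2 * len(q_hat) / delta) / (2 * n))
           for a, n in counts.items()}
    rec = max(q_hat, key=q_hat.get)                 # recommended (greedy) action
    upper = max(q_hat[a] + eps[a] for a in q_hat)   # optimistic best value
    lower = q_hat[rec] - eps[rec]                   # pessimistic value of rec
    return max(0.0, upper - lower)
```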

Obstacle Avoidance Using a Monocular Camera

Dec 03, 2020
Kyle Hatch, John Mern, Mykel Kochenderfer

A collision avoidance system based on simple digital cameras would help enable the safe integration of small UAVs into crowded, low-altitude environments. In this work, we present an obstacle avoidance system for small UAVs that uses a monocular camera with a hybrid neural network and path planner controller. The system comprises a vision network for estimating depth from camera images, a high-level control network, a collision prediction network, and a contingency policy. This system is evaluated on a simulated UAV navigating an obstacle course in a constrained flight pattern. Results show the proposed system achieves low collision rates while maintaining operationally relevant flight speeds.

* AIAA SciTech Forum 2021 pre-print 
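
The abstract names four components; a minimal sketch of how they might be composed at decision time follows, with the risk-thresholding scheme and function interfaces assumed:

```python
def select_action(image, depth_net, control_net, collision_net,
                  contingency_policy, risk_threshold=0.5):
    """Sketch of the four-component pipeline from the abstract: monocular
    image -> depth estimate -> high-level control, with a contingency
    maneuver whenever predicted collision risk is high."""
    depth = depth_net(image)                   # vision network
    action = control_net(depth)                # high-level control network
    risk = collision_net(depth, action)        # collision prediction network
    if risk > risk_threshold:                  # threshold is an assumption
        action = contingency_policy(depth)     # e.g. climb or hold position
    return action
```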

Improved POMDP Tree Search Planning with Prioritized Action Branching

Oct 07, 2020
John Mern, Anil Yildiz, Larry Bush, Tapan Mukerji, Mykel J. Kochenderfer

Online solvers for partially observable Markov decision processes have difficulty scaling to problems with large action spaces. This paper proposes a method called PA-POMCPOW to sample a subset of the action space that provides varying mixtures of exploitation and exploration for inclusion in a search tree. The proposed method first evaluates the action space according to a score function that is a linear combination of expected reward and expected information gain. The actions with the highest score are then added to the search tree during tree expansion. Experiments show that PA-POMCPOW is able to outperform existing state-of-the-art solvers on problems with large discrete action spaces.

* 7 pages 
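
The scoring rule described in the abstract is straightforward to sketch; the mixing weight `lam` and branching factor `k` below are assumed tuning parameters:

```python
import heapq

def prioritized_actions(actions, expected_reward, info_gain, lam=0.5, k=10):
    """Rank candidate actions by a linear combination of expected reward and
    expected information gain, and branch on the top k, as the abstract
    describes (lam and k are assumed tuning parameters)."""
    score = lambda a: lam * expected_reward(a) + (1 - lam) * info_gain(a)
    return heapq.nlargest(k, actions, key=score)
```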

Bayesian Optimized Monte Carlo Planning

Oct 07, 2020
John Mern, Anil Yildiz, Zachary Sunberg, Tapan Mukerji, Mykel J. Kochenderfer

Online solvers for partially observable Markov decision processes have difficulty scaling to problems with large action spaces. Monte Carlo tree search with progressive widening attempts to improve scaling by sampling from the action space to construct a policy search tree. The performance of progressive widening search is dependent upon the action sampling policy, often requiring problem-specific samplers. In this work, we present a general method for efficient action sampling based on Bayesian optimization. The proposed method uses a Gaussian process to model a belief over the action-value function and selects the action that will maximize the expected improvement in the optimal action value. We implement the proposed approach in a new online tree search algorithm called Bayesian Optimized Monte Carlo Planning (BOMCP). Several experiments show that BOMCP is better able to scale to large action space POMDPs than existing state-of-the-art tree search solvers.

* 8 pages 
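
A sketch of the expected-improvement action selection described in the abstract, assuming an sklearn-style Gaussian process and a finite candidate set:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, q_best, xi=0.01):
    """Standard EI acquisition for a maximization problem."""
    z = (mu - q_best - xi) / np.maximum(sigma, 1e-9)
    return (mu - q_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def sample_action(gp, candidate_actions, q_best):
    """Pick the candidate whose Gaussian-process posterior maximizes EI over
    the current best action-value estimate (the candidate set and GP
    interface are assumptions)."""
    mu, sigma = gp.predict(candidate_actions, return_std=True)
    return candidate_actions[np.argmax(expected_improvement(mu, sigma, q_best))]
```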

Towards Recurrent Autoregressive Flow Models

Jun 17, 2020
John Mern, Peter Morales, Mykel J. Kochenderfer

Stochastic processes generated by non-stationary distributions are difficult to represent with conventional models such as Gaussian processes. This work presents Recurrent Autoregressive Flows as a method toward general stochastic process modeling with normalizing flows. The proposed method defines a conditional distribution for each variable in a sequential process by conditioning the parameters of a normalizing flow with recurrent neural connections. Complex conditional relationships are learned through the recurrent network parameters. In this work, we present an initial design for a recurrent flow cell and a method to train the model to match observed empirical distributions. We demonstrate the effectiveness of this class of models through a series of experiments in which models are trained on three complex stochastic processes. We highlight the shortcomings of our current formulation and suggest some potential solutions.
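
A minimal sketch of the idea, assuming an affine (RealNVP-style) transform whose scale and shift are produced from a GRU state; the paper's actual cell design may differ:

```python
import torch
import torch.nn as nn

class RecurrentAffineFlowCell(nn.Module):
    """Sketch of a recurrent flow cell: a GRU state summarizing x_{<t}
    conditions a per-step affine transform x_t = z_t * exp(s_t) + m_t,
    which stays invertible with log|det J| = sum(s_t)."""

    def __init__(self, dim, hidden=64):
        super().__init__()
        self.rnn = nn.GRUCell(dim, hidden)
        self.params = nn.Linear(hidden, 2 * dim)

    def forward(self, z_t, h):                 # base noise and recurrent state
        s, m = self.params(h).chunk(2, dim=-1)
        x_t = z_t * torch.exp(s) + m           # invertible affine transform
        log_det = s.sum(-1)                    # contributes to the exact NLL
        return x_t, self.rnn(x_t, h), log_det
```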

Exchangeable Input Representations for Reinforcement Learning

Mar 19, 2020
John Mern, Dorsa Sadigh, Mykel J. Kochenderfer

Poor sample efficiency is a major limitation of deep reinforcement learning in many domains. This work presents an attention-based method to project neural network inputs into an efficient representation space that is invariant under changes to input ordering. We show that our proposed representation results in an input space that is a factor of $m!$ smaller for inputs of $m$ objects. We also show that our method is able to represent inputs over variable numbers of objects. Our experiments demonstrate improvements in sample efficiency for policy gradient methods on a variety of tasks. We show that our representation allows us to solve problems that are otherwise intractable when using naïve approaches.

* 6 pages, 7 figures 
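
A minimal sketch of one way to realize such an order-invariant projection, using learned attention pooling over per-object embeddings (module names and dimensions are assumptions):

```python
import torch
import torch.nn as nn

class ExchangeableEncoder(nn.Module):
    """Sketch of an order-invariant input projection: per-object embeddings
    are pooled with learned attention weights, so every permutation of the
    m input objects maps to the same representation."""

    def __init__(self, obj_dim, embed_dim=64):
        super().__init__()
        self.embed = nn.Linear(obj_dim, embed_dim)
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, objs):                     # (batch, m, obj_dim)
        h = torch.relu(self.embed(objs))         # embed each object separately
        w = torch.softmax(self.score(h), dim=1)  # attention over objects
        return (w * h).sum(dim=1)                # permutation-invariant pooling
```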

Object Exchangeability in Reinforcement Learning: Extended Abstract

May 07, 2019
John Mern, Dorsa Sadigh, Mykel Kochenderfer

Although deep reinforcement learning has advanced significantly over the past several years, sample efficiency remains a major challenge. Careful choice of input representations can help improve efficiency depending on the structure present in the problem. In this work, we present an attention-based method to project inputs into an efficient representation space that is invariant under changes to input ordering. We show that our proposed representation results in a search space that is a factor of m! smaller for inputs of m objects. Our experiments demonstrate improvements in sample efficiency for policy gradient methods on a variety of tasks. We show that our representation allows us to solve problems that are otherwise intractable when using naive approaches.

* In Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2019), Montreal, Canada, May 13-17, 2019, IFAAMAS, 3 pages