We study the problem of identifying the dynamics of a linear system when one has access to samples generated by a similar (but not identical) system, in addition to data from the true system. We use a weighted least squares approach and provide finite sample performance guarantees on the quality of the identified dynamics. Our results show that one can effectively use the auxiliary data generated by the similar system to reduce the estimation error due to the process noise, at the cost of adding a portion of error that is due to intrinsic differences in the models of the true and auxiliary systems. We also provide numerical experiments to validate our theoretical results. Our analysis can be applied to a variety of important settings. For example, if the system dynamics change at some point in time (e.g., due to a fault), how should one leverage data from the prior system in order to learn the dynamics of the new system? As another example, if there is abundant data available from a simulated (but imperfect) model of the true system, how should one weight that data compared to the real data from the system? Our analysis provides insights into the answers to these questions.
We consider the problem of learning the dynamics of autonomous linear systems (i.e., systems that are not affected by external control inputs) from observations of multiple trajectories of those systems, with finite sample guarantees. Existing results on learning rate and consistency of autonomous linear system identification rely on observations of steady state behaviors from a single long trajectory, and are not applicable to unstable systems. In contrast, we consider the scenario of learning system dynamics based on multiple short trajectories, where there are no easily observed steady state behaviors. We provide a finite sample analysis, which shows that the dynamics can be learned at a rate $\mathcal{O}(\frac{1}{\sqrt{N}})$ for both stable and unstable systems, where $N$ is the number of trajectories, when the initial state of the system has zero mean (which is a common assumption in the existing literature). We further generalize our result to the case where the initial state has non-zero mean. We show that one can adjust the length of the trajectories to achieve a learning rate of $\mathcal{O}(\sqrt{\frac{\log{N}}{N})}$ for strictly stable systems and a learning rate of $\mathcal{O}(\frac{(\log{N})^d}{\sqrt{N}})$ for marginally stable systems, where $d$ is some constant.
Learning the relationships between various entities from time-series data is essential in many applications. Gaussian graphical models have been studied to infer these relationships. However, existing algorithms process data in a batch at a central location, limiting their applications in scenarios where data is gathered by different agents. In this paper, we propose a distributed sparse inverse covariance algorithm to learn the network structure (i.e., dependencies among observed entities) in real-time from data collected by distributed agents. Our approach is built on an online graphical alternating minimization algorithm, augmented with a consensus term that allows agents to learn the desired structure cooperatively. We allow the system designer to select the number of communication rounds and optimization steps per data point. We characterize the rate of convergence of our algorithm and provide simulations on synthetic datasets.
Learning the relationships between various entities from time-series data is essential in many applications. Gaussian graphical models have been studied to infer these relationships. However, existing algorithms process data in a batch at a central location, limiting their applications in scenarios where data is gathered by different agents. In this paper, we propose a distributed sparse inverse covariance algorithm to learn the network structure (i.e., dependencies among observed entities) in real-time from data collected by distributed agents. Our approach is built on an online graphical alternating minimization algorithm, augmented with a consensus term that allows agents to learn the desired structure cooperatively. We allow the system designer to select the number of communication rounds and optimization steps per data point. We characterize the rate of convergence of our algorithm and provide simulations on synthetic datasets.
We model the behavioral biases of human decision-making in securing interdependent systems and show that such behavioral decision-making leads to a suboptimal pattern of resource allocation compared to non-behavioral (rational) decision-making. We provide empirical evidence for the existence of such behavioral bias model through a controlled subject study with 145 participants. We then propose three learning techniques for enhancing decision-making in multi-round setups. We illustrate the benefits of our decision-making model through multiple interdependent real-world systems and quantify the level of gain compared to the case in which the defenders are behavioral. We also show the benefit of our learning techniques against different attack models. We identify the effects of different system parameters on the degree of suboptimality of security outcomes due to behavioral decision-making.
We study a fundamental problem in Bayesian learning, where the goal is to select a set of data sources with minimum cost while achieving a certain learning performance based on the data streams provided by the selected data sources. First, we show that the data source selection problem for Bayesian learning is NP-hard. We then show that the data source selection problem can be transformed into an instance of the submodular set covering problem studied in the literature, and provide a standard greedy algorithm to solve the data source selection problem with provable performance guarantees. Next, we propose a fast greedy algorithm that improves the running times of the standard greedy algorithm, while achieving performance guarantees that are comparable to those of the standard greedy algorithm. We provide insights into the performance guarantees of the greedy algorithms by analyzing special classes of the problem. Finally, we validate the theoretical results using numerical examples, and show that the greedy algorithms work well in practice.
We consider the problem of distributed hypothesis testing (or social learning) where a network of agents seeks to identify the true state of the world from a finite set of hypotheses, based on a series of stochastic signals that each agent receives. Prior work on this problem has provided distributed algorithms that guarantee asymptotic learning of the true state, with corresponding efforts to improve the rate of learning. In this paper, we first argue that one can readily modify existing asymptotic learning algorithms to enable learning in finite time, effectively yielding arbitrarily large (asymptotic) rates. We then provide a simple algorithm for finite-time learning which only requires the agents to exchange a binary vector (of length equal to the number of possible hypotheses) with their neighbors at each time-step. Finally, we show that if the agents know the diameter of the network, our algorithm can be further modified to allow all agents to learn the true state and stop transmitting to their neighbors after a finite number of time-steps.
We study a setting where each agent in a network receives certain private signals generated by an unknown static state that belongs to a finite set of hypotheses. The agents are tasked with collectively identifying the true state. To solve this problem in a communication-efficient manner, we propose an event-triggered distributed learning algorithm that is based on the principle of diffusing low beliefs on each false hypothesis. Building on this principle, we design a trigger condition under which an agent broadcasts only those components of its belief vector that have adequate innovation, to only those neighbors that require such information. We establish that under standard assumptions, each agent learns the true state exponentially fast almost surely. We also identify sparse communication regimes where the inter-communication intervals grow unbounded, and yet, the asymptotic learning rate of our algorithm remains the same as the best known rate for this problem. We then establish, both in theory and via simulations, that our event-triggering strategy has the potential to significantly reduce information flow from uninformative agents to informative agents. Finally, we argue that, as far as only asymptotic learning is concerned, one can allow for arbitrarily sparse communication patterns.
We introduce a simple time-triggered protocol to achieve communication-efficient non-Bayesian learning over a network. Specifically, we consider a scenario where a group of agents interact over a graph with the aim of discerning the true state of the world that generates their joint observation profiles. To address this problem, we propose a novel distributed learning rule wherein agents aggregate neighboring beliefs based on a min-protocol, and the inter-communication intervals grow geometrically at a rate $a \geq 1$. Despite such sparse communication, we show that each agent is still able to rule out every false hypothesis exponentially fast with probability $1$, as long as $a$ is finite. For the special case when communication occurs at every time-step, i.e., when $a=1$, we prove that the asymptotic learning rates resulting from our algorithm are network-structure independent, and a strict improvement upon those existing in the literature. In contrast, when $a>1$, our analysis reveals that the asymptotic learning rates vary across agents, and exhibit a non-trivial dependence on the network topology coupled with the relative entropies of the agents' likelihood models. This motivates us to consider the problem of allocating signal structures to agents to maximize appropriate performance metrics. In certain special cases, we show that the eccentricity centrality and the decay centrality of the underlying graph help identify optimal allocations; for more general scenarios, we bound the deviation from the optimal allocation as a function of the parameter $a$, and the diameter of the communication graph.