Post-processing immunity is a fundamental property of differential privacy: arbitrary data-independent transformations can be applied to the outputs of a differentially private mechanism without affecting its privacy guarantees. When query outputs must satisfy domain constraints, post-processing can be used to project the privacy-preserving outputs onto the feasible region. Moreover, when the feasible region is convex, a widely adopted class of post-processing steps is also guaranteed to improve accuracy. Post-processing has been applied successfully in many applications, including census data release, energy systems, and mobility. However, its effects on the noise distribution are poorly understood: it is often argued that post-processing may introduce bias and increase variance. This paper takes a first step towards understanding the properties of post-processing. It considers the release of census data and examines, both theoretically and empirically, the behavior of a widely adopted class of post-processing functions.
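As a concrete illustration of this class of post-processing (a minimal sketch, not the paper's mechanism), the Python snippet below adds Laplace noise to a toy histogram and then projects the noisy counts onto the convex set of non-negative vectors summing to a known total, assuming the total is a public invariant as is common in census releases; the projection is data-independent, so the differential-privacy guarantee is preserved.

```python
import numpy as np

def laplace_histogram(counts, epsilon, rng=None):
    # Laplace mechanism for a histogram query (L1 sensitivity of 1).
    rng = rng or np.random.default_rng()
    return np.asarray(counts, float) + rng.laplace(scale=1.0 / epsilon, size=len(counts))

def project_to_feasible(v, total):
    # Euclidean projection onto the convex set {x : x >= 0, sum(x) = total}.
    u = np.sort(v)[::-1]
    cumulative = np.cumsum(u) - total
    rho = np.nonzero(u - cumulative / (np.arange(len(v)) + 1) > 0)[0][-1]
    return np.maximum(v - cumulative[rho] / (rho + 1.0), 0.0)

true_counts = np.array([120, 45, 80, 5])                        # hypothetical sub-population counts
noisy = laplace_histogram(true_counts, epsilon=0.5)
released = project_to_feasible(noisy, total=true_counts.sum())  # consistent, non-negative counts
```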
A critical concern in data-driven decision making is to build models whose outcomes do not discriminate against demographic groups defined by attributes such as gender, ethnicity, or age. Ensuring non-discrimination in learning tasks requires knowledge of the sensitive attributes, yet, in practice, these attributes may not be available due to legal and ethical requirements. To address this challenge, this paper studies a model that protects the privacy of individuals' sensitive attributes while still allowing non-discriminatory predictors to be learned. The method relies on the notion of differential privacy and uses Lagrangian duality to design neural networks that can accommodate fairness constraints while guaranteeing the privacy of the sensitive attributes. The paper analyzes the tension between accuracy, privacy, and fairness, and the experimental evaluation illustrates the benefits of the proposed model on several prediction tasks.
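To make the interplay between Lagrangian duality and the privacy protection concrete, the following toy PyTorch sketch (synthetic data, an illustrative and uncalibrated Laplace perturbation, and hypothetical hyper-parameters such as dual_lr and eps) trains a classifier whose demographic-parity gap is priced by a multiplier updated via dual ascent, with the model only ever observing a noisy version of the group statistic. It is a schematic of the idea, not the paper's mechanism or its privacy analysis.

```python
import torch

torch.manual_seed(0)
X = torch.randn(256, 10)                                    # features
s = (torch.rand(256) < 0.5).long()                          # sensitive attribute (binary group)
y = (X[:, 0] + 0.5 * s.float() + 0.1 * torch.randn(256) > 0).float()  # synthetic labels

model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
lam, dual_lr, eps = torch.tensor(0.0), 0.05, 1.0            # multiplier, dual step, toy privacy budget

for step in range(200):
    logits = model(X).squeeze(-1)
    loss = torch.nn.functional.binary_cross_entropy_with_logits(logits, y)
    probs = torch.sigmoid(logits)
    gap = (probs[s == 1].mean() - probs[s == 0].mean()).abs()   # demographic-parity gap
    # Illustrative (uncalibrated) noise on the group statistic: the only place the
    # training loop "sees" the sensitive attribute is through this noisy quantity.
    noisy_gap = gap + torch.distributions.Laplace(0.0, 1.0 / eps).sample()
    opt.zero_grad()
    (loss + lam * noisy_gap).backward()
    opt.step()
    lam = torch.clamp(lam + dual_lr * noisy_gap.detach(), min=0.0)  # dual ascent on the multiplier
```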
This work reconsiders the concept of community-based trip sharing proposed by Hasan et al. (2018), which leverages the structure of commuting patterns and urban communities to optimize trip sharing. It aims at quantifying the benefits of autonomous vehicles for community-based trip sharing, compared to a car-pooling platform in which vehicles are driven by their owners. In the considered problem, each rider specifies a desired arrival time for her inbound trip (commuting to work) and a departure time for her outbound trip (commuting back home). In addition, her commute time cannot deviate too much from the duration of a direct trip. Prior work, motivated by reducing parking pressure and congestion in the city of Ann Arbor, Michigan, showed that a car-pooling platform for community-based trip sharing could reduce the number of vehicles by close to 60%. This paper studies the potential benefits of autonomous vehicles in further reducing the number of vehicles needed to serve all these commuting trips. It proposes a column-generation procedure that generates and assembles mini-routes to serve inbound and outbound trips, using a lexicographic objective that first minimizes the required vehicle count and then the total travel distance. The optimization algorithm is evaluated on a large-scale, real-world dataset of commute trips from the city of Ann Arbor, Michigan. The results show that the optimization can leverage autonomous vehicles to reduce daily vehicle usage by 92%, improving upon the results of the original Commute Trip Sharing Problem by 34%, while also reducing daily vehicle miles traveled by approximately 30%. These results demonstrate the significant potential of autonomous vehicles for the shared commuting of a community to a common work destination.
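The lexicographic objective can be illustrated on the set-partitioning master problem that the column generation maintains. The PuLP sketch below (toy trips, hand-written candidate mini-routes, and no pricing subproblem) first minimizes the number of selected routes, a proxy for the vehicle count, and then re-optimizes the total travel distance with the vehicle count fixed; it is a simplified stand-in for the paper's procedure.

```python
import pulp

# Toy set-partitioning instance (hypothetical data): each column is a candidate
# mini-route covering a subset of commute trips, with its travel distance.
trips = range(4)
routes = {0: ([0, 1], 9.0), 1: ([2, 3], 7.5), 2: ([1, 2], 6.0),
          3: ([0, 3], 8.0), 4: ([0, 1, 2, 3], 14.0)}

def master_problem():
    prob = pulp.LpProblem("ctsp_master", pulp.LpMinimize)
    x = {r: pulp.LpVariable(f"x_{r}", cat="Binary") for r in routes}
    for t in trips:  # every trip is covered by exactly one selected route
        prob += pulp.lpSum(x[r] for r, (cov, _) in routes.items() if t in cov) == 1
    return prob, x

# Stage 1 of the lexicographic objective: minimize the number of routes (vehicles).
p1, x1 = master_problem()
p1 += pulp.lpSum(x1.values())
p1.solve(pulp.PULP_CBC_CMD(msg=False))
min_vehicles = int(round(pulp.value(p1.objective)))

# Stage 2: with the vehicle count fixed, minimize total travel distance.
p2, x2 = master_problem()
p2 += pulp.lpSum(dist * x2[r] for r, (_, dist) in routes.items())
p2 += pulp.lpSum(x2.values()) <= min_vehicles
p2.solve(pulp.PULP_CBC_CMD(msg=False))
```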
The security-constrained optimal power flow (SCOPF) problem is fundamental in power systems: it couples the automatic primary response (APR) of synchronized generators with the short-term schedule. Every day, the SCOPF problem is solved repeatedly for various inputs to determine schedules that are robust to a set of contingencies. Unfortunately, modeling the APR within the SCOPF problem results in complex, large-scale mixed-integer programs that are hard to solve. To address this challenge, and leveraging the wealth of available historical data, this paper proposes a novel approach that combines deep learning and robust optimization techniques. Unlike recent machine-learning applications whose aim is to mitigate the computational burden of exact solvers, the proposed method directly predicts an implementable SCOPF solution. Feasibility is enforced in two steps. First, during training, a Lagrangian dual method penalizes violations of the physical and operational constraints, which are iteratively added to the machine-learning model, as needed, by a column-and-constraint-generation algorithm (CCGA). Second, a second CCGA restores feasibility by finding the feasible solution closest to the prediction. Experiments on large test cases show that the method yields significant reductions in the time needed to obtain feasible solutions with an optimality gap below 0.1%.
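The interaction between the Lagrangian dual penalties and the constraint-generation loop can be sketched as follows (a toy PyTorch example with synthetic data, illustrative generator bounds p_max, and a hypothetical step size dual_lr; it is not the paper's SCOPF model). Violated constraints are activated one at a time, and each active constraint receives a multiplier that is updated by dual ascent during training.

```python
import torch

torch.manual_seed(0)
# Hypothetical data: each sample maps a load/contingency profile to a dispatch of 3 generators.
X = torch.rand(512, 8)
target = torch.rand(512, 3)                      # dispatch labels from an exact solver (toy)
p_max = torch.tensor([0.9, 0.8, 1.0])            # generator upper bounds (toy constraints)

model = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

active = set()                                   # constraints currently in the learning model
lam = torch.zeros(3)                             # one multiplier per (potential) constraint
dual_lr = 0.1

for epoch in range(50):
    pred = model(X)
    loss = torch.nn.functional.mse_loss(pred, target)
    # Lagrangian penalty on the constraints activated so far.
    viol = torch.clamp(pred - p_max, min=0.0).mean(dim=0)   # average violation per constraint
    for j in active:
        loss = loss + lam[j] * viol[j]
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        # Dual ascent on active multipliers; a CCGA-style step adds the worst violated constraint.
        for j in active:
            lam[j] += dual_lr * viol[j]
        worst = int(torch.argmax(viol))
        if viol[worst] > 1e-3:
            active.add(worst)
```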
The AC Optimal Power Flow (AC-OPF) is a key building block in many power system applications. It determines the generator setpoints that meet the power demands at minimal cost while satisfying the underlying physical and operational constraints. It is non-convex, NP-hard, and computationally challenging for large-scale power systems. Motivated by the increasing stochasticity of generation schedules and the growing penetration of renewable sources, this paper explores a deep-learning approach to deliver highly efficient and accurate approximations to the AC-OPF. In particular, the paper proposes an integration of deep neural networks and Lagrangian duality to capture the physical and operational constraints. The resulting model, called OPF-DNN, is evaluated on real case studies from the French transmission system, with up to 3,400 buses and 4,500 lines. Computational results show that OPF-DNN produces highly accurate AC-OPF approximations whose costs are within 0.01% of optimality, and that it generates, in milliseconds, solutions that capture the problem constraints with high fidelity.
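In the same spirit, a minimal sketch of coupling a neural approximator with a Lagrangian penalty is shown below (toy data, a single aggregate power-balance constraint, and illustrative hyper-parameters; the actual OPF-DNN model captures the full AC physics and operational limits). The multiplier mu is increased whenever the predicted setpoints violate the balance between total generation and total demand.

```python
import torch

torch.manual_seed(1)
loads = torch.rand(1024, 10)                         # active power demands at 10 buses (toy)
setpoints = torch.rand(1024, 4)                      # setpoints from an AC-OPF solver (toy labels)

net = torch.nn.Sequential(torch.nn.Linear(10, 128), torch.nn.ReLU(), torch.nn.Linear(128, 4))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
mu, dual_lr = torch.tensor(0.0), 0.05                # multiplier for the power-balance constraint

for epoch in range(100):
    pred = net(loads)
    balance = (pred.sum(dim=1) - loads.sum(dim=1)).abs().mean()  # |total generation - total demand|
    loss = torch.nn.functional.mse_loss(pred, setpoints) + mu * balance
    opt.zero_grad()
    loss.backward()
    opt.step()
    mu = mu + dual_lr * balance.detach()             # dual ascent strengthens the penalty over time
```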
This paper is motivated by the applications of census bureaus interested in releasing aggregate socio-economic data about a large population without revealing sensitive information about any individual. The released information can be the number of individuals living alone, the number of cars they own, or their salary brackets. Recent events have identified some of the privacy challenges faced by these organizations. To address them, this paper presents a novel differential-privacy mechanism for releasing hierarchical counts of individuals. The counts are reported at multiple granularities (e.g., the national, state, and county levels) and must be consistent across all levels. The core of the mechanism is an optimization model that redistributes the noise introduced to achieve differential privacy so as to satisfy the consistency constraints between the hierarchical levels. The key technical contribution of the paper shows that this optimization problem can be solved in polynomial time by exploiting the structure of its cost functions. Experimental results on very large, real datasets show that the proposed mechanism provides improvements of up to two orders of magnitude in computational efficiency and accuracy over other state-of-the-art techniques.
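The consistency-restoring post-processing can be illustrated with a small two-level example (hypothetical counts and a generic solver; the paper's contribution is a specialized polynomial-time algorithm that exploits the structure of the cost functions rather than the generic SLSQP call used here).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
eps = 1.0
true_states = np.array([300.0, 450.0, 250.0])               # hypothetical state-level counts
noisy_states = true_states + rng.laplace(scale=1 / eps, size=3)
noisy_national = true_states.sum() + rng.laplace(scale=1 / eps)

# Post-processing: find the non-negative, consistent counts closest (in L2) to the noisy ones.
def objective(z):                                            # z = [national, state_1, ..., state_3]
    return (z[0] - noisy_national) ** 2 + np.sum((z[1:] - noisy_states) ** 2)

cons = {"type": "eq", "fun": lambda z: z[1:].sum() - z[0]}   # states must add up to the nation
z0 = np.concatenate(([noisy_national], noisy_states))
res = minimize(objective, z0, constraints=[cons], bounds=[(0, None)] * 4)
consistent_national, consistent_states = res.x[0], res.x[1:]
```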
This paper develops a novel differentially private framework for solving convex optimization problems with sensitive optimization data and complex physical or operational constraints. Unlike standard noise-additive algorithms, which act primarily on the problem data, objective, or solution and disregard the problem constraints, this framework requires the optimization variables to be functions of the noise and exploits a chance-constrained problem reformulation with formal feasibility guarantees. The noise is calibrated to provide differential privacy for identity and linear queries on the optimization solution. For many applications, including resource allocation problems, the proposed framework provides a trade-off between the expected optimality loss and the variance of the optimization results.
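A schematic of the noise-aware, chance-constrained idea for a resource allocation problem is sketched below (toy utilities and capacity, a Monte Carlo estimate of the noise quantile, and an uncalibrated noise scale; the paper's calibration and guarantees are more involved). The capacity constraint is tightened by a margin so that the released, noise-perturbed allocation remains feasible with probability at least 1 - delta.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, eps, delta = 5, 1.0, 0.05
utility = rng.uniform(1.0, 2.0, size=n)                 # hypothetical per-resource utilities
capacity = 10.0                                         # shared capacity constraint

# Noise that will be added to the optimal allocation before release (identity query).
scale = 1.0 / eps
samples = rng.laplace(scale=scale, size=(100_000, n)).sum(axis=1)
margin = np.quantile(samples, 1 - delta)                # (1 - delta)-quantile of the total noise

# Chance-constrained reformulation: tighten the capacity so that the *noisy*
# allocation still satisfies the original constraint with probability >= 1 - delta.
res = linprog(c=-utility, A_ub=[[1.0] * n], b_ub=[capacity - margin], bounds=[(0, None)] * n)
private_allocation = res.x + rng.laplace(scale=scale, size=n)
```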
We present a novel framework for modeling traffic congestion events over road networks, based on new mutually exciting spatio-temporal point process models with attention mechanisms and neural network embeddings. Using multi-modal data that combine count data from traffic sensors with police reports of traffic incidents, we aim to capture two types of triggering effects for congestion events: current traffic congestion at one location may cause future congestion over the road network, and traffic incidents may cause congestion to spread over the network. To capture the non-homogeneous temporal dependence of events on the past, we introduce a novel attention-based mechanism, built on neural network embeddings, for the point process model. To incorporate the directional spatial dependence induced by the road network, we adapt the "tail-up" model from spatial statistics to the traffic network setting. We demonstrate the superior performance of our approach compared to state-of-the-art methods on both synthetic and real data.
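A minimal sketch of an attention-based conditional intensity is given below (PyTorch, toy event embeddings, and an arbitrary background rate; the spatial "tail-up" component and the full likelihood-based training are omitted). The intensity at time t is a softplus of an attention-weighted summary of past event embeddings.

```python
import torch

torch.manual_seed(0)
d = 16
embed = torch.nn.Linear(2, d)          # embeds (event time, event mark) pairs, toy feature map
query = torch.nn.Linear(1, d)          # embeds the current time t
score_w = torch.nn.Linear(d, 1)

def intensity(t, history):
    # history: (n, 2) tensor of past (time, mark) events with times < t
    keys = embed(history)                                            # (n, d) neural event embeddings
    q = query(torch.tensor([[t]]))                                   # (1, d) query for time t
    attn = torch.softmax((keys @ q.T).squeeze(-1) / d ** 0.5, dim=0) # attention over past events
    context = (attn.unsqueeze(-1) * keys).sum(dim=0)                 # attention-weighted summary
    base = 0.1                                                       # background rate (toy)
    return base + torch.nn.functional.softplus(score_w(context)).squeeze()

hist = torch.tensor([[0.5, 1.0], [1.2, 0.0], [2.0, 1.0]])            # hypothetical past events
lam_t = intensity(2.5, hist)
```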
This paper considers the dispatching of large-scale real-time ride-sharing systems to address congestion issues faced by many cities. The goal is to serve all customers (service guarantees) with a small number of vehicles, while minimizing waiting times under constraints on ride duration. This paper proposes an end-to-end approach that tightly integrates a state-of-the-art dispatching algorithm, a machine-learning model to predict zone-to-zone demand over time, and a model predictive control optimization to relocate idle vehicles. Experiments using historical taxi trips in New York City indicate that this integration decreases average waiting times by about 30% over all test cases and by close to 55% on the largest instances for high-demand zones.
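The relocation component can be illustrated with a small linear program (PuLP, toy zones, idle-vehicle counts, predicted demands, and an arbitrary shortfall penalty; the paper's model predictive control formulation optimizes over a multi-period horizon and is considerably richer). Idle vehicles are moved between zones to minimize relocation cost plus a penalty on predicted unserved demand.

```python
import pulp

zones = ["A", "B", "C"]
idle = {"A": 8, "B": 1, "C": 3}                 # idle vehicles currently parked in each zone (toy)
demand = {"A": 2, "B": 7, "C": 4}               # predicted requests for the next horizon (toy ML output)
cost = {(i, j): (0 if i == j else 1) for i in zones for j in zones}   # relocation cost (toy)
penalty = 10                                    # weight on predicted unserved demand

prob = pulp.LpProblem("relocation", pulp.LpMinimize)
f = pulp.LpVariable.dicts("move", (zones, zones), lowBound=0)
short = pulp.LpVariable.dicts("short", zones, lowBound=0)

for i in zones:  # cannot relocate more vehicles than are idle in a zone
    prob += pulp.lpSum(f[i][j] for j in zones) <= idle[i]
for j in zones:  # shortfall = predicted demand not covered after relocation
    inflow = pulp.lpSum(f[i][j] for i in zones)
    outflow = pulp.lpSum(f[j][k] for k in zones)
    prob += short[j] >= demand[j] - idle[j] - inflow + outflow

prob += pulp.lpSum(cost[i, j] * f[i][j] for i in zones for j in zones) \
        + penalty * pulp.lpSum(short[j] for j in zones)
prob.solve(pulp.PULP_CBC_CMD(msg=False))
```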
This paper studies how to apply differential privacy to constrained optimization problems whose inputs are sensitive. This task raises significant challenges, since random perturbations of the input data often render the constrained optimization problem infeasible or significantly change the nature of its optimal solutions. To address this difficulty, this paper proposes a bilevel optimization model that can be used as a post-processing step: it optimally redistributes the noise introduced by a differentially private mechanism while restoring feasibility and near-optimality. The paper shows that, under a natural assumption, this bilevel model can be solved efficiently for real-life, large-scale, nonlinear, nonconvex optimization problems with sensitive customer data. The experimental results demonstrate the accuracy of the privacy-preserving mechanism and showcase significant benefits compared to standard approaches.
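A heavily simplified, single-level surrogate of this post-processing idea is sketched below (hypothetical customer loads, a toy nonlinear capacity constraint, and a generic solver; the paper's bilevel model and its efficient solution method are not reproduced here). The released values are chosen as close as possible to the noisy ones while keeping the downstream nonlinear model feasible.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
true_loads = np.array([1.2, 0.8, 1.5, 0.5])                 # hypothetical customer loads
noisy = true_loads + rng.laplace(scale=0.3, size=4)         # output of a DP mechanism (toy scale)

capacity = 4.5                                              # feeder capacity (toy)
losses = lambda d: 0.05 * np.sum(d ** 2)                    # toy nonlinear (quadratic) loss term

# Post-processing: release the loads closest to the noisy ones that keep the
# downstream nonlinear operational constraint satisfied.
objective = lambda d: np.sum((d - noisy) ** 2)
cons = [{"type": "ineq", "fun": lambda d: capacity - (np.sum(d) + losses(d))}]
res = minimize(objective, x0=np.clip(noisy, 0, None), bounds=[(0, None)] * 4, constraints=cons)
released_loads = res.x
```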