Abstract: Cancer is often driven by specific combinations of an estimated two to nine gene mutations, known as multi-hit combinations. Identifying these combinations is critical for understanding carcinogenesis and designing targeted therapies. We formalise this challenge as the Multi-Hit Cancer Driver Set Cover Problem (MHCDSCP), a binary classification problem that selects gene combinations to maximise coverage of tumor samples while minimising coverage of normal samples. Existing approaches typically rely on exhaustive search and supercomputing infrastructure. In this paper, we present constraint programming and mixed integer programming formulations of the MHCDSCP. Evaluated on real-world cancer genomics data, our methods achieve performance comparable to state-of-the-art methods while running on a single commodity CPU in under a minute. Furthermore, we introduce a column generation heuristic capable of solving small instances to optimality. These results suggest that solving the MHCDSCP is less computationally intensive than previously believed, opening new research directions for examining its modelling assumptions.
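To make the coverage objective concrete, the following is a minimal sketch under stated assumptions. It assumes a sample is covered by a combination when it carries every gene in that combination, and it scores a candidate with a hypothetical weighted accuracy (alpha * TP + TN) / (N_T + N_N), with alpha = 0.1 chosen for illustration. The greedy loop is a simple stand-in heuristic, not the constraint programming, mixed integer programming, or column generation methods described above, and all function names are illustrative.

```python
from itertools import combinations


def covers(sample_genes, combo):
    """A sample is covered when it carries every gene in the combination."""
    return set(combo) <= sample_genes


def score(combo, tumor, normal, alpha=0.1):
    """Hypothetical weighted accuracy: (alpha * TP + TN) / (N_T + N_N)."""
    tp = sum(covers(s, combo) for s in tumor)       # tumor samples covered
    tn = sum(not covers(s, combo) for s in normal)  # normal samples left uncovered
    return (alpha * tp + tn) / (len(tumor) + len(normal))


def greedy_multi_hit(tumor, normal, genes, hits=2, alpha=0.1):
    """Greedily pick `hits`-gene combinations until every tumor sample is
    covered or no candidate covers a remaining tumor sample."""
    uncovered = list(tumor)
    chosen = []
    while uncovered:
        best, best_val = None, float("-inf")
        # Exhaustively scan all combinations of the requested size
        for combo in combinations(sorted(genes), hits):
            # Skip combinations that cover no remaining tumor sample
            if not any(covers(s, combo) for s in uncovered):
                continue
            val = score(combo, uncovered, normal, alpha)
            if val > best_val:
                best, best_val = combo, val
        if best is None:
            break
        chosen.append(best)
        # Remove tumor samples now explained by the chosen combination
        uncovered = [s for s in uncovered if not covers(s, best)]
    return chosen
```

Each iteration scans all C(|genes|, hits) candidates, which illustrates why exhaustive search grows expensive as the number of genes and the hit count increase.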
Abstract: Location-based services rely heavily on efficient methods that search for relevant points-of-interest (POIs) near a given location. A k Nearest Neighbor (kNN) query is one such example: it finds the k POIs closest to an agent's location. While most existing techniques focus on retrieving nearby POIs for a single agent, their search heuristics do not carry over to many other applications. For example, Aggregate k Nearest Neighbor (AkNN) queries require POIs that are close to multiple agents, and k Farthest Neighbor (kFN) queries require the POIs that are farthest away, the opposite of nearest. Such problems naturally benefit from a hierarchical approach, but existing methods rely on Euclidean-based heuristics, which are less effective on graphs such as road networks. We propose a novel data structure, COL-Tree (Compacted Object-Landmark Tree), to address this gap by enabling efficient hierarchical graph traversal using a more accurate landmark-based heuristic. We then present query algorithms that use COL-Trees to efficiently answer AkNN, kFN, and other queries. In experiments on real-world and synthetic datasets, we demonstrate that our techniques significantly outperform existing approaches, by up to four orders of magnitude. Moreover, this comes with only a small preprocessing overhead, both in theory and in practice.
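As a rough illustration of why a landmark-based heuristic helps graph queries, the sketch below uses the standard triangle-inequality bound |d(L, u) - d(L, v)| <= d(u, v) for a landmark L to order and prune candidates in a sum-aggregate AkNN query. It assumes an undirected, connected graph with non-negative weights stored as an adjacency dict, and it is not the COL-Tree structure or its query algorithms; `landmark_lb` and `aknn` are hypothetical names, and the exact distance evaluation stands in for whatever point-to-point search a real system would use.

```python
import heapq


def dijkstra(graph, source):
    """Single-source shortest-path distances; graph maps node -> [(nbr, weight)]."""
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist


def landmark_lb(landmark_dists, u, v):
    """Triangle-inequality lower bound on d(u, v), valid for undirected graphs."""
    return max(abs(ld[u] - ld[v]) for ld in landmark_dists)


def aknn(graph, agents, pois, k, landmarks):
    """Return the k POIs with the smallest total distance to all agents."""
    landmark_dists = [dijkstra(graph, l) for l in landmarks]
    # Visit POIs in ascending order of their aggregate lower bound
    bounds = sorted(
        (sum(landmark_lb(landmark_dists, a, p) for a in agents), p) for p in pois
    )
    results = []  # (exact aggregate distance, poi), sorted, length <= k
    for bound, p in bounds:
        if len(results) == k and bound >= results[-1][0]:
            break  # no remaining POI can beat the current k-th result
        # Placeholder exact evaluation: one Dijkstra per agent
        exact = sum(dijkstra(graph, a)[p] for a in agents)
        results.append((exact, p))
        results.sort()
        del results[k:]
    return [p for _, p in results]
```

Because the candidates are sorted by lower bound and every exact aggregate is at least its bound, stopping at the first bound that cannot beat the current k-th result still returns the exact answer; tighter bounds mean fewer exact evaluations.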