Abstract:This work investigates the radio resource management (RRM) design for downlink integrated sensing and communications (ISAC) systems, jointly optimizing timeslot allocation, beam adaptation, functionality selection, and user-target pairing, with the goal of economizing resource consumption under imperfect information. Timeslot allocation assigns a number of discrete channel uses to targets and users, while beam adaptation selects transmit and receive beams with suitable directions, power levels, and beamwidths. Functionality selection determines whether each timeslot is used for sensing, communication, or their simultaneous operation, while user-target pairing specifies which users and targets are jointly served within the same timeslot. To ensure reliable operation, information imperfections arising from motion, quantization, feedback delays, and hardware limitations are considered. Resource economization is achieved by minimizing energy and time consumption through a multi-objective function, with strict prioritization of time savings. The resulting RRM problem is formulated as a semi-infinite, nonconvex mixed-integer nonlinear program (MINLP). Given the lack of generic methods for solving such problems, we propose a tailor-made approach that exploits the underlying structure of the problem to uncover hidden convexities. This enables an exact reformulation as a mixed-integer semidefinite program (MISDP), which can be solved to global optimality. Simulations reveal important interdependencies among the considered RRM components and show that the proposed approach achieves substantial performance improvements over baseline schemes, with gains up to 88%.
Abstract:We study the design of resilient and reliable communication networks in which a signal can be transferred only up to a limited distance before its quality falls below an acceptable threshold. When excessive signal degradation occurs, regeneration is required through regenerators installed at selected network nodes. In this work, both network links and nodes are subject to uncertainty. The installation costs of regenerators are modeled using a budgeted uncertainty set. In addition, link lengths follow a dynamic budgeted uncertainty set introduced in this paper, where deviations may vary over time. Robust optimization seeks solutions whose performance is guaranteed under all scenarios represented by the underlying uncertainty set. Accordingly, the objective is to identify a minimum-cost subset of nodes for regenerator deployment that ensures full network connectivity, even under the worst possible realizations of uncertainty. To solve the problem, we first formulate it within a robust optimization framework, and then develop scalable solution methods based on column-and-constraint generation, Benders decomposition, and iterative robust optimization. In addition, we formulate a learning-based hide-and-seek game to further analyze the problem structure. The proposed approaches are evaluated against classical static budgeted robust models and deterministic worst-case formulations. Both theoretical analysis and computational results demonstrate the effectiveness and advantages of our methodology.
Abstract:Document forgery poses a growing threat to legal, economic, and governmental processes, requiring increasingly sophisticated verification mechanisms. One approach involves the use of plausibility checks, rule-based procedures that assess the correctness and internal consistency of data, to detect anomalies or signs of manipulation. Although these verification procedures are essential for ensuring data integrity, existing plausibility checks are manually implemented by software engineers, which is time-consuming. Recent advances in code generation with large language models (LLMs) offer new potential for automating and scaling the generation of these checks. However, adapting LLMs to the specific requirements of an unknown domain remains a significant challenge. This work investigates the extent to which LLMs, adapted on domain-specific code and data through different fine-tuning strategies, can generate rule-based plausibility checks for forgery detection on constrained hardware resources. We fine-tune open-source LLMs, Llama 3.1 8B and OpenCoder 8B, on structured datasets derived from real-world application scenarios and evaluate the generated plausibility checks on previously unseen forgery patterns. The results demonstrate that the models are capable of generating executable and effective verification procedures. This also highlights the potential of LLMs as scalable tools to support human decision-making in security-sensitive contexts where comprehensibility is required.
Abstract:The nodes' interconnections on a social network often reflect their dependencies and information-sharing behaviors. Nevertheless, abnormal nodes, which significantly deviate from most of the network concerning patterns or behaviors, can lead to grave consequences. Therefore, it is imperative to design efficient online learning algorithms that robustly learn users' preferences while simultaneously detecting anomalies. We introduce a novel bandit algorithm to address this problem. Through network knowledge, the method characterizes the users' preferences and residuals of feature information. By learning and analyzing these preferences and residuals, it develops a personalized recommendation strategy for each user and simultaneously detects anomalies. We rigorously prove an upper bound on the regret of the proposed algorithm and experimentally compare it with several state-of-the-art collaborative contextual bandit algorithms on both synthetic and real-world datasets.
Abstract:Large language models (LLMs) are increasingly deployed in real-world applications that require careful balancing of multiple, often conflicting, objectives, such as informativeness versus conciseness, or helpfulness versus creativity. However, current alignment methods, primarily based on RLHF, optimize LLMs toward a single reward function, resulting in rigid behavior that fails to capture the complexity and diversity of human preferences. This limitation hinders the adaptability of LLMs to practical scenarios, making multi-objective alignment (MOA) a critical yet underexplored area. To bridge this gap, we propose Pareto Multi-Objective Alignment (PAMA), a principled and computationally efficient algorithm designed explicitly for MOA in LLMs. In contrast to computationally prohibitive multi-objective optimization (MOO) methods, PAMA transforms multi-objective RLHF into a convex optimization with a closed-form solution, significantly enhancing scalability. Traditional MOO approaches suffer from prohibitive O(n^2*d) complexity, where d represents the number of model parameters, typically in the billions for LLMs, rendering direct optimization infeasible. PAMA reduces this complexity to O(n) where n is the number of objectives, enabling optimization to be completed within milliseconds. We provide theoretical guarantees that PAMA converges to a Pareto stationary point, where no objective can be improved without degrading at least one other. Extensive experiments across language models ranging from 125M to 7B parameters demonstrate PAMA's robust and effective MOA capabilities, aligning with its theoretical advantages. PAMA provides a highly efficient solution to the MOA problem that was previously considered intractable, offering a practical and theoretically grounded approach to aligning LLMs with diverse human values, paving the way for versatile and adaptable real-world AI deployments.
Abstract:Multi-armed bandit (MAB) problems are widely applied to online optimization tasks that require balancing exploration and exploitation. In practical scenarios, these tasks often involve multiple conflicting objectives, giving rise to multi-objective multi-armed bandits (MO-MAB). Existing MO-MAB approaches predominantly rely on the Pareto regret metric introduced in \cite{drugan2013designing}. However, this metric has notable limitations, particularly in accounting for all Pareto-optimal arms simultaneously. To address these challenges, we propose a novel and comprehensive regret metric that ensures balanced performance across conflicting objectives. Additionally, we introduce the concept of \textit{Efficient Pareto-Optimal} arms, which are specifically designed for online optimization. Based on our new metric, we develop a two-phase MO-MAB algorithm that achieves sublinear regret for both Pareto-optimal and efficient Pareto-optimal arms.
Abstract:The complexity of online decision-making under uncertainty stems from the requirement of finding a balance between exploiting known strategies and exploring new possibilities. Naturally, the uncertainty type plays a crucial role in developing decision-making strategies that manage complexity effectively. In this paper, we focus on a specific form of uncertainty known as epistemic ambivalence (EA), which emerges from conflicting pieces of evidence or contradictory experiences. It creates a delicate interplay between uncertainty and confidence, distinguishing it from epistemic uncertainty that typically diminishes with new information. Indeed, ambivalence can persist even after additional knowledge is acquired. To address this phenomenon, we propose a novel framework, called the epistemically ambivalent Markov decision process (EA-MDP), aiming to understand and control EA in decision-making processes. This framework incorporates the concept of a quantum state from the quantum mechanics formalism, and its core is to assess the probability and reward of every possible outcome. We calculate the reward function using quantum measurement techniques and prove the existence of an optimal policy and an optimal value function in the EA-MDP framework. We also propose the EA-epsilon-greedy Q-learning algorithm. To evaluate the impact of EA on decision-making and the expedience of our framework, we study two distinct experimental setups, namely the two-state problem and the lattice problem. Our results show that using our methods, the agent converges to the optimal policy in the presence of EA.
Abstract:We investigate the joint user and target scheduling, user-target pairing, and low-resolution phase-only beamforming design for integrated sensing and communications (ISAC). Scheduling determines which users and targets are served, while pairing specifies which users and targets are grouped into pairs. Additionally, the beamformers are designed using few-bit constant-modulus phase shifts. This resource allocation problem is a nonconvex mixed-integer nonlinear program (MINLP) and challenging to solve. To address it, we propose an exact mixed-integer linear program (MILP) reformulation, which leads to a globally optimal solution. Our results demonstrate the superiority of an optimal joint design compared to heuristic stage-wise approaches, which are highly sensitive to scenario characteristics.
Abstract:We investigate the joint admission control and discrete-phase multicast beamforming design for integrated sensing and communications (ISAC) systems, where sensing and communications functionalities have different hierarchies. Specifically, the ISAC system first allocates resources to the higher-hierarchy functionality and opportunistically uses the remaining resources to support the lower-hierarchy one. This resource allocation problem is a nonconvex mixed-integer nonlinear program (MINLP). We propose an exact mixed-integer linear program (MILP) reformulation, leading to a globally optimal solution. In addition, we implemented three baselines for comparison, which our proposed method outperforms by more than 39%.




Abstract:Large language models (LLMs) have shown impressive capabilities in generating program code, opening exciting opportunities for applying program synthesis to games. In this work, we explore the potential of LLMs to directly synthesize usable code for a wide range of gaming applications, focusing on two programming languages, Python and Java. We use an evolutionary hill-climbing algorithm, where the mutations and seeds of the initial programs are controlled by LLMs. For Python, the framework covers various game-related tasks, including five miniature versions of Atari games, ten levels of Baba is You, an environment inspired by Asteroids, and a maze generation task. For Java, the framework contains 12 games from the TAG tabletop games framework. Across 29 tasks, we evaluated 12 language models for Python and 8 for Java. Our findings suggest that the performance of LLMs depends more on the task than on model size. While larger models generate more executable programs, these do not always result in higher-quality solutions but are much more expensive. No model has a clear advantage, although on any specific task, one model may be better. Trying many models on a problem and using the best results across them is more reliable than using just one.