Setareh Maghsudi

One Model for All: Multi-Objective Controllable Language Models

Apr 06, 2026

Qiang He, Yucheng Yang, Tianyi Zhou, Meng Fang, Mykola Pechenizkiy, Setareh Maghsudi

Abstract:Aligning large language models (LLMs) with human preferences is critical for enhancing LLMs' safety, helpfulness, humor, faithfulness, etc. Current reinforcement learning from human feedback (RLHF) mainly focuses on a fixed reward learned from average human ratings, which may weaken the adaptability and controllability of varying preferences. However, creating personalized LLMs requires aligning LLMs with individual human preferences, which is non-trivial due to the scarce data per user and the diversity of user preferences in multi-objective trade-offs, varying from emphasizing empathy in certain contexts to demanding efficiency and precision in others. Can we train one LLM to produce personalized outputs across different user preferences on the Pareto front? In this paper, we introduce Multi-Objective Control (MOC), which trains a single LLM to directly generate responses in the preference-defined regions of the Pareto front. Our approach introduces multi-objective optimization (MOO) principles into RLHF to train an LLM as a preference-conditioned policy network. We improve the computational efficiency of MOC by applying MOO at the policy level, enabling us to fine-tune a 7B-parameter model on a single A6000 GPU. Extensive experiments demonstrate the advantages of MOC over baselines in three aspects: (i) controllability of LLM outputs w.r.t. user preferences on the trade-off among multiple rewards; (ii) quality and diversity of LLM outputs, measured by the hyper-volume of multiple solutions achieved; and (iii) generalization to unseen preferences. These results highlight MOC's potential for real-world applications requiring scalable and customizable LLMs.

* Published in Transactions on Machine Learning Research (03/2026): https://openreview.net/forum?id=qAM5PmvFYY
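
The abstract above scores quality and diversity by the hyper-volume of the solutions achieved. As a rough illustration only, the sketch below computes that indicator for two rewards that are both maximized. The function names, the sample points, and the reference point are assumptions made for this example and are not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (both rewards are maximized)."""
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


def hypervolume_2d(points: List[Point], ref: Point) -> float:
    """Area dominated by the front and bounded below by the reference point."""
    front = [p for p in pareto_front(points) if p[0] > ref[0] and p[1] > ref[1]]
    # Sweep from the largest first reward to the smallest; the second reward
    # then increases, so each point contributes one new horizontal strip.
    front.sort(key=lambda p: p[0], reverse=True)
    area, prev_y = 0.0, ref[1]
    for x, y in front:
        area += (x - ref[0]) * (y - prev_y)
        prev_y = y
    return area


# Hypothetical reward pairs; (1, 1) is dominated and adds no area.
solutions = [(1, 3), (2, 2), (3, 1), (1, 1)]
print(hypervolume_2d(solutions, ref=(0, 0)))
```

A larger value means the set of solutions covers more of the trade-off region, which is why the indicator rewards both quality and spread.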

Optimal Radio Resource Management for ISAC Under Imperfect Information: A Resource Economy-Driven Perspective

Mar 17, 2026

Luis F. Abanto-Leon, Setareh Maghsudi

Abstract:This work investigates the radio resource management (RRM) design for downlink integrated sensing and communications (ISAC) systems, jointly optimizing timeslot allocation, beam adaptation, functionality selection, and user-target pairing, with the goal of economizing resource consumption under imperfect information. Timeslot allocation assigns a number of discrete channel uses to targets and users, while beam adaptation selects transmit and receive beams with suitable directions, power levels, and beamwidths. Functionality selection determines whether each timeslot is used for sensing, communication, or their simultaneous operation, while user-target pairing specifies which users and targets are jointly served within the same timeslot. To ensure reliable operation, information imperfections arising from motion, quantization, feedback delays, and hardware limitations are considered. Resource economization is achieved by minimizing energy and time consumption through a multi-objective function, with strict prioritization of time savings. The resulting RRM problem is formulated as a semi-infinite, nonconvex mixed-integer nonlinear program (MINLP). Given the lack of generic methods for solving such problems, we propose a tailor-made approach that exploits the underlying structure of the problem to uncover hidden convexities. This enables an exact reformulation as a mixed-integer semidefinite program (MISDP), which can be solved to global optimality. Simulations reveal important interdependencies among the considered RRM components and show that the proposed approach achieves substantial performance improvements over baseline schemes, with gains up to 88%.

* IEEE Transactions on Mobile Computing

Robust Optimization Approach and Learning Based Hide-and-Seek Game for Resilient Network Design

Feb 12, 2026

Mohammad Khosravi, Setareh Maghsudi

Abstract:We study the design of resilient and reliable communication networks in which a signal can be transferred only up to a limited distance before its quality falls below an acceptable threshold. When excessive signal degradation occurs, regeneration is required through regenerators installed at selected network nodes. In this work, both network links and nodes are subject to uncertainty. The installation costs of regenerators are modeled using a budgeted uncertainty set. In addition, link lengths follow a dynamic budgeted uncertainty set introduced in this paper, where deviations may vary over time. Robust optimization seeks solutions whose performance is guaranteed under all scenarios represented by the underlying uncertainty set. Accordingly, the objective is to identify a minimum-cost subset of nodes for regenerator deployment that ensures full network connectivity, even under the worst possible realizations of uncertainty. To solve the problem, we first formulate it within a robust optimization framework, and then develop scalable solution methods based on column-and-constraint generation, Benders decomposition, and iterative robust optimization. In addition, we formulate a learning-based hide-and-seek game to further analyze the problem structure. The proposed approaches are evaluated against classical static budgeted robust models and deterministic worst-case formulations. Both theoretical analysis and computational results demonstrate the effectiveness and advantages of our methodology.
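
To make the budgeted uncertainty set for installation costs concrete, here is a minimal sketch. It assumes the standard budgeted model, in which at most `gamma` of the costs rise from their nominal value by their maximum deviation. The node names, the numbers, and the function name are hypothetical, and the sketch evaluates a given placement rather than solving the paper's optimization problem.

```python
from typing import Dict, Iterable


def worst_case_cost(
    chosen: Iterable[str],
    nominal: Dict[str, float],
    deviation: Dict[str, float],
    gamma: int,
) -> float:
    """Worst-case installation cost of `chosen` when at most `gamma` costs deviate."""
    chosen = list(chosen)
    base = sum(nominal[v] for v in chosen)
    # The adversary raises the `gamma` chosen costs with the largest deviations.
    worst_deviations = sorted((deviation[v] for v in chosen), reverse=True)[:gamma]
    return base + sum(worst_deviations)


# Hypothetical data: two candidate regenerator placements on nodes a, b, c.
nominal = {"a": 10, "b": 12, "c": 8}
deviation = {"a": 7, "b": 1, "c": 4}
print(worst_case_cost(["a", "c"], nominal, deviation, gamma=1))
print(worst_case_cost(["b", "c"], nominal, deviation, gamma=1))
```

With these numbers the placement {a, c} is cheaper nominally (18 versus 20), but {b, c} is cheaper in the worst case (24 versus 25), which is the kind of reversal a robust formulation is meant to catch.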

Generation of Programmatic Rules for Document Forgery Detection Using Large Language Models

Dec 22, 2025

Valentin Schmidberger, Manuel Eberhardinger, Setareh Maghsudi, Johannes Maucher

Abstract:Document forgery poses a growing threat to legal, economic, and governmental processes, requiring increasingly sophisticated verification mechanisms. One approach involves the use of plausibility checks, rule-based procedures that assess the correctness and internal consistency of data, to detect anomalies or signs of manipulation. Although these verification procedures are essential for ensuring data integrity, existing plausibility checks are manually implemented by software engineers, which is time-consuming. Recent advances in code generation with large language models (LLMs) offer new potential for automating and scaling the generation of these checks. However, adapting LLMs to the specific requirements of an unknown domain remains a significant challenge. This work investigates the extent to which LLMs, adapted on domain-specific code and data through different fine-tuning strategies, can generate rule-based plausibility checks for forgery detection on constrained hardware resources. We fine-tune open-source LLMs, Llama 3.1 8B and OpenCoder 8B, on structured datasets derived from real-world application scenarios and evaluate the generated plausibility checks on previously unseen forgery patterns. The results demonstrate that the models are capable of generating executable and effective verification procedures. This also highlights the potential of LLMs as scalable tools to support human decision-making in security-sensitive contexts where comprehensibility is required.

* Accepted at ICMLA 2025, the first two authors contributed equally
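
For readers unfamiliar with plausibility checks, the sketch below shows what a small rule-based check might look like. The invoice fields, the rule names, and the one-cent tolerance are all hypothetical choices for this illustration. It is not output from the fine-tuned models, and it does not reflect the paper's datasets.

```python
from datetime import date
from typing import Dict, List


def check_invoice(doc: Dict) -> List[str]:
    """Return the names of the plausibility rules that the document violates."""
    violations = []
    # Rule 1: the amounts must be internally consistent (tolerance of one cent).
    if abs(doc["net"] + doc["tax"] - doc["gross"]) > 0.01:
        violations.append("gross_equals_net_plus_tax")
    # Rule 2: the tax amount must match the stated tax rate.
    if abs(doc["net"] * doc["tax_rate"] - doc["tax"]) > 0.01:
        violations.append("tax_matches_rate")
    # Rule 3: a document cannot be due before it was issued.
    if doc["due_date"] < doc["issue_date"]:
        violations.append("due_after_issue")
    return violations


# Hypothetical document in which the gross amount has been altered.
forged = {
    "net": 100.0, "tax_rate": 0.19, "tax": 19.0, "gross": 191.0,
    "issue_date": date(2025, 1, 10), "due_date": date(2025, 2, 10),
}
print(check_invoice(forged))
```

Each rule is readable on its own, which matches the abstract's point about contexts where comprehensibility is required.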

Anomaly Detection in Networked Bandits

Aug 27, 2025

Xiaotong Cheng, Setareh Maghsudi

Abstract:The nodes' interconnections on a social network often reflect their dependencies and information-sharing behaviors. Nevertheless, abnormal nodes, which significantly deviate from most of the network concerning patterns or behaviors, can lead to grave consequences. Therefore, it is imperative to design efficient online learning algorithms that robustly learn users' preferences while simultaneously detecting anomalies. We introduce a novel bandit algorithm to address this problem. Through network knowledge, the method characterizes the users' preferences and residuals of feature information. By learning and analyzing these preferences and residuals, it develops a personalized recommendation strategy for each user and simultaneously detects anomalies. We rigorously prove an upper bound on the regret of the proposed algorithm and experimentally compare it with several state-of-the-art collaborative contextual bandit algorithms on both synthetic and real-world datasets.

Pareto Multi-Objective Alignment for Language Models

Aug 11, 2025

Qiang He, Setareh Maghsudi

Abstract:Large language models (LLMs) are increasingly deployed in real-world applications that require careful balancing of multiple, often conflicting, objectives, such as informativeness versus conciseness, or helpfulness versus creativity. However, current alignment methods, primarily based on RLHF, optimize LLMs toward a single reward function, resulting in rigid behavior that fails to capture the complexity and diversity of human preferences. This limitation hinders the adaptability of LLMs to practical scenarios, making multi-objective alignment (MOA) a critical yet underexplored area. To bridge this gap, we propose Pareto Multi-Objective Alignment (PAMA), a principled and computationally efficient algorithm designed explicitly for MOA in LLMs. In contrast to computationally prohibitive multi-objective optimization (MOO) methods, PAMA transforms multi-objective RLHF into a convex optimization with a closed-form solution, significantly enhancing scalability. Traditional MOO approaches suffer from prohibitive O(n^2*d) complexity, where d represents the number of model parameters, typically in the billions for LLMs, rendering direct optimization infeasible. PAMA reduces this complexity to O(n) where n is the number of objectives, enabling optimization to be completed within milliseconds. We provide theoretical guarantees that PAMA converges to a Pareto stationary point, where no objective can be improved without degrading at least one other. Extensive experiments across language models ranging from 125M to 7B parameters demonstrate PAMA's robust and effective MOA capabilities, aligning with its theoretical advantages. PAMA provides a highly efficient solution to the MOA problem that was previously considered intractable, offering a practical and theoretically grounded approach to aligning LLMs with diverse human values, paving the way for versatile and adaptable real-world AI deployments.

* Accepted at ECML/PKDD 2025
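
The abstract does not state PAMA's closed-form solution, so the sketch below shows a different, well-known closed form instead: the min-norm combination of two gradients from the classical multiple-gradient descent setting. It is meant only to illustrate what a closed-form convex step for combining objectives can look like. The function name and the toy gradients are assumptions.

```python
from typing import List, Tuple


def min_norm_weights(g1: List[float], g2: List[float]) -> Tuple[float, float]:
    """Weights (w, 1 - w) in [0, 1] that minimize ||w * g1 + (1 - w) * g2||^2."""
    diff = [a - b for a, b in zip(g1, g2)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        # Identical gradients: every convex combination gives the same vector.
        return 0.5, 0.5
    # Unconstrained minimizer w = (g2 - g1) . g2 / ||g1 - g2||^2, clipped to [0, 1].
    w = sum((b - a) * b for a, b in zip(g1, g2)) / denom
    w = min(1.0, max(0.0, w))
    return w, 1.0 - w


# Two orthogonal toy gradients receive equal weight.
print(min_norm_weights([1.0, 0.0], [0.0, 1.0]))
```

If the resulting combined vector is zero, no direction improves both objectives at once, which is the Pareto stationarity condition the abstract refers to.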

Stochastic Multi-Objective Multi-Armed Bandits: Regret Definition and Algorithm

Jun 16, 2025

Mansoor Davoodi, Setareh Maghsudi

Abstract:Multi-armed bandit (MAB) problems are widely applied to online optimization tasks that require balancing exploration and exploitation. In practical scenarios, these tasks often involve multiple conflicting objectives, giving rise to multi-objective multi-armed bandits (MO-MAB). Existing MO-MAB approaches predominantly rely on the Pareto regret metric introduced by Drugan and Nowé (2013). However, this metric has notable limitations, particularly in accounting for all Pareto-optimal arms simultaneously. To address these challenges, we propose a novel and comprehensive regret metric that ensures balanced performance across conflicting objectives. Additionally, we introduce the concept of Efficient Pareto-Optimal arms, which are specifically designed for online optimization. Based on our new metric, we develop a two-phase MO-MAB algorithm that achieves sublinear regret for both Pareto-optimal and efficient Pareto-optimal arms.
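
As background for the existing Pareto regret metric that the abstract criticizes, the sketch below uses one common way of writing the Pareto suboptimality gap: the smallest uniform boost to an arm's mean vector that makes it non-dominated. This formula is an assumption about the baseline metric as usually stated in the literature. It is not the new metric proposed in this paper, and the mean vectors are hypothetical.

```python
from typing import List, Sequence


def pareto_gap(i: int, means: List[Sequence[float]]) -> float:
    """Smallest uniform boost that stops every other arm from dominating arm i."""
    mu_i = means[i]
    gap = 0.0
    for mu_j in means:
        # Arm j beats arm i by at least `margin` in every objective.
        margin = min(a - b for a, b in zip(mu_j, mu_i))
        gap = max(gap, margin)
    return gap


def pareto_regret(pulls: List[int], means: List[Sequence[float]]) -> float:
    """Sum of the gaps of the arms that were pulled."""
    return sum(pareto_gap(i, means) for i in pulls)


# Hypothetical mean rewards for four arms and two objectives.
means = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9), (0.4, 0.4)]
print([round(pareto_gap(i, means), 3) for i in range(len(means))])
```

Every Pareto-optimal arm has a gap of zero here, so a learner that only ever pulls one of them incurs no regret under this measure. That is one way to read the limitation about not accounting for all Pareto-optimal arms simultaneously.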

Quantum-Inspired Reinforcement Learning in the Presence of Epistemic Ambivalence

Mar 06, 2025

Alireza Habibi, Saeed Ghoorchian, Setareh Maghsudi

Abstract:The complexity of online decision-making under uncertainty stems from the requirement of finding a balance between exploiting known strategies and exploring new possibilities. Naturally, the uncertainty type plays a crucial role in developing decision-making strategies that manage complexity effectively. In this paper, we focus on a specific form of uncertainty known as epistemic ambivalence (EA), which emerges from conflicting pieces of evidence or contradictory experiences. It creates a delicate interplay between uncertainty and confidence, distinguishing it from epistemic uncertainty that typically diminishes with new information. Indeed, ambivalence can persist even after additional knowledge is acquired. To address this phenomenon, we propose a novel framework, called the epistemically ambivalent Markov decision process (EA-MDP), aiming to understand and control EA in decision-making processes. This framework incorporates the concept of a quantum state from the quantum mechanics formalism, and its core is to assess the probability and reward of every possible outcome. We calculate the reward function using quantum measurement techniques and prove the existence of an optimal policy and an optimal value function in the EA-MDP framework. We also propose the EA-epsilon-greedy Q-learning algorithm. To evaluate the impact of EA on decision-making and the expedience of our framework, we study two distinct experimental setups, namely the two-state problem and the lattice problem. Our results show that using our methods, the agent converges to the optimal policy in the presence of EA.

Optimal User and Target Scheduling, User-Target Pairing, and Low-Resolution Phase-Only Beamforming for ISAC Systems

Jan 20, 2025

Luis F. Abanto-Leon, Setareh Maghsudi

Abstract:We investigate the joint user and target scheduling, user-target pairing, and low-resolution phase-only beamforming design for integrated sensing and communications (ISAC). Scheduling determines which users and targets are served, while pairing specifies which users and targets are grouped into pairs. Additionally, the beamformers are designed using few-bit constant-modulus phase shifts. This resource allocation problem is a nonconvex mixed-integer nonlinear program (MINLP) and challenging to solve. To address it, we propose an exact mixed-integer linear program (MILP) reformulation, which leads to a globally optimal solution. Our results demonstrate the superiority of an optimal joint design compared to heuristic stage-wise approaches, which are highly sensitive to scenario characteristics.

* IEEE Transactions on Vehicular Technology
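
To show what few-bit constant-modulus phase shifts look like in practice, the sketch below rounds arbitrary complex weights to the nearest of 2^bits unit-modulus values. This naive rounding is an assumption made for illustration. The paper selects the phases jointly with scheduling and pairing through a MILP, which this code does not attempt.

```python
import cmath
import math
from typing import List


def quantize_phase_only(weights: List[complex], bits: int) -> List[complex]:
    """Map each weight to the nearest of 2**bits unit-modulus phase values."""
    levels = 2 ** bits
    step = 2 * math.pi / levels
    quantized = []
    for w in weights:
        # Round the phase to the nearest multiple of `step`; drop the amplitude.
        k = round(cmath.phase(w) / step) % levels
        quantized.append(cmath.exp(1j * k * step))
    return quantized


# With 2 bits the only allowed phases are 0, 90, 180, and 270 degrees.
for q in quantize_phase_only([3 + 0.1j, 0.1 + 2j, -1 - 0.2j], bits=2):
    print(round(math.degrees(cmath.phase(q))) % 360)
```

Every output has unit magnitude, so only the phase carries information, which is what makes the feasible set discrete.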

Hierarchical Functionality Prioritization in Multicast ISAC: Optimal Admission Control and Discrete-Phase Beamforming

Dec 31, 2024

Luis F. Abanto-Leon, Setareh Maghsudi

Abstract:We investigate the joint admission control and discrete-phase multicast beamforming design for integrated sensing and communications (ISAC) systems, where sensing and communications functionalities have different hierarchies. Specifically, the ISAC system first allocates resources to the higher-hierarchy functionality and opportunistically uses the remaining resources to support the lower-hierarchy one. This resource allocation problem is a nonconvex mixed-integer nonlinear program (MINLP). We propose an exact mixed-integer linear program (MILP) reformulation, leading to a globally optimal solution. In addition, we implemented three baselines for comparison, which our proposed method outperforms by more than 39%.

* IEEE Communications Letters, 2024
* 5 pages
