Wei Duan

Bandwidth-constrained Variational Message Encoding for Cooperative Multi-agent Reinforcement Learning

Dec 11, 2025

Wei Duan, Jie Lu, En Yu, Junyu Xuan

Abstract: Graph-based multi-agent reinforcement learning (MARL) enables coordinated behavior under partial observability by modeling agents as nodes and communication links as edges. While recent methods excel at learning sparse coordination graphs (determining who communicates with whom), they do not address what information should be transmitted under hard bandwidth constraints. We study this bandwidth-limited regime and show that naive dimensionality reduction consistently degrades coordination performance. Hard bandwidth constraints force selective encoding, but deterministic projections lack mechanisms to control how compression occurs. We introduce Bandwidth-constrained Variational Message Encoding (BVME), a lightweight module that treats messages as samples from learned Gaussian posteriors regularized via KL divergence to an uninformative prior. BVME's variational framework provides principled, tunable control over compression strength through interpretable hyperparameters, directly constraining the representations used for decision-making. Across SMACv1, SMACv2, and MPE benchmarks, BVME achieves comparable or superior performance while using 67-83% fewer message dimensions, with gains most pronounced on sparse graphs where message quality critically impacts coordination. Ablations reveal U-shaped sensitivity to bandwidth, with BVME excelling at extreme ratios while adding minimal overhead.

* Submitted to AAMAS 2026
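
The variational encoding described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a diagonal Gaussian posterior, takes the "uninformative prior" to be a standard normal N(0, I), and uses hypothetical names (`encode_message`, `kl_to_standard_normal`, `beta`) and made-up encoder outputs.

```python
import math
import random


def encode_message(mu, log_var, rng):
    """Sample a message z = mu + sigma * eps (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))


rng = random.Random(0)
mu, log_var = [1.0, 0.0], [0.0, 0.0]   # hypothetical encoder outputs
message = encode_message(mu, log_var, rng)

beta = 0.01                            # hypothetical KL weight (assumption)
penalty = beta * kl_to_standard_normal(mu, log_var)
```

In this sketch a larger `beta` pulls the posterior toward the prior, which is one way a KL term can act as a tunable knob on compression strength.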

Group-Aware Coordination Graph for Multi-Agent Reinforcement Learning

Apr 17, 2024

Wei Duan, Jie Lu, Junyu Xuan

Abstract: Cooperative Multi-Agent Reinforcement Learning (MARL) necessitates seamless collaboration among agents, often represented by an underlying relation graph. Existing methods for learning this graph primarily focus on agent-pair relations, neglecting higher-order relationships. While several approaches attempt to extend cooperation modelling to encompass behaviour similarities within groups, they commonly fall short in concurrently learning the latent graph, thereby constraining the information exchange among partially observed agents. To overcome these limitations, we present a novel approach to infer the Group-Aware Coordination Graph (GACG), which is designed to capture both the cooperation between agent pairs based on current observations and group-level dependencies from behaviour patterns observed across trajectories. This graph is further used in graph convolution for information exchange between agents during decision-making. To further ensure behavioural consistency among agents within the same group, we introduce a group distance loss, which promotes group cohesion and encourages specialization between groups. Our evaluations, conducted on StarCraft II micromanagement tasks, demonstrate GACG's superior performance. An ablation study further provides experimental evidence of the effectiveness of each component of our method.

* Accepted by IJCAI 2024

Inferring Latent Temporal Sparse Coordination Graph for Multi-Agent Reinforcement Learning

Mar 28, 2024

Wei Duan, Jie Lu, Junyu Xuan

Abstract: Effective agent coordination is crucial in cooperative Multi-Agent Reinforcement Learning (MARL). While agent cooperation can be represented by graph structures, prevailing graph learning methods in MARL are limited. They rely solely on one-step observations, neglecting crucial historical experiences, leading to deficient graphs that foster redundant or detrimental information exchanges. Additionally, high computational demands for action-pair calculations in dense graphs impede scalability. To address these challenges, we propose inferring a Latent Temporal Sparse Coordination Graph (LTS-CG) for MARL. The LTS-CG leverages agents' historical observations to calculate an agent-pair probability matrix, from which a sparse graph is sampled and used for knowledge exchange between agents, thereby simultaneously capturing agent dependencies and relation uncertainty. The computational complexity of this procedure is only related to the number of agents. This graph learning process is further augmented by two innovative characteristics: Predict-Future, which enables agents to foresee upcoming observations, and Infer-Present, ensuring a thorough grasp of the environmental context from limited data. These features allow LTS-CG to construct temporal graphs from historical and real-time information, promoting knowledge exchange during policy learning and effective collaboration. Graph learning and agent training occur simultaneously in an end-to-end manner. Our demonstrated results on the StarCraft II benchmark underscore LTS-CG's superior performance.
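
As a rough illustration of sampling a graph from an agent-pair probability matrix, the sketch below draws one independent Bernoulli trial per pair. The abstract does not specify the sampling scheme, so the independent-Bernoulli choice, the symmetric (undirected) adjacency matrix, and the name `sample_graph` are all assumptions made here for illustration.

```python
import random


def sample_graph(prob, rng):
    """Draw a symmetric 0/1 adjacency matrix, one Bernoulli trial per agent pair."""
    n = len(prob)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Keep edge (i, j) with probability prob[i][j].
            if rng.random() < prob[i][j]:
                adj[i][j] = adj[j][i] = 1
    return adj


# Hypothetical 3-agent probability matrix (values are made up).
prob = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.3],
    [0.0, 0.3, 0.0],
]
graph = sample_graph(prob, random.Random(0))
```

Low pair probabilities yield sparse sampled graphs, and resampling gives different graphs, which is one simple way to reflect uncertainty about agent relations.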

Layer-diverse Negative Sampling for Graph Neural Networks

Mar 18, 2024

Wei Duan, Jie Lu, Yu Guang Wang, Junyu Xuan

Abstract: Graph neural networks (GNNs) are a powerful solution for various structure learning applications due to their strong representation capabilities for graph data. However, traditional GNNs, relying on message-passing mechanisms that gather information exclusively from first-order neighbours (known as positive samples), can lead to issues such as over-smoothing and over-squashing. To mitigate these issues, we propose a layer-diverse negative sampling method for message-passing propagation. This method employs a sampling matrix within a determinantal point process, which transforms the candidate set into a space and selectively samples from this space to generate negative samples. To further enhance the diversity of the negative samples during each forward pass, we develop a space-squeezing method to achieve layer-wise diversity in multi-layer GNNs. Experiments on various real-world graph datasets demonstrate the effectiveness of our approach in improving the diversity of negative samples and overall learning performance. Moreover, adding negative samples dynamically changes the graph's topology, and thus has strong potential to improve the expressiveness of GNNs and reduce the risk of over-squashing.

* Published in Transactions on Machine Learning Research (03/2024)

Predicting Single-cell Drug Sensitivity by Adaptive Weighted Feature for Adversarial Multi-source Domain Adaptation

Mar 08, 2024

Wei Duan, Hui Liu

Abstract: The development of single-cell sequencing technology has promoted the generation of a large number of single-cell transcriptional profiles, providing valuable opportunities to explore drug-resistant cell subpopulations in a tumor. However, drug sensitivity data at the single-cell level is still scarce, making computational prediction of the drug sensitivity of individual cells an urgent and highly challenging task. This paper proposed scAdaDrug, a multi-source adaptive weighting model to predict single-cell drug sensitivity. We used an autoencoder to extract domain-invariant features related to drug sensitivity from multiple source domains by exploiting adversarial domain adaptation. In particular, we introduced an adaptive weight generator to produce importance-aware and mutually independent weights, which could adaptively modulate the embedding of each sample at the dimension level for both source and target domains. Extensive experimental results showed that our model achieved state-of-the-art performance in predicting drug sensitivity on single-cell datasets, as well as on cell line and patient datasets.

Graph Convolutional Neural Networks with Diverse Negative Samples via Decomposed Determinant Point Processes

Dec 05, 2022

Wei Duan, Junyu Xuan, Maoying Qiao, Jie Lu

Abstract: Graph convolutional networks (GCNs) have achieved great success in graph representation learning by extracting high-level features from nodes and their topology. Since GCNs generally follow a message-passing mechanism, each node aggregates information from its first-order neighbours to update its representation. As a result, the representations of nodes with edges between them should be positively correlated and thus can be considered positive samples. However, there are more non-neighbour nodes in the whole graph, which provide diverse and useful information for the representation update. Two non-adjacent nodes usually have different representations, which can be seen as negative samples. Besides the node representations, the structural information of the graph is also crucial for learning. In this paper, we use quality-diversity decomposition in determinant point processes (DPP) to obtain diverse negative samples. When defining a distribution on diverse subsets of all non-neighbouring nodes, we incorporate both graph structure information and node representations. Since the DPP sampling process requires matrix eigenvalue decomposition, we propose a new shortest-path-based method to improve computational efficiency. Finally, we incorporate the obtained negative samples into the graph convolution operation. The ideas are evaluated empirically in experiments on node classification tasks. These experiments show that the newly proposed methods not only improve the overall performance of standard representation learning but also significantly alleviate over-smoothing problems.

* Submitted to TNNLS and under review. arXiv admin note: text overlap with arXiv:2210.00728
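
To make the quality-diversity decomposition mentioned in the abstract above concrete, here is a small sketch of the standard L-ensemble form, in which the kernel is L_ij = q_i * S_ij * q_j and a subset S is scored by det(L_S). It is not the paper's method: the quality scores, the similarity matrix, and the names `det` and `subset_score` are hypothetical, and how the paper builds quality and similarity from graph structure and node representations is not reproduced here.

```python
def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result          # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def subset_score(quality, similarity, subset):
    """Unnormalized DPP score det(L_S) with L_ij = q_i * S_ij * q_j."""
    kernel = [[quality[i] * similarity[i][j] * quality[j] for j in subset]
              for i in subset]
    return det(kernel)


# Hypothetical data: nodes 0 and 1 are very similar, node 2 differs from both.
quality = [1.0, 1.0, 1.0]
similarity = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
similar_pair = subset_score(quality, similarity, [0, 1])
diverse_pair = subset_score(quality, similarity, [0, 2])
```

With equal qualities, the dissimilar pair receives the higher score, which is the sense in which a DPP favours diverse subsets.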

Learning from the Dark: Boosting Graph Convolutional Neural Networks with Diverse Negative Samples

Oct 03, 2022

Wei Duan, Junyu Xuan, Maoying Qiao, Jie Lu

Abstract: Graph Convolutional Neural Networks (GCNs) have been generally accepted to be an effective tool for node representation learning. An interesting way to understand GCNs is to think of them as a message passing mechanism where each node updates its representation by accepting information from its neighbours (also known as positive samples). However, beyond these neighbouring nodes, graphs have a large, dark, all-but-forgotten world in which we find the non-neighbouring nodes (negative samples). In this paper, we show that this great dark world holds a substantial amount of information that might be useful for representation learning. More specifically, it can provide negative information about the node representations. Our overall idea is to select appropriate negative samples for each node and incorporate the negative information contained in these samples into the representation updates. Moreover, we show that the process of selecting the negative samples is not trivial. Our theme therefore begins by describing the criteria for a good negative sample, followed by a determinantal point process algorithm for efficiently obtaining such samples. A GCN, boosted by diverse negative samples, then jointly considers the positive and negative information when passing messages. Experimental evaluations show that this idea not only improves the overall performance of standard representation learning but also significantly alleviates over-smoothing problems.

Interpretable Melody Generation from Lyrics with Discrete-Valued Adversarial Training

Jun 30, 2022

Wei Duan, Zhe Zhang, Yi Yu, Keizo Oyama

Abstract: Generating melody from lyrics is an interesting yet challenging task in the area of artificial intelligence and music. However, the difficulty of keeping consistency between input lyrics and generated melody limits the generation quality of previous works. We demonstrate an interpretable lyrics-to-melody generation system that lets users interact with it to understand the generation process and recreate the desired songs. To improve the reliability of melody generation that matches lyrics, mutual information is exploited to strengthen the consistency between lyrics and generated melodies. Gumbel-Softmax is exploited to solve the non-differentiability problem of generating discrete music attributes by Generative Adversarial Networks (GANs). Moreover, the predicted probabilities output by the generator are utilized to recommend music attributes. Interacting with our lyrics-to-melody generation system, users can listen to the generated AI song as well as recreate a new song by selecting from recommended music attributes.

* 3 pages, 3 figures
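
The Gumbel-Softmax relaxation named in the abstract above can be sketched in a few lines. This is a generic illustration of the technique, not the authors' generator: the logits, the temperature `tau`, and the function name `gumbel_softmax` are hypothetical.

```python
import math
import random


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / tau)."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)       # clamp to avoid log(0)
        gumbel = -math.log(-math.log(u))   # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    top = max(noisy)                       # subtract max for numerical stability
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits over three discrete music attribute values.
rng = random.Random(0)
soft_sample = gumbel_softmax([0.5, 2.0, 0.1], tau=1.0, rng=rng)
```

Lower temperatures push the output toward a one-hot vector while keeping it a smooth function of the logits, which is why the relaxation is commonly used when a generator must emit discrete values yet still receive gradients.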
