Reinforcement Learning from Human Feedback (RLHF) is the prevailing approach to ensure Large Language Models (LLMs) align with human values. However, existing RLHF methods require a high computational cost, one main reason being that RLHF assigns both the generation and alignment tasks to the LLM simultaneously. In this paper, we introduce Proxy-RLHF, which decouples the generation and alignment processes of LLMs, achieving alignment with human values at a much lower computational cost. We start with a novel Markov Decision Process (MDP) designed for the alignment process and employ Reinforcement Learning (RL) to train a streamlined proxy model that oversees the token generation of the LLM, without altering the LLM itself. Experiments show that our method achieves a comparable level of alignment with only 1\% of the training parameters of other methods.
Communication stands as a potent mechanism to harmonize the behaviors of multiple agents. However, existing works primarily concentrate on broadcast communication, which not only lacks practicality, but also leads to information redundancy. This surplus, one-fits-all information could adversely impact the communication efficiency. Furthermore, existing works often resort to basic mechanisms to integrate observed and received information, impairing the learning process. To tackle these difficulties, we propose Targeted and Trusted Multi-Agent Communication (T2MAC), a straightforward yet effective method that enables agents to learn selective engagement and evidence-driven integration. With T2MAC, agents have the capability to craft individualized messages, pinpoint ideal communication windows, and engage with reliable partners, thereby refining communication efficiency. Following the reception of messages, the agents integrate information observed and received from different sources at an evidence level. This process enables agents to collectively use evidence garnered from multiple perspectives, fostering trusted and cooperative behaviors. We evaluate our method on a diverse set of cooperative multi-agent tasks, with varying difficulties, involving different scales and ranging from Hallway, MPE to SMAC. The experiments indicate that the proposed model not only surpasses the state-of-the-art methods in terms of cooperative performance and communication efficiency, but also exhibits impressive generalization.
It is hard to directly implement Graph Neural Networks (GNNs) on large scaled graphs. Besides of existed neighbor sampling techniques, scalable methods decoupling graph convolutions and other learnable transformations into preprocessing and post classifier allow normal minibatch training. By replacing redundant concatenation operation with attention mechanism in SIGN, we propose Scalable and Adaptive Graph Neural Networks (SAGN). SAGN can adaptively gather neighborhood information among different hops. To further improve scalable models on semi-supervised learning tasks, we propose Self-Label-Enhance (SLE) framework combining self-training approach and label propagation in depth. We add base model with a scalable node label module. Then we iteratively train models and enhance train set in several stages. To generate input of node label module, we directly apply label propagation based on one-hot encoded label vectors without inner random masking. We find out that empirically the label leakage has been effectively alleviated after graph convolutions. The hard pseudo labels in enhanced train set participate in label propagation with true labels. Experiments on both inductive and transductive datasets demonstrate that, compared with other sampling-based and sampling-free methods, SAGN achieves better or comparable results and SLE can further improve performance.
Graph Neural Networks (GNNs) have received much attention recent years and have achieved state-of-the-art performances in many fields. The deeper GNNs can theoretically capture deeper neighborhood information. However, they often suffer from problems of over-fitting and over-smoothing. In order to incorporate deeper information while preserving considerable complexity and generalization ability, we propose Adaptive Graph Diffusion Networks with Hop-wise Attention (AGDNs-HA). We stack multi-hop neighborhood aggregations of different orders into single layer. Then we integrate them with the help of hop-wise attention, which is learnable and adaptive for each node. Experimental results on the standard dataset with semi-supervised node classification task show that our proposed methods achieve significant improvements.