Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Rui Yan

Flexible and Adaptable Summarization via Expertise Separation

Jun 08, 2024

Xiuying Chen, Mingzhe Li, Shen Gao, Xin Cheng, Qingqing Zhu, Rui Yan, Xin Gao, Xiangliang Zhang

Figure 1 for Flexible and Adaptable Summarization via Expertise Separation

Figure 2 for Flexible and Adaptable Summarization via Expertise Separation

Figure 3 for Flexible and Adaptable Summarization via Expertise Separation

Figure 4 for Flexible and Adaptable Summarization via Expertise Separation

Abstract:A proficient summarization model should exhibit both flexibility -- the capacity to handle a range of in-domain summarization tasks, and adaptability -- the competence to acquire new knowledge and adjust to unseen out-of-domain tasks. Unlike large language models (LLMs) that achieve this through parameter scaling, we propose a more parameter-efficient approach in this study. Our motivation rests on the principle that the general summarization ability to capture salient information can be shared across different tasks, while the domain-specific summarization abilities need to be distinct and tailored. Concretely, we propose MoeSumm, a Mixture-of-Expert Summarization architecture, which utilizes a main expert for gaining the general summarization capability and deputy experts that selectively collaborate to meet specific summarization task requirements. We further propose a max-margin loss to stimulate the separation of these abilities. Our model's distinct separation of general and domain-specific summarization abilities grants it with notable flexibility and adaptability, all while maintaining parameter efficiency. MoeSumm achieves flexibility by managing summarization across multiple domains with a single model, utilizing a shared main expert and selected deputy experts. It exhibits adaptability by tailoring deputy experts to cater to out-of-domain few-shot and zero-shot scenarios. Experimental results on 11 datasets show the superiority of our model compared with recent baselines and LLMs. We also provide statistical and visual evidence of the distinct separation of the two abilities in MoeSumm (https://github.com/iriscxy/MoE_Summ).

* 10 pages, 7 figures, published in SIGIR 2024

Via

Access Paper or Ask Questions

Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework

Jun 05, 2024

Xiaoxi Sun, Jinpeng Li, Yan Zhong, Dongyan Zhao, Rui Yan

Figure 1 for Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework

Figure 2 for Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework

Figure 3 for Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework

Figure 4 for Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework

Abstract:The advent of large language models (LLMs) has facilitated the development of natural language text generation. It also poses unprecedented challenges, with content hallucination emerging as a significant concern. Existing solutions often involve expensive and complex interventions during the training process. Moreover, some approaches emphasize problem disassembly while neglecting the crucial validation process, leading to performance degradation or limited applications. To overcome these limitations, we propose a Markov Chain-based multi-agent debate verification framework to enhance hallucination detection accuracy in concise claims. Our method integrates the fact-checking process, including claim detection, evidence retrieval, and multi-agent verification. In the verification stage, we deploy multiple agents through flexible Markov Chain-based debates to validate individual claims, ensuring meticulous verification outcomes. Experimental results across three generative tasks demonstrate that our approach achieves significant improvements over baselines.

* 18 pages, 3 figures

Via

Access Paper or Ask Questions

Facilitating Multi-Role and Multi-Behavior Collaboration of Large Language Models for Online Job Seeking and Recruiting

May 28, 2024

Hongda Sun, Hongzhan Lin, Haiyu Yan, Chen Zhu, Yang Song, Xin Gao, Shuo Shang, Rui Yan

Figure 1 for Facilitating Multi-Role and Multi-Behavior Collaboration of Large Language Models for Online Job Seeking and Recruiting

Figure 2 for Facilitating Multi-Role and Multi-Behavior Collaboration of Large Language Models for Online Job Seeking and Recruiting

Figure 3 for Facilitating Multi-Role and Multi-Behavior Collaboration of Large Language Models for Online Job Seeking and Recruiting

Figure 4 for Facilitating Multi-Role and Multi-Behavior Collaboration of Large Language Models for Online Job Seeking and Recruiting

Abstract:The emergence of online recruitment services has revolutionized the traditional landscape of job seeking and recruitment, necessitating the development of high-quality industrial applications to improve person-job fitting. Existing methods generally rely on modeling the latent semantics of resumes and job descriptions and learning a matching function between them. Inspired by the powerful role-playing capabilities of Large Language Models (LLMs), we propose to introduce a mock interview process between LLM-played interviewers and candidates. The mock interview conversations can provide additional evidence for candidate evaluation, thereby augmenting traditional person-job fitting based solely on resumes and job descriptions. However, characterizing these two roles in online recruitment still presents several challenges, such as developing the skills to raise interview questions, formulating appropriate answers, and evaluating two-sided fitness. To this end, we propose MockLLM, a novel applicable framework that divides the person-job matching process into two modules: mock interview generation and two-sided evaluation in handshake protocol, jointly enhancing their performance through collaborative behaviors between interviewers and candidates. We design a role-playing framework as a multi-role and multi-behavior paradigm to enable a single LLM agent to effectively behave with multiple functions for both parties. Moreover, we propose reflection memory generation and dynamic prompt modification techniques to refine the behaviors of both sides, enabling continuous optimization of the augmented additional evidence. Extensive experimental results show that MockLLM can achieve the best performance on person-job matching accompanied by high mock interview quality, envisioning its emerging application in real online recruitment in the future.

Via

Access Paper or Ask Questions

Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers

May 22, 2024

Xin Cheng, Xiuying Chen, Shuqi Li, Di Luo, Xun Wang, Dongyan Zhao, Rui Yan

Figure 1 for Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers

Figure 2 for Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers

Figure 3 for Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers

Figure 4 for Leveraging 2D Information for Long-term Time Series Forecasting with Vanilla Transformers

Abstract:Time series prediction is crucial for understanding and forecasting complex dynamics in various domains, ranging from finance and economics to climate and healthcare. Based on Transformer architecture, one approach involves encoding multiple variables from the same timestamp into a single temporal token to model global dependencies. In contrast, another approach embeds the time points of individual series into separate variate tokens. The former method faces challenges in learning variate-centric representations, while the latter risks missing essential temporal information critical for accurate forecasting. In our work, we introduce GridTST, a model that combines the benefits of two approaches using innovative multi-directional attentions based on a vanilla Transformer. We regard the input time series data as a grid, where the $x$-axis represents the time steps and the $y$-axis represents the variates. A vertical slicing of this grid combines the variates at each time step into a \textit{time token}, while a horizontal slicing embeds the individual series across all time steps into a \textit{variate token}. Correspondingly, a \textit{horizontal attention mechanism} focuses on time tokens to comprehend the correlations between data at various time steps, while a \textit{vertical}, variate-aware \textit{attention} is employed to grasp multivariate correlations. This combination enables efficient processing of information across both time and variate dimensions, thereby enhancing the model's analytical strength. % We also integrate the patch technique, segmenting time tokens into subseries-level patches, ensuring that local semantic information is retained in the embedding. The GridTST model consistently delivers state-of-the-art performance across various real-world datasets.

Via

Access Paper or Ask Questions

Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction

May 22, 2024

Tingchen Fu, Deng Cai, Lemao Liu, Shuming Shi, Rui Yan

Figure 1 for Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction

Figure 2 for Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction

Figure 3 for Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction

Figure 4 for Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction

Abstract:Supervised fine-tuning (SFT) on instruction-following corpus is a crucial approach toward the alignment of large language models (LLMs). However, the performance of LLMs on standard knowledge and reasoning benchmarks tends to suffer from deterioration at the latter stage of the SFT process, echoing the phenomenon of alignment tax. Through our pilot study, we put a hypothesis that the data biases are probably one cause behind the phenomenon. To address the issue, we introduce a simple disperse-then-merge framework. To be concrete, we disperse the instruction-following data into portions and train multiple sub-models using different data portions. Then we merge multiple models into a single one via model merging techniques. Despite its simplicity, our framework outperforms various sophisticated methods such as data curation and training regularization on a series of standard knowledge and reasoning benchmarks.

* Accepted to the findings of ACL2024

Via

Access Paper or Ask Questions

AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition

May 04, 2024

Meiqi Cao, Rui Yan, Xiangbo Shu, Guangzhao Dai, Yazhou Yao, Guo-Sen Xie

Abstract:Panoramic Activity Recognition (PAR) aims to identify multi-granularity behaviors performed by multiple persons in panoramic scenes, including individual activities, group activities, and global activities. Previous methods 1) heavily rely on manually annotated detection boxes in training and inference, hindering further practical deployment; or 2) directly employ normal detectors to detect multiple persons with varying size and spatial occlusion in panoramic scenes, blocking the performance gain of PAR. To this end, we consider learning a detector adapting varying-size occluded persons, which is optimized along with the recognition module in the all-in-one framework. Therefore, we propose a novel Adapt-Focused bi-Propagating Prototype learning (AdaFPP) framework to jointly recognize individual, group, and global activities in panoramic activity scenes by learning an adapt-focused detector and multi-granularity prototypes as the pretext tasks in an end-to-end way. Specifically, to accommodate the varying sizes and spatial occlusion of multiple persons in crowed panoramic scenes, we introduce a panoramic adapt-focuser, achieving the size-adapting detection of individuals by comprehensively selecting and performing fine-grained detections on object-dense sub-regions identified through original detections. In addition, to mitigate information loss due to inaccurate individual localizations, we introduce a bi-propagation prototyper that promotes closed-loop interaction and informative consistency across different granularities by facilitating bidirectional information propagation among the individual, group, and global levels. Extensive experiments demonstrate the significant performance of AdaFPP and emphasize its powerful applicability for PAR.

Via

Access Paper or Ask Questions

MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition

May 03, 2024

Hongyu Qu, Rui Yan, Xiangbo Shu, Haoliang Gao, Peng Huang, Guo-Sen Xie

Figure 1 for MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition

Figure 2 for MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition

Figure 3 for MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition

Figure 4 for MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition

Abstract:Recent few-shot action recognition (FSAR) methods achieve promising performance by performing semantic matching on learned discriminative features. However, most FSAR methods focus on single-scale (e.g., frame-level, segment-level, \etc) feature alignment, which ignores that human actions with the same semantic may appear at different velocities. To this end, we develop a novel Multi-Velocity Progressive-alignment (MVP-Shot) framework to progressively learn and align semantic-related action features at multi-velocity levels. Concretely, a Multi-Velocity Feature Alignment (MVFA) module is designed to measure the similarity between features from support and query videos with different velocity scales and then merge all similarity scores in a residual fashion. To avoid the multiple velocity features deviating from the underlying motion semantic, our proposed Progressive Semantic-Tailored Interaction (PSTI) module injects velocity-tailored text information into the video feature via feature interaction on channel and temporal domains at different velocities. The above two modules compensate for each other to predict query categories more accurately under the few-shot settings. Experimental results show our method outperforms current state-of-the-art methods on multiple standard few-shot benchmarks (i.e., HMDB51, UCF101, Kinetics, and SSv2-small).

Via

Access Paper or Ask Questions

GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL

Apr 24, 2024

Lang Qin, Ziming Wang, Runhao Jiang, Rui Yan, Huajin Tang

Figure 1 for GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL

Figure 2 for GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL

Figure 3 for GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL

Figure 4 for GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL

Abstract:Spiking neural networks (SNNs) are widely applied in various fields due to their energy-efficient and fast-inference capabilities. Applying SNNs to reinforcement learning (RL) can significantly reduce the computational resource requirements for agents and improve the algorithm's performance under resource-constrained conditions. However, in current spiking reinforcement learning (SRL) algorithms, the simulation results of multiple time steps can only correspond to a single-step decision in RL. This is quite different from the real temporal dynamics in the brain and also fails to fully exploit the capacity of SNNs to process temporal data. In order to address this temporal mismatch issue and further take advantage of the inherent temporal dynamics of spiking neurons, we propose a novel temporal alignment paradigm (TAP) that leverages the single-step update of spiking neurons to accumulate historical state information in RL and introduces gated units to enhance the memory capacity of spiking neurons. Experimental results show that our method can solve partially observable Markov decision processes (POMDPs) and multi-agent cooperation problems with similar performance as recurrent neural networks (RNNs) but with about 50% power consumption.

Via

Access Paper or Ask Questions

DVF: Advancing Robust and Accurate Fine-Grained Image Retrieval with Retrieval Guidelines

Apr 24, 2024

Xin Jiang, Hao Tang, Rui Yan, Jinhui Tang, Zechao Li

Abstract:Fine-grained image retrieval (FGIR) is to learn visual representations that distinguish visually similar objects while maintaining generalization. Existing methods propose to generate discriminative features, but rarely consider the particularity of the FGIR task itself. This paper presents a meticulous analysis leading to the proposal of practical guidelines to identify subcategory-specific discrepancies and generate discriminative features to design effective FGIR models. These guidelines include emphasizing the object (G1), highlighting subcategory-specific discrepancies (G2), and employing effective training strategy (G3). Following G1 and G2, we design a novel Dual Visual Filtering mechanism for the plain visual transformer, denoted as DVF, to capture subcategory-specific discrepancies. Specifically, the dual visual filtering mechanism comprises an object-oriented module and a semantic-oriented module. These components serve to magnify objects and identify discriminative regions, respectively. Following G3, we implement a discriminative model training strategy to improve the discriminability and generalization ability of DVF. Extensive analysis and ablation studies confirm the efficacy of our proposed guidelines. Without bells and whistles, the proposed DVF achieves state-of-the-art performance on three widely-used fine-grained datasets in closed-set and open-set settings.

Via

Access Paper or Ask Questions

HSVI-based Online Minimax Strategies for Partially Observable Stochastic Games with Neural Perception Mechanisms

Apr 16, 2024

Rui Yan, Gabriel Santos, Gethin Norman, David Parker, Marta Kwiatkowska

Figure 1 for HSVI-based Online Minimax Strategies for Partially Observable Stochastic Games with Neural Perception Mechanisms

Figure 2 for HSVI-based Online Minimax Strategies for Partially Observable Stochastic Games with Neural Perception Mechanisms

Abstract:We consider a variant of continuous-state partially-observable stochastic games with neural perception mechanisms and an asymmetric information structure. One agent has partial information, with the observation function implemented as a neural network, while the other agent is assumed to have full knowledge of the state. We present, for the first time, an efficient online method to compute an $\varepsilon$-minimax strategy profile, which requires only one linear program to be solved for each agent at every stage, instead of a complex estimation of opponent counterfactual values. For the partially-informed agent, we propose a continual resolving approach which uses lower bounds, pre-computed offline with heuristic search value iteration (HSVI), instead of opponent counterfactual values. This inherits the soundness of continual resolving at the cost of pre-computing the bound. For the fully-informed agent, we propose an inferred-belief strategy, where the agent maintains an inferred belief about the belief of the partially-informed agent based on (offline) upper bounds from HSVI, guaranteeing $\varepsilon$-distance to the value of the game at the initial belief known to both agents.

* 12 pages, 2 figures

Via

Access Paper or Ask Questions