Jin Li

SoundAI Technology Co., Ltd

Cost-Effective Online Multi-LLM Selection with Versatile Reward Models

May 26, 2024

Xiangxiang Dai, Jin Li, Xutong Liu, Anqi Yu, John C. S. Lui

Abstract: With the rapid advancement of large language models (LLMs), the diversity of multi-LLM tasks and the variability in their pricing structures have become increasingly important, as costs can vary greatly between different LLMs. To tackle these challenges, we introduce C2MAB-V, a Cost-effective Combinatorial Multi-armed Bandit with Versatile reward models for optimal LLM selection and usage. This online model differs from traditional static approaches or those reliant on a single LLM without cost consideration. With multiple LLMs deployed on a scheduling cloud and a local server dedicated to handling user queries, C2MAB-V facilitates the selection of multiple LLMs over a combinatorial search space, specifically tailored for various collaborative task types with different reward models. Based on our designed online feedback mechanism and confidence bound technique, C2MAB-V can effectively address the multi-LLM selection challenge by managing the exploration-exploitation trade-off across different models, while also balancing cost and reward for diverse tasks. The NP-hard integer linear programming problem for selecting multiple LLMs with trade-off dilemmas is addressed by: i) decomposing the integer problem into a relaxed form by the local server, ii) utilizing a discretization rounding scheme that provides optimal LLM combinations by the scheduling cloud, and iii) continual online updates based on feedback. Theoretically, we prove that C2MAB-V offers strict guarantees over versatile reward models, matching state-of-the-art results for regret and violations in some degenerate cases. Empirically, we show that C2MAB-V effectively balances performance and cost-efficiency with nine LLMs for three application scenarios.
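
The relax-then-round idea described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm: it assumes rewards in [0, 1], a single budget constraint, a fractional-knapsack relaxation in place of the paper's relaxed integer program, and a simple floor rounding in place of its discretization rounding scheme. The names `ucb_reward`, `relaxed_selection`, and `round_down` are hypothetical.

```python
import math


def ucb_reward(mean_reward, pulls, round_t):
    """Optimistic (upper confidence bound) reward estimate, clipped to [0, 1]."""
    if pulls == 0:
        return 1.0  # an LLM that was never tried gets the most optimistic score
    bonus = math.sqrt(1.5 * math.log(round_t) / pulls)
    return min(1.0, mean_reward + bonus)


def relaxed_selection(scores, costs, budget):
    """Fractional relaxation: x[i] in [0, 1] with sum(x[i] * costs[i]) <= budget.

    Assumes every cost is positive. Greedy by score-per-cost ratio is optimal
    for this fractional knapsack (a stand-in for the paper's relaxed problem).
    """
    order = sorted(range(len(scores)),
                   key=lambda i: scores[i] / costs[i], reverse=True)
    x = [0.0] * len(scores)
    remaining = budget
    for i in order:
        if remaining <= 0:
            break
        x[i] = min(1.0, remaining / costs[i])
        remaining -= x[i] * costs[i]
    return x


def round_down(x):
    """Keep only fully selected LLMs so the budget is never exceeded."""
    return [i for i, xi in enumerate(x) if xi >= 1.0]


# Hypothetical optimistic scores and per-query costs for three LLMs.
fractional = relaxed_selection([0.9, 0.6, 0.2], [2.0, 1.0, 1.0], budget=2.0)
chosen = round_down(fractional)
```

In an online loop, the observed rewards of the chosen LLMs would update the running means and pull counts that feed `ucb_reward` in the next round.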

* 29 pages, 12 figures, conference

Are You Copying My Prompt? Protecting the Copyright of Vision Prompt for VPaaS via Watermark

May 24, 2024

Huali Ren, Anli Yan, Chong-zhi Gao, Hongyang Yan, Zhenxin Zhang, Jin Li

Abstract: Visual Prompt Learning (VPL) differs from traditional fine-tuning methods in that it substantially reduces resource consumption by not updating pre-trained model parameters. Instead, it focuses on learning an input perturbation, a visual prompt, that is added to downstream task data for making predictions. Since learning generalizable prompts requires expert design and a technically demanding, time-consuming optimization process, developers of Visual Prompts as a Service (VPaaS) have emerged. These developers profit by providing well-crafted prompts to authorized customers. However, a significant drawback is that prompts can be easily copied and redistributed, threatening the intellectual property of VPaaS developers. Hence, there is an urgent need for technology to protect the rights of VPaaS developers. To this end, we present a method named WVPrompt that employs visual prompt watermarking in a black-box way. WVPrompt consists of two parts: prompt watermarking and prompt verification. Specifically, it utilizes a poison-only backdoor attack method to embed a watermark into the prompt and then employs a hypothesis-testing approach for remote verification of prompt ownership. Extensive experiments have been conducted on three well-known benchmark datasets using three popular pre-trained models: RN50, BIT-M, and Instagram. The experimental results demonstrate that WVPrompt is efficient, harmless, and robust to various adversarial operations.
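
To make the hypothesis-testing step concrete, here is a minimal sketch of black-box ownership verification. The abstract does not say which test WVPrompt uses; the one-sided exact binomial test, the significance level, and the names `binomial_tail_p` and `verify_ownership` are assumptions for illustration only. The idea: if trigger-stamped queries hit the watermark target label far more often than chance, the null hypothesis "the suspect prompt is not watermarked" is rejected.

```python
from math import comb


def binomial_tail_p(successes, trials, p0):
    """P(X >= successes) for X ~ Binomial(trials, p0): a one-sided p-value."""
    return sum(comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
               for k in range(successes, trials + 1))


def verify_ownership(target_hits, trials, chance_rate, alpha=0.05):
    """Claim ownership if the target label appears more often than chance allows.

    target_hits: trigger-stamped queries predicted as the watermark target label.
    chance_rate: assumed probability of that label for a prompt without watermark.
    """
    return binomial_tail_p(target_hits, trials, chance_rate) < alpha


# Hypothetical remote check: 9 of 10 triggered queries return the target label,
# with an assumed chance rate of 0.1 (for example, 10 balanced classes).
is_watermarked = verify_ownership(9, 10, 0.1)
```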

* 11 pages, 7 figures

ParamReL: Learning Parameter Space Representation via Progressively Encoding Bayesian Flow Networks

May 24, 2024

Zhangkai Wu, Xuhui Fan, Zhilin Zhao, Jin Li, Hui Chen, Longbing Cao

Abstract: The recently proposed Bayesian Flow Networks (BFNs) show great potential in modeling parameter spaces, offering a unified strategy for handling continuous, discretized, and discrete data. However, BFNs cannot learn high-level semantic representation from the parameter space since common encoders, which encode data into one static representation, cannot capture semantic changes in parameters. This motivates a new direction: learning semantic representations hidden in the parameter spaces to characterize mixed-type noisy data. Accordingly, we propose a representation learning framework named ParamReL, which operates in the parameter space to obtain parameter-wise latent semantics that exhibit progressive structures. Specifically, ParamReL proposes a self-encoder to learn latent semantics directly from parameters, rather than from observations. The encoder is then integrated into BFNs, enabling representation learning with various formats of observations. Mutual information terms further promote the disentanglement of latent semantics and capture meaningful semantics simultaneously. We illustrate conditional generation and reconstruction in ParamReL via expanding BFNs, and extensive quantitative experimental results demonstrate the superior effectiveness of ParamReL in learning parameter representation.

Incremental Learning with Concept Drift Detection and Prototype-based Embeddings for Graph Stream Classification

Apr 12, 2024

Kleanthis Malialis, Jin Li, Christos G. Panayiotou, Marios M. Polycarpou

Abstract: Data stream mining aims at extracting meaningful knowledge from continually evolving data streams, addressing the challenges posed by nonstationary environments, particularly concept drift, which refers to a change in the underlying data distribution over time. Graph structures offer a powerful modelling tool to represent complex systems, such as critical infrastructure systems and social networks. Learning from graph streams becomes a necessity to understand the dynamics of graph structures and to facilitate informed decision-making. This work introduces a novel method for graph stream classification which operates under the general setting where a data generating process produces graphs with varying nodes and edges over time. The method uses incremental learning for continual model adaptation, selecting representative graphs (prototypes) for each class, and creating graph embeddings. Additionally, it incorporates a loss-based concept drift detection mechanism to recalculate graph prototypes when drift is detected.
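
A loss-based drift detector of the kind mentioned above could look roughly like the sketch below. The abstract does not give the detection rule, so the sliding-window comparison, the window size, the threshold, and the class name `LossDriftDetector` are all assumptions. The detector flags drift when the mean loss over the most recent window exceeds a stored reference mean by more than a fixed margin, which is the point where prototypes would be recalculated.

```python
from collections import deque


class LossDriftDetector:
    """Flags drift when the recent mean loss rises well above a reference mean."""

    def __init__(self, window=5, threshold=0.5):
        self.window = window
        self.threshold = threshold
        self.reference = None               # mean loss of the first full window
        self.recent = deque(maxlen=window)  # most recent losses

    def update(self, loss):
        """Add one loss value; return True if drift is detected at this step."""
        self.recent.append(loss)
        if len(self.recent) < self.window:
            return False
        recent_mean = sum(self.recent) / self.window
        if self.reference is None:
            self.reference = recent_mean
            return False
        if recent_mean - self.reference > self.threshold:
            # Reset so a new reference is learned after the drift.
            self.reference = None
            self.recent.clear()
            return True
        return False


# Hypothetical loss stream: stable at 0.1, then jumping to 1.0.
detector = LossDriftDetector(window=5, threshold=0.5)
flags = [detector.update(loss) for loss in [0.1] * 5 + [1.0] * 5]
```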

* IEEE World Congress on Computational Intelligence (WCCI) 2024; Keywords: graph streams, concept drift, incremental learning, graph prototypes, nonstationary environments

CATP: Cross-Attention Token Pruning for Accuracy Preserved Multimodal Model Inference

Apr 02, 2024

Ruqi Liao, Chuqing Zhao, Jin Li, Weiqi Feng

Abstract: In response to the rising interest in large multimodal models, we introduce Cross-Attention Token Pruning (CATP), a precision-focused token pruning method. Our approach leverages cross-attention layers in multimodal models, exemplified by BLIP-2, to extract valuable information for token importance determination. CATP employs a refined voting strategy across model heads and layers. In evaluations, CATP achieves up to 12.1X higher accuracy compared to existing token pruning methods, addressing the trade-off between computational efficiency and model precision.
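
The voting idea can be sketched as follows. This is a plain, unweighted vote and not the refined strategy of the paper, whose details the abstract does not give. It assumes cross-attention maps are available as nested lists indexed by layer, head, query, and token; `head_votes` and `tokens_to_keep` are hypothetical names. Each head votes for the tokens that receive the most attention mass, and the tokens with the most votes across all heads and layers are kept.

```python
from collections import Counter


def head_votes(attn, k_vote):
    """Return the k_vote token indices with the largest total attention in one head.

    attn is a list of rows (one per query), each row holding one weight per token.
    """
    n_tokens = len(attn[0])
    mass = [sum(row[j] for row in attn) for j in range(n_tokens)]
    return sorted(range(n_tokens), key=lambda j: mass[j], reverse=True)[:k_vote]


def tokens_to_keep(attn_maps, k_vote, k_keep):
    """Aggregate votes over all layers and heads; keep the k_keep top-voted tokens."""
    votes = Counter()
    n_tokens = len(attn_maps[0][0][0])
    for layer in attn_maps:
        for head in layer:
            votes.update(head_votes(head, k_vote))
    # Rank by vote count, breaking ties by token position.
    ranked = sorted(range(n_tokens), key=lambda j: (-votes[j], j))
    return sorted(ranked[:k_keep])


# Hypothetical input: one layer, two heads, one query, four tokens.
attn_maps = [[
    [[0.1, 0.6, 0.2, 0.1]],  # head A votes for tokens 1 and 2
    [[0.5, 0.3, 0.1, 0.1]],  # head B votes for tokens 0 and 1
]]
kept = tokens_to_keep(attn_maps, k_vote=2, k_keep=2)
```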

Open-Vocabulary Scene Text Recognition via Pseudo-Image Labeling and Margin Loss

Mar 12, 2024

Xuhua Ren, Hengcan Shi, Jin Li

Abstract: Scene text recognition is an important and challenging task in computer vision. However, most prior works focus on recognizing pre-defined words, while there are various out-of-vocabulary (OOV) words in real-world applications. In this paper, we propose a novel open-vocabulary text recognition framework, Pseudo-OCR, to recognize OOV words. The key challenge in this task is the lack of OOV training data. To solve this problem, we first propose a pseudo label generation module that leverages character detection and image inpainting to produce substantial pseudo OOV training data from real-world images. Unlike previous synthetic data, our pseudo OOV data contains real characters and backgrounds to simulate real-world applications. Secondly, to reduce noise in pseudo data, we present a semantic checking mechanism to filter for semantically meaningful data. Thirdly, we introduce a quality-aware margin loss to boost the training with pseudo data. Our loss includes a margin-based part to enhance the classification ability, and a quality-aware part to penalize low-quality samples in both real and pseudo data. Extensive experiments demonstrate that our approach outperforms the state-of-the-art on eight datasets and achieves the first rank in the ICDAR2022 challenge.

Understanding Missingness in Time-series Electronic Health Records for Individualized Representation

Feb 24, 2024

Ghadeer O. Ghosheh, Jin Li, Tingting Zhu

Abstract: With the widespread adoption of machine learning models for healthcare applications, there is increased interest in building applications for personalized medicine. Despite the plethora of proposed research for personalized medicine, very few works focus on representing missingness and learning from the missingness patterns in time-series Electronic Health Records (EHR) data. The lack of focus on missingness representation in an individualized way limits the full utilization of machine learning applications towards true personalization. In this brief communication, we highlight new insights into patterns of missingness with real-world examples and implications of missingness in EHRs. The insights in this work aim to bridge the gap between theoretical assumptions and practical observations in real-world EHRs. We hope this work will open new doors for exploring directions for better representation in predictive modelling for true personalization.

Causal Learning for Trustworthy Recommender Systems: A Survey

Feb 13, 2024

Jin Li, Shoujin Wang, Qi Zhang, Longbing Cao, Fang Chen, Xiuzhen Zhang, Dietmar Jannach, Charu C. Aggarwal

Abstract: Recommender Systems (RS) have significantly advanced online content discovery and personalized decision-making. However, emerging vulnerabilities in RS have catalyzed a paradigm shift towards Trustworthy RS (TRS). Despite considerable progress on TRS, most existing work focuses on data correlations while overlooking the fundamental causal nature of recommendation. This drawback hinders TRS from identifying the cause when addressing trustworthiness issues, leading to limited fairness, robustness, and explainability. To bridge this gap, causal learning emerges as a class of promising methods to augment TRS. These methods, grounded in reliable causality, excel in mitigating various biases and noise while offering insightful explanations for TRS. However, a timely survey of this vibrant area is lacking. This paper presents an overview of TRS from the perspective of causal learning. We begin by presenting the advantages and common procedures of Causality-oriented TRS (CTRS). Then, we identify potential trustworthiness challenges at each stage and link them to viable causal solutions, followed by a classification of CTRS methods. Finally, we discuss several future directions for advancing this field.

UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding

Jan 18, 2024

Bowen Shi, Peisen Zhao, Zichen Wang, Yuhang Zhang, Yaoming Wang, Jin Li, Wenrui Dai, Junni Zou, Hongkai Xiong, Qi Tian (+1 more)

Abstract: Vision-language foundation models, represented by Contrastive Language-Image Pre-training (CLIP), have gained increasing attention for jointly understanding both visual and textual tasks. However, existing approaches primarily focus on training models to match global image representations with textual descriptions, thereby overlooking the critical alignment between local regions and corresponding text tokens. This paper extends CLIP with multi-granularity alignment. Notably, we deliberately construct a new dataset comprising pseudo annotations at various levels of granularity, encompassing image-level, region-level, and pixel-level captions/tags. Accordingly, we develop a unified multi-granularity learning framework, named UMG-CLIP, that simultaneously empowers the model with versatile perception abilities across different levels of detail. Equipped with parameter-efficient tuning, UMG-CLIP surpasses current widely used CLIP models and achieves state-of-the-art performance on diverse image understanding benchmarks, including open-world recognition, retrieval, semantic segmentation, and panoptic segmentation tasks. We hope UMG-CLIP can serve as a valuable option for advancing vision-language foundation models.

* The paper is undergoing internal legal review and will be resubmitted once it passes the review

IGNITE: Individualized GeNeration of Imputations in Time-series Electronic health records

Jan 09, 2024

Ghadeer O. Ghosheh, Jin Li, Tingting Zhu

Abstract: Electronic Health Records (EHRs) present a valuable modality for driving personalized medicine, where treatment is tailored to fit individual-level differences. For this purpose, many data-driven machine learning and statistical models rely on the wealth of longitudinal EHRs to study patients' physiological and treatment effects. However, longitudinal EHRs tend to be sparse, with a high proportion of missing values, where missingness could also be informative and reflect the underlying patient's health status. Therefore, the success of data-driven models for personalized medicine highly depends on how the EHR data is represented from physiological data, treatments, and the missing values in the data. To this end, we propose a novel deep-learning model that learns the underlying patient dynamics over time across multivariate data to generate personalized realistic values conditioning on an individual's demographic characteristics and treatments. Our proposed model, IGNITE (Individualized GeNeration of Imputations in Time-series Electronic health records), utilises a conditional dual-variational autoencoder augmented with dual-stage attention to generate missing values for an individual. In IGNITE, we further propose a novel individualized missingness mask (IMM), which helps our model generate values based on the individual's observed data and missingness patterns. We further extend the use of IGNITE from imputing missingness to a personalized data synthesizer, where it generates missing EHRs that were never previously observed or even generates new patients for various applications. We validate our model on three large publicly available datasets and show that IGNITE outperforms state-of-the-art approaches in missing data reconstruction and task prediction.
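
As background for the missingness-mask idea, the sketch below builds a plain binary observation mask for one patient's time series and summarizes how often each variable is missing. It is a generic illustration only: the abstract does not define how the individualized missingness mask (IMM) is constructed, and the function names, the use of `None` for missing entries, and the toy values are assumptions.

```python
def observation_mask(series):
    """Binary mask per time step and variable: 1 if observed, 0 if missing (None)."""
    return [[0 if value is None else 1 for value in row] for row in series]


def missing_rate_per_variable(mask):
    """Fraction of time steps at which each variable is missing for this patient."""
    steps = len(mask)
    n_vars = len(mask[0])
    return [1 - sum(row[j] for row in mask) / steps for j in range(n_vars)]


# Hypothetical patient record: 4 time steps, 2 variables (say heart rate, lactate).
patient = [
    [1.0, None],
    [None, None],
    [2.0, 3.0],
    [4.0, None],
]
mask = observation_mask(patient)
rates = missing_rate_per_variable(mask)
```

A per-patient summary like `rates` is one simple way to expose missingness patterns as a model input alongside the observed values.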
