Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Min Zhang

Jake

Continuous Knowledge-Preserving Decomposition for Few-Shot Continual Learning

Jan 09, 2025

Xiaojie Li, Yibo Yang, Jianlong Wu, David A. Clifton, Yue Yu, Bernard Ghanem, Min Zhang

Figure 1 for Continuous Knowledge-Preserving Decomposition for Few-Shot Continual Learning

Figure 2 for Continuous Knowledge-Preserving Decomposition for Few-Shot Continual Learning

Figure 3 for Continuous Knowledge-Preserving Decomposition for Few-Shot Continual Learning

Figure 4 for Continuous Knowledge-Preserving Decomposition for Few-Shot Continual Learning

Abstract:Few-shot class-incremental learning (FSCIL) involves learning new classes from limited data while retaining prior knowledge, and often results in catastrophic forgetting. Existing methods either freeze backbone networks to preserve knowledge, which limits adaptability, or rely on additional modules or prompts, introducing inference overhead. To this end, we propose Continuous Knowledge-Preserving Decomposition for FSCIL (CKPD-FSCIL), a framework that decomposes a model's weights into two parts: one that compacts existing knowledge (knowledge-sensitive components) and another that carries redundant capacity to accommodate new abilities (redundant-capacity components). The decomposition is guided by a covariance matrix from replay samples, ensuring principal components align with classification abilities. During adaptation, we freeze the knowledge-sensitive components and only adapt the redundant-capacity components, fostering plasticity while minimizing interference without changing the architecture or increasing overhead. Additionally, CKPD introduces an adaptive layer selection strategy to identify layers with redundant capacity, dynamically allocating adapters. Experiments on multiple benchmarks show that CKPD-FSCIL outperforms state-of-the-art methods.

* Code: https://github.com/xiaojieli0903/CKPD-FSCIL

Via

Access Paper or Ask Questions

Investigating Numerical Translation with Large Language Models

Jan 09, 2025

Wei Tang, Jiawei Yu, Yuang Li, Yanqing Zhao, Weidong Zhang, Wei Feng, Min Zhang, Hao Yang

Figure 1 for Investigating Numerical Translation with Large Language Models

Figure 2 for Investigating Numerical Translation with Large Language Models

Figure 3 for Investigating Numerical Translation with Large Language Models

Abstract:The inaccurate translation of numbers can lead to significant security issues, ranging from financial setbacks to medical inaccuracies. While large language models (LLMs) have made significant advancements in machine translation, their capacity for translating numbers has not been thoroughly explored. This study focuses on evaluating the reliability of LLM-based machine translation systems when handling numerical data. In order to systematically test the numerical translation capabilities of currently open source LLMs, we have constructed a numerical translation dataset between Chinese and English based on real business data, encompassing ten types of numerical translation. Experiments on the dataset indicate that errors in numerical translation are a common issue, with most open-source LLMs faltering when faced with our test scenarios. Especially when it comes to numerical types involving large units like ``million", ``billion", and "yi", even the latest llama3.1 8b model can have error rates as high as 20%. Finally, we introduce three potential strategies to mitigate the numerical mistranslations for large units.

* Accepted by ICASSP 2025

Via

Access Paper or Ask Questions

Improving GenIR Systems Based on User Feedback

Jan 06, 2025

Qingyao Ai, Zhicheng Dou, Min Zhang

Figure 1 for Improving GenIR Systems Based on User Feedback

Figure 2 for Improving GenIR Systems Based on User Feedback

Figure 3 for Improving GenIR Systems Based on User Feedback

Abstract:In this chapter, we discuss how to improve the GenIR systems based on user feedback. Before describing the approaches, it is necessary to be aware that the concept of "user" has been extended in the interactions with the GenIR systems. Different types of feedback information and strategies are also provided. Then the alignment techniques are highlighted in terms of objectives and methods. Following this, various ways of learning from user feedback in GenIR are presented, including continual learning, learning and ranking in the conversational context, and prompt learning. Through this comprehensive exploration, it becomes evident that innovative techniques are being proposed beyond traditional methods of utilizing user feedback, and contribute significantly to the evolution of GenIR in the new era. We also summarize some challenging topics and future directions that require further investigation.

* Chapter 5 of the book on Information Access in the Era of Generative AI

Via

Access Paper or Ask Questions

Test-time Computing: from System-1 Thinking to System-2 Thinking

Jan 05, 2025

Yixin Ji, Juntao Li, Hai Ye, Kaixin Wu, Jia Xu, Linjian Mo, Min Zhang

Figure 1 for Test-time Computing: from System-1 Thinking to System-2 Thinking

Figure 2 for Test-time Computing: from System-1 Thinking to System-2 Thinking

Figure 3 for Test-time Computing: from System-1 Thinking to System-2 Thinking

Figure 4 for Test-time Computing: from System-1 Thinking to System-2 Thinking

Abstract:The remarkable performance of the o1 model in complex reasoning demonstrates that test-time computing scaling can further unlock the model's potential, enabling powerful System-2 thinking. However, there is still a lack of comprehensive surveys for test-time computing scaling. We trace the concept of test-time computing back to System-1 models. In System-1 models, test-time computing addresses distribution shifts and improves robustness and generalization through parameter updating, input modification, representation editing, and output calibration. In System-2 models, it enhances the model's reasoning ability to solve complex problems through repeated sampling, self-correction, and tree search. We organize this survey according to the trend of System-1 to System-2 thinking, highlighting the key role of test-time computing in the transition from System-1 models to weak System-2 models, and then to strong System-2 models. We also point out a few possible future directions.

* work in progress

Via

Access Paper or Ask Questions

KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model

Jan 03, 2025

Xinshuo Hu, Zifei Shan, Xinping Zhao, Zetian Sun, Zhenyu Liu, Dongfang Li, Shaolin Ye, Xinyuan Wei, Qian Chen, Baotian Hu(+1 more)

Figure 1 for KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model

Figure 2 for KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model

Figure 3 for KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model

Figure 4 for KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model

Abstract:As retrieval-augmented generation prevails in large language models, embedding models are becoming increasingly crucial. Despite the growing number of general embedding models, prior work often overlooks the critical role of training data quality. In this work, we introduce KaLM-Embedding, a general multilingual embedding model that leverages a large quantity of cleaner, more diverse, and domain-specific training data. Our model has been trained with key techniques proven to enhance performance: (1) persona-based synthetic data to create diversified examples distilled from LLMs, (2) ranking consistency filtering to remove less informative samples, and (3) semi-homogeneous task batch sampling to improve training efficacy. Departing from traditional BERT-like architectures, we adopt Qwen2-0.5B as the pre-trained model, facilitating the adaptation of auto-regressive language models for general embedding tasks. Extensive evaluations of the MTEB benchmark across multiple languages show that our model outperforms others of comparable size, setting a new standard for multilingual embedding models with <1B parameters.

* Technical Report. 23 pages, 6 figures, 10 tables

Via

Access Paper or Ask Questions

Dialogue Director: Bridging the Gap in Dialogue Visualization for Multimodal Storytelling

Dec 30, 2024

Min Zhang, Zilin Wang, Liyan Chen, Kunhong Liu, Juncong Lin

Figure 1 for Dialogue Director: Bridging the Gap in Dialogue Visualization for Multimodal Storytelling

Figure 2 for Dialogue Director: Bridging the Gap in Dialogue Visualization for Multimodal Storytelling

Figure 3 for Dialogue Director: Bridging the Gap in Dialogue Visualization for Multimodal Storytelling

Figure 4 for Dialogue Director: Bridging the Gap in Dialogue Visualization for Multimodal Storytelling

Abstract:Recent advances in AI-driven storytelling have enhanced video generation and story visualization. However, translating dialogue-centric scripts into coherent storyboards remains a significant challenge due to limited script detail, inadequate physical context understanding, and the complexity of integrating cinematic principles. To address these challenges, we propose Dialogue Visualization, a novel task that transforms dialogue scripts into dynamic, multi-view storyboards. We introduce Dialogue Director, a training-free multimodal framework comprising a Script Director, Cinematographer, and Storyboard Maker. This framework leverages large multimodal models and diffusion-based architectures, employing techniques such as Chain-of-Thought reasoning, Retrieval-Augmented Generation, and multi-view synthesis to improve script understanding, physical context comprehension, and cinematic knowledge integration. Experimental results demonstrate that Dialogue Director outperforms state-of-the-art methods in script interpretation, physical world understanding, and cinematic principle application, significantly advancing the quality and controllability of dialogue-based story visualization.

Via

Access Paper or Ask Questions

MAFT: Efficient Model-Agnostic Fairness Testing for Deep Neural Networks via Zero-Order Gradient Search

Dec 28, 2024

Zhaohui Wang, Min Zhang, Jingran Yang, Bojie Shao

Figure 1 for MAFT: Efficient Model-Agnostic Fairness Testing for Deep Neural Networks via Zero-Order Gradient Search

Figure 2 for MAFT: Efficient Model-Agnostic Fairness Testing for Deep Neural Networks via Zero-Order Gradient Search

Figure 3 for MAFT: Efficient Model-Agnostic Fairness Testing for Deep Neural Networks via Zero-Order Gradient Search

Figure 4 for MAFT: Efficient Model-Agnostic Fairness Testing for Deep Neural Networks via Zero-Order Gradient Search

Abstract:Deep neural networks (DNNs) have shown powerful performance in various applications and are increasingly being used in decision-making systems. However, concerns about fairness in DNNs always persist. Some efficient white-box fairness testing methods about individual fairness have been proposed. Nevertheless, the development of black-box methods has stagnated, and the performance of existing methods is far behind that of white-box methods. In this paper, we propose a novel black-box individual fairness testing method called Model-Agnostic Fairness Testing (MAFT). By leveraging MAFT, practitioners can effectively identify and address discrimination in DL models, regardless of the specific algorithm or architecture employed. Our approach adopts lightweight procedures such as gradient estimation and attribute perturbation rather than non-trivial procedures like symbol execution, rendering it significantly more scalable and applicable than existing methods. We demonstrate that MAFT achieves the same effectiveness as state-of-the-art white-box methods whilst improving the applicability to large-scale networks. Compared to existing black-box approaches, our approach demonstrates distinguished performance in discovering fairness violations w.r.t effectiveness (approximately 14.69 times) and efficiency (approximately 32.58 times).

* Accepted by ICSE24

Via

Access Paper or Ask Questions

"I've Heard of You!": Generate Spoken Named Entity Recognition Data for Unseen Entities

Dec 26, 2024

Jiawei Yu, Xiang Geng, Yuang Li, Mengxin Ren, Wei Tang, Jiahuan Li, Zhibin Lan, Min Zhang, Hao Yang, Shujian Huang(+1 more)

Figure 1 for "I've Heard of You!": Generate Spoken Named Entity Recognition Data for Unseen Entities

Figure 2 for "I've Heard of You!": Generate Spoken Named Entity Recognition Data for Unseen Entities

Figure 3 for "I've Heard of You!": Generate Spoken Named Entity Recognition Data for Unseen Entities

Figure 4 for "I've Heard of You!": Generate Spoken Named Entity Recognition Data for Unseen Entities

Abstract:Spoken named entity recognition (NER) aims to identify named entities from speech, playing an important role in speech processing. New named entities appear every day, however, annotating their Spoken NER data is costly. In this paper, we demonstrate that existing Spoken NER systems perform poorly when dealing with previously unseen named entities. To tackle this challenge, we propose a method for generating Spoken NER data based on a named entity dictionary (NED) to reduce costs. Specifically, we first use a large language model (LLM) to generate sentences from the sampled named entities and then use a text-to-speech (TTS) system to generate the speech. Furthermore, we introduce a noise metric to filter out noisy data. To evaluate our approach, we release a novel Spoken NER benchmark along with a corresponding NED containing 8,853 entities. Experiment results show that our method achieves state-of-the-art (SOTA) performance in the in-domain, zero-shot domain adaptation, and fully zero-shot settings. Our data will be available at https://github.com/DeepLearnXMU/HeardU.

* Accepted by ICASSP 2025

Via

Access Paper or Ask Questions

GME: Improving Universal Multimodal Retrieval by Multimodal LLMs

Dec 22, 2024

Xin Zhang, Yanzhao Zhang, Wen Xie, Mingxin Li, Ziqi Dai, Dingkun Long, Pengjun Xie, Meishan Zhang, Wenjie Li, Min Zhang

Figure 1 for GME: Improving Universal Multimodal Retrieval by Multimodal LLMs

Figure 2 for GME: Improving Universal Multimodal Retrieval by Multimodal LLMs

Figure 3 for GME: Improving Universal Multimodal Retrieval by Multimodal LLMs

Figure 4 for GME: Improving Universal Multimodal Retrieval by Multimodal LLMs

Abstract:Universal Multimodal Retrieval (UMR) aims to enable search across various modalities using a unified model, where queries and candidates can consist of pure text, images, or a combination of both. Previous work has attempted to adopt multimodal large language models (MLLMs) to realize UMR using only text data. However, our preliminary experiments demonstrate that more diverse multimodal training data can further unlock the potential of MLLMs. Despite its effectiveness, the existing multimodal training data is highly imbalanced in terms of modality, which motivates us to develop a training data synthesis pipeline and construct a large-scale, high-quality fused-modal training dataset. Based on the synthetic training data, we develop the General Multimodal Embedder (GME), an MLLM-based dense retriever designed for UMR. Furthermore, we construct a comprehensive UMR Benchmark (UMRB) to evaluate the effectiveness of our approach. Experimental results show that our method achieves state-of-the-art performance among existing UMR methods. Last, we provide in-depth analyses of model scaling, training strategies, and perform ablation studies on both the model and synthetic data.

* 32 pages, models at https://huggingface.co/Alibaba-NLP/gme-Qwen2-VL-2B-Instruct

Via

Access Paper or Ask Questions

DynamicKV: Task-Aware Adaptive KV Cache Compression for Long Context LLMs

Dec 19, 2024

Xiabin Zhou, Wenbin Wang, Minyan Zeng, Jiaxian Guo, Xuebo Liu, Li Shen, Min Zhang, Liang Ding

Abstract:Efficient KV cache management in LLMs is crucial for long-context tasks like RAG and summarization. Existing KV cache compression methods enforce a fixed pattern, neglecting task-specific characteristics and reducing the retention of essential information. However, we observe distinct activation patterns across layers in various tasks, highlighting the need for adaptive strategies tailored to each task's unique demands. Based on this insight, we propose DynamicKV, a method that dynamically optimizes token retention by adjusting the number of tokens retained at each layer to adapt to the specific task. DynamicKV establishes global and per-layer maximum KV cache budgets, temporarily retaining the maximum budget for the current layer, and periodically updating the KV cache sizes of all preceding layers during inference. Our method retains only 1.7% of the KV cache size while achieving ~85% of the Full KV cache performance on LongBench. Notably, even under extreme compression (0.9%), DynamicKV surpasses state-of-the-art (SOTA) methods by 11% in the Needle-in-a-Haystack test using Mistral-7B-Instruct-v0.2. The code will be released.

Via

Access Paper or Ask Questions