Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jinjun Xiong

NVCiM-PT: An NVCiM-assisted Prompt Tuning Framework for Edge LLMs

Nov 12, 2024

Ruiyang Qin, Pengyu Ren, Zheyu Yan, Liu Liu, Dancheng Liu, Amir Nassereldine, Jinjun Xiong, Kai Ni, Sharon Hu, Yiyu Shi

Figure 1 for NVCiM-PT: An NVCiM-assisted Prompt Tuning Framework for Edge LLMs

Figure 2 for NVCiM-PT: An NVCiM-assisted Prompt Tuning Framework for Edge LLMs

Figure 3 for NVCiM-PT: An NVCiM-assisted Prompt Tuning Framework for Edge LLMs

Figure 4 for NVCiM-PT: An NVCiM-assisted Prompt Tuning Framework for Edge LLMs

Abstract:Large Language Models (LLMs) deployed on edge devices, known as edge LLMs, need to continuously fine-tune their model parameters from user-generated data under limited resource constraints. However, most existing learning methods are not applicable for edge LLMs because of their reliance on high resources and low learning capacity. Prompt tuning (PT) has recently emerged as an effective fine-tuning method for edge LLMs by only modifying a small portion of LLM parameters, but it suffers from user domain shifts, resulting in repetitive training and losing resource efficiency. Conventional techniques to address domain shift issues often involve complex neural networks and sophisticated training, which are incompatible for PT for edge LLMs. Therefore, an open research question is how to address domain shift issues for edge LLMs with limited resources. In this paper, we propose a prompt tuning framework for edge LLMs, exploiting the benefits offered by non-volatile computing-in-memory (NVCiM) architectures. We introduce a novel NVCiM-assisted PT framework, where we narrow down the core operations to matrix-matrix multiplication, which can then be accelerated by performing in-situ computation on NVCiM. To the best of our knowledge, this is the first work employing NVCiM to improve the edge LLM PT performance.

* Accepted by DATE 2025

Via

Access Paper or Ask Questions

Automatic Screening for Children with Speech Disorder using Automatic Speech Recognition: Opportunities and Challenges

Oct 07, 2024

Dancheng Liu, Jason Yang, Ishan Albrecht-Buehler, Helen Qin, Sophie Li, Yuting Hu, Amir Nassereldine, Jinjun Xiong

Figure 1 for Automatic Screening for Children with Speech Disorder using Automatic Speech Recognition: Opportunities and Challenges

Figure 2 for Automatic Screening for Children with Speech Disorder using Automatic Speech Recognition: Opportunities and Challenges

Figure 3 for Automatic Screening for Children with Speech Disorder using Automatic Speech Recognition: Opportunities and Challenges

Figure 4 for Automatic Screening for Children with Speech Disorder using Automatic Speech Recognition: Opportunities and Challenges

Abstract:Speech is a fundamental aspect of human life, crucial not only for communication but also for cognitive, social, and academic development. Children with speech disorders (SD) face significant challenges that, if unaddressed, can result in lasting negative impacts. Traditionally, speech and language assessments (SLA) have been conducted by skilled speech-language pathologists (SLPs), but there is a growing need for efficient and scalable SLA methods powered by artificial intelligence. This position paper presents a survey of existing techniques suitable for automating SLA pipelines, with an emphasis on adapting automatic speech recognition (ASR) models for children's speech, an overview of current SLAs and their automated counterparts to demonstrate the feasibility of AI-enhanced SLA pipelines, and a discussion of practical considerations, including accessibility and privacy concerns, associated with the deployment of AI-powered SLAs.

* AAAI-FSS 24

Via

Access Paper or Ask Questions

Towards Precision Characterization of Communication Disorders using Models of Perceived Pragmatic Similarity

Sep 13, 2024

Nigel G. Ward, Andres Segura, Georgina Bugarini, Heike Lehnert-LeHouillier, Dancheng Liu, Jinjun Xiong, Olac Fuentes

Figure 1 for Towards Precision Characterization of Communication Disorders using Models of Perceived Pragmatic Similarity

Figure 2 for Towards Precision Characterization of Communication Disorders using Models of Perceived Pragmatic Similarity

Figure 3 for Towards Precision Characterization of Communication Disorders using Models of Perceived Pragmatic Similarity

Figure 4 for Towards Precision Characterization of Communication Disorders using Models of Perceived Pragmatic Similarity

Abstract:The diagnosis and treatment of individuals with communication disorders offers many opportunities for the application of speech technology, but research so far has not adequately considered: the diversity of conditions, the role of pragmatic deficits, and the challenges of limited data. This paper explores how a general-purpose model of perceived pragmatic similarity may overcome these limitations. It explains how it might support several use cases for clinicians and clients, and presents evidence that a simple model can provide value, and in particular can capture utterance aspects that are relevant to diagnoses of autism and specific language impairment.

* submitted to IEEE ICASSP 2025

Via

Access Paper or Ask Questions

LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning

Aug 15, 2024

Jiajie Li, Garrett Skinner, Gene Yang, Brian R Quaranto, Steven D Schwaitzberg, Peter C W Kim, Jinjun Xiong

Figure 1 for LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning

Figure 2 for LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning

Figure 3 for LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning

Figure 4 for LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning

Abstract:Multimodal large language models (LLMs) have achieved notable success across various domains, while research in the medical field has largely focused on unimodal images. Meanwhile, current general-domain multimodal models for videos still lack the capabilities to understand and engage in conversations about surgical videos. One major contributing factor is the absence of datasets in the surgical field. In this paper, we create a new dataset, Surg-QA, consisting of 102,000 surgical video-instruction pairs, the largest of its kind so far. To build such a dataset, we propose a novel two-stage question-answer generation pipeline with LLM to learn surgical knowledge in a structured manner from the publicly available surgical lecture videos. The pipeline breaks down the generation process into two stages to significantly reduce the task complexity, allowing us to use a more affordable, locally deployed open-source LLM than the premium paid LLM services. It also mitigates the risk of LLM hallucinations during question-answer generation, thereby enhancing the overall quality of the generated data. We further train LLaVA-Surg, a novel vision-language conversational assistant capable of answering open-ended questions about surgical videos, on this Surg-QA dataset, and conduct comprehensive evaluations on zero-shot surgical video question-answering tasks. We show that LLaVA-Surg significantly outperforms all previous general-domain models, demonstrating exceptional multimodal conversational skills in answering open-ended questions about surgical videos. We will release our code, model, and the instruction-tuning dataset.

Via

Access Paper or Ask Questions

FASA: a Flexible and Automatic Speech Aligner for Extracting High-quality Aligned Children Speech Data

Jun 25, 2024

Dancheng Liu, Jinjun Xiong

Figure 1 for FASA: a Flexible and Automatic Speech Aligner for Extracting High-quality Aligned Children Speech Data

Figure 2 for FASA: a Flexible and Automatic Speech Aligner for Extracting High-quality Aligned Children Speech Data

Abstract:Automatic Speech Recognition (ASR) for adults' speeches has made significant progress by employing deep neural network (DNN) models recently, but improvement in children's speech is still unsatisfactory due to children's speech's distinct characteristics. DNN models pre-trained on adult data often struggle in generalizing children's speeches with fine tuning because of the lack of high-quality aligned children's speeches. When generating datasets, human annotations are not scalable, and existing forced-alignment tools are not usable as they make impractical assumptions about the quality of the input transcriptions. To address these challenges, we propose a new forced-alignment tool, FASA, as a flexible and automatic speech aligner to extract high-quality aligned children's speech data from many of the existing noisy children's speech data. We demonstrate its usage on the CHILDES dataset and show that FASA can improve data quality by 13.6$\times$ over human annotations.

* 4 pages, 1 figure

Via

Access Paper or Ask Questions

Large Language Models have Intrinsic Self-Correction Ability

Jun 21, 2024

Dancheng Liu, Amir Nassereldine, Ziming Yang, Chenhui Xu, Yuting Hu, Jiajie Li, Utkarsh Kumar, Changjae Lee, Jinjun Xiong

Figure 1 for Large Language Models have Intrinsic Self-Correction Ability

Figure 2 for Large Language Models have Intrinsic Self-Correction Ability

Figure 3 for Large Language Models have Intrinsic Self-Correction Ability

Figure 4 for Large Language Models have Intrinsic Self-Correction Ability

Abstract:Large language models (LLMs) have attracted significant attention for their remarkable abilities in various natural language processing tasks, but they suffer from hallucinations that will cause performance degradation. One promising solution to improve the LLMs' performance is to ask LLMs to revise their answer after generation, a technique known as self-correction. Among the two types of self-correction, intrinsic self-correction is considered a promising direction because it does not utilize external knowledge. However, recent works doubt the validity of LLM's ability to conduct intrinsic self-correction. In this paper, we present a novel perspective on the intrinsic self-correction capabilities of LLMs through theoretical analyses and empirical experiments. In addition, we identify two critical factors for successful self-correction: zero temperature and fair prompts. Leveraging these factors, we demonstrate that intrinsic self-correction ability is exhibited across multiple existing LLMs. Our findings offer insights into the fundamental theories underlying the self-correction behavior of LLMs and remark on the importance of unbiased prompts and zero temperature settings in harnessing their full potential.

* in submission

Via

Access Paper or Ask Questions

PI-Whisper: An Adaptive and Incremental ASR Framework for Diverse and Evolving Speaker Characteristics

Jun 21, 2024

Amir Nassereldine, Dancheng Liu, Chenhui Xu, Jinjun Xiong

Figure 1 for PI-Whisper: An Adaptive and Incremental ASR Framework for Diverse and Evolving Speaker Characteristics

Figure 2 for PI-Whisper: An Adaptive and Incremental ASR Framework for Diverse and Evolving Speaker Characteristics

Figure 3 for PI-Whisper: An Adaptive and Incremental ASR Framework for Diverse and Evolving Speaker Characteristics

Figure 4 for PI-Whisper: An Adaptive and Incremental ASR Framework for Diverse and Evolving Speaker Characteristics

Abstract:As edge-based automatic speech recognition (ASR) technologies become increasingly prevalent for the development of intelligent and personalized assistants, three important challenges must be addressed for these resource-constrained ASR models, i.e., adaptivity, incrementality, and inclusivity. We propose a novel ASR framework, PI-Whisper, in this work and show how it can improve an ASR's recognition capabilities adaptively by identifying different speakers' characteristics in real-time, how such an adaption can be performed incrementally without repetitive retraining, and how it can improve the equity and fairness for diverse speaker groups. More impressively, our proposed PI-Whisper framework attains all of these nice properties while still achieving state-of-the-art accuracy with up to 13.7% reduction of the word error rate (WER) with linear scalability with respect to computing resources.

* 11 pages, 3 figures

Via

Access Paper or Ask Questions

Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices

Jun 06, 2024

Ruiyang Qin, Dancheng Liu, Zheyu Yan, Zhaoxuan Tan, Zixuan Pan, Zhenge Jia, Meng Jiang, Ahmed Abbasi, Jinjun Xiong, Yiyu Shi

Figure 1 for Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices

Figure 2 for Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices

Figure 3 for Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices

Figure 4 for Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices

Abstract:The scaling laws have become the de facto guidelines for designing large language models (LLMs), but they were studied under the assumption of unlimited computing resources for both training and inference. As LLMs are increasingly used as personalized intelligent assistants, their customization (i.e., learning through fine-tuning) and deployment onto resource-constrained edge devices will become more and more prevalent. An urging but open question is how a resource-constrained computing environment would affect the design choices for a personalized LLM. We study this problem empirically in this work. In particular, we consider the tradeoffs among a number of key design factors and their intertwined impacts on learning efficiency and accuracy. The factors include the learning methods for LLM customization, the amount of personalized data used for learning customization, the types and sizes of LLMs, the compression methods of LLMs, the amount of time afforded to learn, and the difficulty levels of the target use cases. Through extensive experimentation and benchmarking, we draw a number of surprisingly insightful guidelines for deploying LLMs onto resource-constrained devices. For example, an optimal choice between parameter learning and RAG may vary depending on the difficulty of the downstream task, the longer fine-tuning time does not necessarily help the model, and a compressed LLM may be a better choice than an uncompressed LLM to learn from limited personalized data.

* Under review

Via

Access Paper or Ask Questions

Infinite-Dimensional Feature Interaction

May 22, 2024

Chenhui Xu, Fuxun Yu, Maoliang Li, Zihao Zheng, Zirui Xu, Jinjun Xiong, Xiang Chen

Figure 1 for Infinite-Dimensional Feature Interaction

Figure 2 for Infinite-Dimensional Feature Interaction

Figure 3 for Infinite-Dimensional Feature Interaction

Figure 4 for Infinite-Dimensional Feature Interaction

Abstract:The past neural network design has largely focused on feature representation space dimension and its capacity scaling (e.g., width, depth), but overlooked the feature interaction space scaling. Recent advancements have shown shifted focus towards element-wise multiplication to facilitate higher-dimensional feature interaction space for better information transformation. Despite this progress, multiplications predominantly capture low-order interactions, thus remaining confined to a finite-dimensional interaction space. To transcend this limitation, classic kernel methods emerge as a promising solution to engage features in an infinite-dimensional space. We introduce InfiNet, a model architecture that enables feature interaction within an infinite-dimensional space created by RBF kernel. Our experiments reveal that InfiNet achieves new state-of-the-art, owing to its capability to leverage infinite-dimensional interactions, significantly enhancing model performance.

Via

Access Paper or Ask Questions

Robust Implementation of Retrieval-Augmented Generation on Edge-based Computing-in-Memory Architectures

May 07, 2024

Ruiyang Qin, Zheyu Yan, Dewen Zeng, Zhenge Jia, Dancheng Liu, Jianbo Liu, Zhi Zheng, Ningyuan Cao, Kai Ni, Jinjun Xiong(+1 more)

Figure 1 for Robust Implementation of Retrieval-Augmented Generation on Edge-based Computing-in-Memory Architectures

Figure 2 for Robust Implementation of Retrieval-Augmented Generation on Edge-based Computing-in-Memory Architectures

Figure 3 for Robust Implementation of Retrieval-Augmented Generation on Edge-based Computing-in-Memory Architectures

Figure 4 for Robust Implementation of Retrieval-Augmented Generation on Edge-based Computing-in-Memory Architectures

Abstract:Large Language Models (LLMs) deployed on edge devices learn through fine-tuning and updating a certain portion of their parameters. Although such learning methods can be optimized to reduce resource utilization, the overall required resources remain a heavy burden on edge devices. Instead, Retrieval-Augmented Generation (RAG), a resource-efficient LLM learning method, can improve the quality of the LLM-generated content without updating model parameters. However, the RAG-based LLM may involve repetitive searches on the profile data in every user-LLM interaction. This search can lead to significant latency along with the accumulation of user data. Conventional efforts to decrease latency result in restricting the size of saved user data, thus reducing the scalability of RAG as user data continuously grows. It remains an open question: how to free RAG from the constraints of latency and scalability on edge devices? In this paper, we propose a novel framework to accelerate RAG via Computing-in-Memory (CiM) architectures. It accelerates matrix multiplications by performing in-situ computation inside the memory while avoiding the expensive data transfer between the computing unit and memory. Our framework, Robust CiM-backed RAG (RoCR), utilizing a novel contrastive learning-based training method and noise-aware training, can enable RAG to efficiently search profile data with CiM. To the best of our knowledge, this is the first work utilizing CiM to accelerate RAG.

Via

Access Paper or Ask Questions