Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ali Etemad

A Shared Encoder Approach to Multimodal Representation Learning

Mar 03, 2025

Shuvendu Roy, Franklin Ogidi, Ali Etemad, Elham Dolatabadi, Arash Afkanpour

Abstract:Multimodal representation learning has demonstrated remarkable potential in enabling models to process and integrate diverse data modalities, such as text and images, for improved understanding and performance. While the medical domain can benefit significantly from this paradigm, the scarcity of paired multimodal data and reliance on proprietary or pretrained encoders pose significant challenges. In this work, we present a shared encoder framework for multimodal representation learning tailored to the medical domain. Our approach employs a single set of encoder parameters shared across modalities, augmented with learnable modality features. Empirical results demonstrate that our shared encoder idea achieves superior performance compared to separate modality-specific encoders, demonstrating improved generalization in data-constrained settings. Notably, the performance gains are more pronounced with fewer training examples, underscoring the efficiency of our shared encoder framework for real-world medical applications with limited data. Our code and experiment setup are available at https://github.com/VectorInstitute/shared_encoder.

Via

Access Paper or Ask Questions

Task-agnostic Prompt Compression with Context-aware Sentence Embedding and Reward-guided Task Descriptor

Feb 19, 2025

Barys Liskavets, Shuvendu Roy, Maxim Ushakov, Mark Klibanov, Ali Etemad, Shane Luke

Abstract:The rise of Large Language Models (LLMs) has led to significant interest in prompt compression, a technique aimed at reducing the length of input prompts while preserving critical information. However, the prominent approaches in prompt compression often require explicit questions or handcrafted templates for compression, limiting their generalizability. We propose Task-agnostic Prompt Compression (TPC), a novel framework that generalizes compression across tasks and domains without requiring input questions or templates. TPC generates a context-relevant task description using a task descriptor trained on a curated dataset of context and query pairs, and fine-tuned via reinforcement learning with a reward function designed to capture the most relevant information. The task descriptor is then utilized to compute the relevance of each sentence in the prompt to generate the compressed prompt. We introduce 3 model sizes (Base, Large, and Huge), where the largest model outperforms the existing state-of-the-art methods on LongBench and ZeroSCROLLS benchmarks, and our smallest model performs comparable to the existing solutions while being considerably smaller.

Via

Access Paper or Ask Questions

Scaling laws in wearable human activity recognition

Feb 05, 2025

Tom Hoddes, Alex Bijamov, Saket Joshi, Daniel Roggen, Ali Etemad, Robert Harle, David Racz

Figure 1 for Scaling laws in wearable human activity recognition

Figure 2 for Scaling laws in wearable human activity recognition

Figure 3 for Scaling laws in wearable human activity recognition

Figure 4 for Scaling laws in wearable human activity recognition

Abstract:Many deep architectures and self-supervised pre-training techniques have been proposed for human activity recognition (HAR) from wearable multimodal sensors. Scaling laws have the potential to help move towards more principled design by linking model capacity with pre-training data volume. Yet, scaling laws have not been established for HAR to the same extent as in language and vision. By conducting an exhaustive grid search on both amount of pre-training data and Transformer architectures, we establish the first known scaling laws for HAR. We show that pre-training loss scales with a power law relationship to amount of data and parameter count and that increasing the number of users in a dataset results in a steeper improvement in performance than increasing data per user, indicating that diversity of pre-training data is important, which contrasts to some previously reported findings in self-supervised HAR. We show that these scaling laws translate to downstream performance improvements on three HAR benchmark datasets of postures, modes of locomotion and activities of daily living: UCI HAR and WISDM Phone and WISDM Watch. Finally, we suggest some previously published works should be revisited in light of these scaling laws with more adequate model capacities.

Via

Access Paper or Ask Questions

SelfPrompt: Confidence-Aware Semi-Supervised Tuning for Robust Vision-Language Model Adaptation

Jan 24, 2025

Shuvendu Roy, Ali Etemad

Abstract:We present SelfPrompt, a novel prompt-tuning approach for vision-language models (VLMs) in a semi-supervised learning setup. Existing methods for tuning VLMs in semi-supervised setups struggle with the negative impact of the miscalibrated VLMs on pseudo-labelling, and the accumulation of noisy pseudo-labels. SelfPrompt addresses these challenges by introducing a cluster-guided pseudo-labelling method that improves pseudo-label accuracy, and a confidence-aware semi-supervised learning module that maximizes the utilization of unlabelled data by combining supervised learning and weakly-supervised learning. Additionally, we investigate our method in an active semi-supervised learning setup, where the labelled set is strategically selected to ensure the best utilization of a limited labelling budget. To this end, we propose a weakly-supervised sampling technique that selects a diverse and representative labelled set, which can be seamlessly integrated into existing methods to enhance their performance. We conduct extensive evaluations across 13 datasets, significantly surpassing state-of-the-art performances with average improvements of 6.23% in standard semi-supervised learning, 6.25% in active semi-supervised learning, and 4.9% in base-to-novel generalization, using a 2-shot setup. Furthermore, SelfPrompt shows excellent generalization in single-shot settings, achieving an average improvement of 11.78%.

Via

Access Paper or Ask Questions

Dynamic Prototype Rehearsal for Continual Learning in ECG Arrhythmia Detection

Jan 13, 2025

Sana Rahmani, Reetam Chatterjee, Ali Etemad, Javad Hashemi

Figure 1 for Dynamic Prototype Rehearsal for Continual Learning in ECG Arrhythmia Detection

Figure 2 for Dynamic Prototype Rehearsal for Continual Learning in ECG Arrhythmia Detection

Figure 3 for Dynamic Prototype Rehearsal for Continual Learning in ECG Arrhythmia Detection

Abstract:Continual Learning (CL) methods aim to learn from a sequence of tasks while avoiding the challenge of forgetting previous knowledge. We present DREAM-CL, a novel CL method for ECG arrhythmia detection that introduces dynamic prototype rehearsal memory. DREAM-CL selects representative prototypes by clustering data based on learning behavior during each training session. Within each cluster, we apply a smooth sorting operation that ranks samples by training difficulty, compressing extreme values and removing outliers. The more challenging samples are then chosen as prototypes for the rehearsal memory, ensuring effective knowledge retention across sessions. We evaluate our method on time-incremental, class-incremental, and lead-incremental scenarios using two widely used ECG arrhythmia datasets, Chapman and PTB-XL. The results demonstrate that DREAM-CL outperforms the state-of-the-art in CL for ECG arrhythmia detection. Detailed ablation and sensitivity studies are performed to validate the different design choices of our method.

* Accepted to 2025 International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025)

Via

Access Paper or Ask Questions

Federated Domain Generalization with Label Smoothing and Balanced Decentralized Training

Dec 16, 2024

Milad Soltany, Farhad Pourpanah, Mahdiyar Molahasani, Michael Greenspan, Ali Etemad

Figure 1 for Federated Domain Generalization with Label Smoothing and Balanced Decentralized Training

Figure 2 for Federated Domain Generalization with Label Smoothing and Balanced Decentralized Training

Figure 3 for Federated Domain Generalization with Label Smoothing and Balanced Decentralized Training

Figure 4 for Federated Domain Generalization with Label Smoothing and Balanced Decentralized Training

Abstract:In this paper, we propose a novel approach, Federated Domain Generalization with Label Smoothing and Balanced Decentralized Training (FedSB), to address the challenges of data heterogeneity within a federated learning framework. FedSB utilizes label smoothing at the client level to prevent overfitting to domain-specific features, thereby enhancing generalization capabilities across diverse domains when aggregating local models into a global model. Additionally, FedSB incorporates a decentralized budgeting mechanism which balances training among clients, which is shown to improve the performance of the aggregated global model. Extensive experiments on four commonly used multi-domain datasets, PACS, VLCS, OfficeHome, and TerraInc, demonstrate that FedSB outperforms competing methods, achieving state-of-the-art results on three out of four datasets, indicating the effectiveness of FedSB in addressing data heterogeneity.

Via

Access Paper or Ask Questions

Socially-Informed Reconstruction for Pedestrian Trajectory Forecasting

Dec 05, 2024

Haleh Damirchi, Ali Etemad, Michael Greenspan

Figure 1 for Socially-Informed Reconstruction for Pedestrian Trajectory Forecasting

Figure 2 for Socially-Informed Reconstruction for Pedestrian Trajectory Forecasting

Figure 3 for Socially-Informed Reconstruction for Pedestrian Trajectory Forecasting

Figure 4 for Socially-Informed Reconstruction for Pedestrian Trajectory Forecasting

Abstract:Pedestrian trajectory prediction remains a challenge for autonomous systems, particularly due to the intricate dynamics of social interactions. Accurate forecasting requires a comprehensive understanding not only of each pedestrian's previous trajectory but also of their interaction with the surrounding environment, an important part of which are other pedestrians moving dynamically in the scene. To learn effective socially-informed representations, we propose a model that uses a reconstructor alongside a conditional variational autoencoder-based trajectory forecasting module. This module generates pseudo-trajectories, which we use as augmentations throughout the training process. To further guide the model towards social awareness, we propose a novel social loss that aids in forecasting of more stable trajectories. We validate our approach through extensive experiments, demonstrating strong performances in comparison to state of-the-art methods on the ETH/UCY and SDD benchmarks.

* Accepted at Winter Conference on Applications of Computer Vision (WACV), 2025

Via

Access Paper or Ask Questions

CycleCrash: A Dataset of Bicycle Collision Videos for Collision Prediction and Analysis

Sep 30, 2024

Nishq Poorav Desai, Ali Etemad, Michael Greenspan

Figure 1 for CycleCrash: A Dataset of Bicycle Collision Videos for Collision Prediction and Analysis

Figure 2 for CycleCrash: A Dataset of Bicycle Collision Videos for Collision Prediction and Analysis

Figure 3 for CycleCrash: A Dataset of Bicycle Collision Videos for Collision Prediction and Analysis

Figure 4 for CycleCrash: A Dataset of Bicycle Collision Videos for Collision Prediction and Analysis

Abstract:Self-driving research often underrepresents cyclist collisions and safety. To address this, we present CycleCrash, a novel dataset consisting of 3,000 dashcam videos with 436,347 frames that capture cyclists in a range of critical situations, from collisions to safe interactions. This dataset enables 9 different cyclist collision prediction and classification tasks focusing on potentially hazardous conditions for cyclists and is annotated with collision-related, cyclist-related, and scene-related labels. Next, we propose VidNeXt, a novel method that leverages a ConvNeXt spatial encoder and a non-stationary transformer to capture the temporal dynamics of videos for the tasks defined in our dataset. To demonstrate the effectiveness of our method and create additional baselines on CycleCrash, we apply and compare 7 models along with a detailed ablation. We release the dataset and code at https://github.com/DeSinister/CycleCrash/ .

Via

Access Paper or Ask Questions

NapTune: Efficient Model Tuning for Mood Classification using Previous Night's Sleep Measures along with Wearable Time-series

Sep 07, 2024

Debaditya Shome, Nasim Montazeri Ghahjaverestan, Ali Etemad

Abstract:Sleep is known to be a key factor in emotional regulation and overall mental health. In this study, we explore the integration of sleep measures from the previous night into wearable-based mood recognition. To this end, we propose NapTune, a novel prompt-tuning framework that utilizes sleep-related measures as additional inputs to a frozen pre-trained wearable time-series encoder by adding and training lightweight prompt parameters to each Transformer layer. Through rigorous empirical evaluation, we demonstrate that the inclusion of sleep data using NapTune not only improves mood recognition performance across different wearable time-series namely ECG, PPG, and EDA, but also makes it more sample-efficient. Our method demonstrates significant improvements over the best baselines and unimodal variants. Furthermore, we analyze the impact of adding sleep-related measures on recognizing different moods as well as the influence of individual sleep-related measures.

* Accepted at ICMI 2024

Via

Access Paper or Ask Questions

Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference

Sep 04, 2024

Barys Liskavets, Maxim Ushakov, Shuvendu Roy, Mark Klibanov, Ali Etemad, Shane Luke

Figure 1 for Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference

Figure 2 for Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference

Figure 3 for Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference

Figure 4 for Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference

Abstract:Large language models (LLMs) have triggered a new stream of research focusing on compressing the context length to reduce the computational cost while ensuring the retention of helpful information for LLMs to answer the given question. Token-based removal methods are one of the most prominent approaches in this direction, but risk losing the semantics of the context caused by intermediate token removal, especially under high compression ratios, while also facing challenges in computational efficiency. In this work, we propose context-aware prompt compression (CPC), a sentence-level prompt compression technique where its key innovation is a novel context-aware sentence encoder that provides a relevance score for each sentence for a given question. To train this encoder, we generate a new dataset consisting of questions, positives, and negative pairs where positives are sentences relevant to the question, while negatives are irrelevant context sentences. We train the encoder in a contrastive setup to learn context-aware sentence representations. Our method considerably outperforms prior works on prompt compression on benchmark datasets and is up to 10.93x faster at inference compared to the best token-level compression method. We also find better improvement for shorter length constraints in most benchmarks, showing the effectiveness of our proposed solution in the compression of relevant information in a shorter context. Finally, we release the code and the dataset for quick reproducibility and further development: https://github.com/Workday/cpc.

Via

Access Paper or Ask Questions