We introduce RAGE, an image compression framework that achieves four generally conflicting objectives: 1) good compression for a wide variety of color images, 2) computationally efficient, fast decompression, 3) fast random access with pixel-level granularity, without the need to decompress the entire image, and 4) support for both lossless and lossy compression. To achieve these, we rely on the recent concept of generalized deduplication (GD), which is known to provide efficient lossless (de)compression and fast random access in time-series data, and develop key extensions suitable for image compression, both lossless and lossy. Using nine different datasets, including graphics, logos, and natural images, we show that RAGE achieves compression ratios similar to or better than state-of-the-art lossless image compressors, while delivering pixel-level random access capabilities. Tests on an ARM Cortex-M33 platform show seek times between 9.9 and 40.6~ns and average decoding times per pixel between 274 and 1226~ns. Our measurements also show that RAGE's lossy variant, RAGE-Q, outperforms JPEG severalfold in terms of distortion on embedded graphics and achieves reasonable compression and distortion for natural images.
Self-attention is at the heart of the popular Transformer architecture, yet suffers from quadratic time and memory complexity. The breakthrough FlashAttention algorithm revealed I/O complexity as the true bottleneck in scaling Transformers. Given two levels of memory hierarchy, a fast cache (e.g., GPU on-chip SRAM) and a slow memory (e.g., GPU high-bandwidth memory), the I/O complexity measures the number of accesses to slow memory. FlashAttention computes attention using $O\!\left(\frac{N^2d^2}{M}\right)$ I/O operations, where $N$ is the dimension of the attention matrix (i.e., the sequence length), $d$ the head dimension, and $M$ the cache size. However, is this I/O complexity optimal? The known lower bound only rules out an I/O complexity of $o(Nd)$ when $M=\Theta(Nd)$, since the output that needs to be written to slow memory is $\Omega(Nd)$. This leads to the main question of our work: Is FlashAttention I/O optimal for all values of $M$? We resolve the above question in its full generality by showing an I/O complexity lower bound that matches the upper bound provided by FlashAttention for any value of $M \geq d^2$, up to constant factors. Further, we give a better algorithm with lower I/O complexity for $M < d^2$, and show that it is optimal as well. Moreover, our lower bounds do not rely on using combinatorial matrix multiplication for computing the attention matrix: we show that even if one uses fast matrix multiplication, the above I/O complexity bounds cannot be improved. We do so by introducing a new communication complexity protocol for matrix compression, and by connecting communication complexity to I/O complexity. To the best of our knowledge, this is the first work to establish a connection between communication complexity and I/O complexity; we believe this connection is of independent interest and will find many more applications in proving I/O complexity lower bounds in the future.
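The $N^2d^2/M$ cost above has a simple counting interpretation: a tiled attention kernel keeps a block of roughly $B = M/d$ query rows resident in cache and streams all of $K$ and $V$ ($Nd$ values each) once per block, giving $(N/B)\cdot Nd = N^2d^2/M$ slow-memory reads, plus $Nd$ writes for the output. A back-of-the-envelope sketch of this count, up to constant factors (the block-size choice and the omission of softmax running statistics are simplifying assumptions, not the actual kernel schedule):

```python
def flashattention_io(N, d, M):
    """Estimated slow-memory traffic (up to constant factors) of a tiled
    attention computation with sequence length N, head dimension d, and
    cache size M: each of the ceil(N / B) query blocks streams K and V
    (N*d values each) from slow memory, and the N*d output is written once."""
    assert M >= d, "cache must hold at least one row of length d"
    B = max(M // d, 1)          # query rows per block that fit in cache
    num_blocks = -(-N // B)     # ceil(N / B) passes over K and V
    return num_blocks * N * d + N * d
```

For $N = 4096$, $d = 64$, $M = 65536 = Nd/4$, the streaming term is $4 \cdot Nd = N^2d^2/M$, dominating the $Nd$ output term, which matches the regime where the FlashAttention bound exceeds the trivial $\Omega(Nd)$ lower bound.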
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al. Given $n$ points $P$ in a metric space, let $\delta(x)$ for $x\in P$ be the radius of the smallest ball around $x$ containing at least $n/k$ points. A clustering is then called individually fair if it has a center within distance $\delta(x)$ of $x$ for each $x\in P$. While good approximation algorithms are known for this problem, no efficient practical algorithms with good theoretical guarantees have been presented. We design the first fast local-search algorithm, which runs in $\widetilde{O}(nk^2)$ time and obtains a bicriteria $(O(1), 6)$ approximation. We then show empirically that our algorithm is not only much faster than prior work, but also produces lower-cost solutions.
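The fairness criterion above can be stated directly in code. A minimal sketch (a brute-force $O(n^2 \log n)$ check for illustration only, nothing like the paper's fast algorithm; whether the $n/k$ count includes $x$ itself is a convention, and here the ball includes $x$):

```python
import math

def fairness_radii(points, k):
    """delta(x): radius of the smallest ball around x containing at least
    ceil(n/k) of the n points (x itself counted)."""
    n = len(points)
    need = math.ceil(n / k)
    radii = []
    for x in points:
        dists = sorted(math.dist(x, y) for y in points)
        radii.append(dists[need - 1])  # distance to the need-th closest point
    return radii

def is_individually_fair(points, centers, radii):
    """True iff every point x has some center within distance delta(x)."""
    return all(
        min(math.dist(x, c) for c in centers) <= r
        for x, r in zip(points, radii)
    )
```

On two well-separated pairs on a line with $k=2$, placing one center per pair satisfies the constraint, while a single center does not.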
Common self-improvement approaches for large language models (LLMs), such as STaR (Zelikman et al., 2022), iteratively fine-tune LLMs on self-generated solutions to improve their problem-solving ability. However, these approaches discard the large amounts of incorrect solutions generated during this process, potentially neglecting valuable information in such solutions. To address this shortcoming, we propose V-STaR, which utilizes both the correct and incorrect solutions generated during the self-improvement process to train, via DPO, a verifier that judges the correctness of model-generated solutions. This verifier is used at inference time to select one solution among many candidate solutions. Running V-STaR for multiple iterations results in progressively better reasoners and verifiers, delivering a 4% to 17% test accuracy improvement over existing self-improvement and verification approaches on common code generation and math reasoning benchmarks with LLaMA2 models.
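The inference-time use of the verifier described above is a best-of-n selection. A minimal sketch, where `generator` and `verifier` are hypothetical stand-ins for the fine-tuned LLM and the DPO-trained verifier (the abstract does not specify the scoring interface):

```python
def best_of_n(problem, generator, verifier, n=8):
    """Sample n candidate solutions from the generator, score each with the
    verifier, and return the highest-scoring candidate."""
    candidates = [generator(problem) for _ in range(n)]
    scores = [verifier(problem, c) for c in candidates]
    return candidates[scores.index(max(scores))]
```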
In this study, we propose an automated framework for camel farm monitoring, introducing two key contributions: the Unified Auto-Annotation framework and the Fine-Tune Distillation framework. The Unified Auto-Annotation approach combines two models, GroundingDINO (GD) and the Segment Anything Model (SAM), to automatically annotate raw datasets extracted from surveillance videos. Building upon this foundation, the Fine-Tune Distillation framework fine-tunes student models on the auto-annotated dataset. This process transfers knowledge from a large teacher model to a small student model, resembling a variant of Knowledge Distillation, and is adaptable to specific use cases, making it suitable for domain-specific applications. By leveraging our raw dataset collected from Al-Marmoom Camel Farm in Dubai, UAE, and a pre-trained teacher model, GroundingDINO, the Fine-Tune Distillation framework produces a lightweight, deployable model, YOLOv8. This framework demonstrates high performance and computational efficiency, facilitating efficient real-time object detection. Our code is available at \href{https://github.com/Razaimam45/Fine-Tune-Distillation}{https://github.com/Razaimam45/Fine-Tune-Distillation}.
Currently, human drivers outperform self-driving vehicles in many conditions, such as collision avoidance. Understanding human driver behaviour in these conditions can therefore provide insight for future autonomous vehicles. Risk assessment is one approach applied so far to understanding driver behaviour, using both subjective and objective measurements. Subjective methods such as questionnaires may provide insight into driver risk assessment, but there is often significant variability between drivers. Physiological measurements such as heart rate (HR), electroencephalogram (EEG), and electromyogram (EMG) provide more objective measures of driver risk assessment. HR is often used to measure driver risk assessment, based on observed correlations between HR and risk perception. Previous work has used HR to measure driver risk assessment in self-driving systems, but pedestrian dynamics were not considered. In this study, we observed driver behaviour on a driving simulator in scenarios involving pedestrians. The scenarios contain safe and unsafe situations (e.g., in one scenario a pedestrian crosses the road and the vehicle may hit the pedestrian), and HR analysis in the time and frequency domains is applied for risk assessment. As a result, HR analysis in the frequency domain shows reasonable validity for assessing driver risk when a pedestrian is present in the traffic scene.
It is well known that selecting samples with large losses/gradients can significantly reduce the number of training steps. However, the selection overhead is often too high to yield any meaningful gains in terms of overall training time. In this work, we focus on the greedy approach of selecting samples with large \textit{approximate losses} instead of exact losses in order to reduce the selection overhead. For smooth convex losses, we show that such a greedy strategy can converge to a constant factor of the minimum value of the average loss in fewer iterations than the standard approach of random selection. We also theoretically quantify the effect of the approximation level. We then develop SIFT, which uses early exiting to obtain approximate losses from an intermediate layer's representations for sample selection. We evaluate SIFT on the task of training a 110M-parameter 12-layer BERT base model and show significant gains (in terms of training hours and number of backpropagation steps), without any optimized implementation, over vanilla training. For example, to reach 64% validation accuracy, SIFT with exit at the first layer takes ~43 hours, compared to ~57 hours for vanilla training.
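The greedy selection rule above can be sketched in a few lines. A minimal illustration, assuming `approx_loss` is a cheap per-example score (e.g., obtained from an intermediate layer via early exiting, as in SIFT); the exact selection rule and batching in the paper may differ:

```python
def select_batch(examples, approx_loss, batch_size):
    """Greedy selection: score each candidate with a cheap approximate loss
    and keep the batch_size examples with the largest scores; only those are
    used for the expensive full forward/backward pass."""
    scored = sorted(examples, key=approx_loss, reverse=True)
    return scored[:batch_size]
```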
Rearrangement planning for object retrieval tasks from confined spaces is a challenging problem, primarily due to the lack of open space for robot motion and limited perception. Several traditional methods exist to solve object retrieval tasks, but they require overhead cameras for perception, rely on a time-consuming exhaustive search to find a solution, and often make unrealistic assumptions, such as having identical, simple-geometry objects in the environment. This paper presents a neural object retrieval framework that efficiently performs rearrangement planning of unknown, arbitrary objects in confined spaces to retrieve the desired object using a given robot grasp. Our method actively senses the environment with the robot's in-hand camera. It then selects and relocates the non-target objects such that they do not block the robot path homotopy to the target object, thus also aiding an underlying path planner in quickly finding robot motion sequences. Furthermore, we demonstrate our framework in challenging scenarios, including real-world cabinet-like environments with arbitrary household objects. The results show that our framework achieves the best performance among all presented methods and is, on average, two orders of magnitude computationally faster than the best-performing baselines.
Cognitive radio (CR) and integrated sensing and communication (ISAC) are both critical technologies for the sixth generation (6G) wireless networks. However, their interplay has yet to be explored. To obtain the mutual benefits between CR and ISAC, we focus on a reconfigurable intelligent surface (RIS)-enhanced cognitive ISAC system and explore using the additional degrees of freedom brought by the RIS to improve the performance of the cognitive ISAC system. Specifically, we formulate an optimization problem of maximizing the signal-to-interference-plus-noise ratios (SINRs) of the mobile sensors (MSs) while ensuring the requirements of the spectrum sensing (SS) and the secondary transmissions, by jointly designing the SS time, the secondary base station (SBS) beamforming, and the RIS beamforming. The formulated non-convex problem can be solved by the proposed block coordinate descent (BCD) algorithm based on Dinkelbach's transform and the successive convex approximation (SCA) method. Simulation results demonstrate that the proposed scheme exhibits good convergence performance and can effectively reduce the position error bounds (PEBs) of the MSs, thereby improving the radio environment map (REM) accuracy of CR networks. Additionally, we reveal the impact of RIS deployment locations on the performance of cognitive ISAC systems.
Novel sparse regression LDPC (SR-LDPC) codes exhibit excellent performance over additive white Gaussian noise (AWGN) channels in part due to their natural provision of shaping gains. Though SR-LDPC-like codes have been considered within the context of single-user error correction and massive random access, they are yet to be examined as candidates for coordinated multi-user communication scenarios. This article explores this gap in the literature and demonstrates that SR-LDPC codes, when combined with coded demixing techniques, offer a new framework for efficient non-orthogonal multiple access (NOMA) in the context of coordinated multi-user communication channels. The ensuing communication scheme is referred to as MU-SR-LDPC coding. Empirical evidence suggests that, for a fixed SNR, MU-SR-LDPC coding can achieve a target bit error rate (BER) at a higher sum rate than orthogonal multiple access (OMA) techniques such as time division multiple access (TDMA) and frequency division multiple access (FDMA). Importantly, MU-SR-LDPC codes enable a pragmatic solution path for user-centric cell-free communication systems with (local) joint decoding. Results are supported by numerical simulations.