Vansh Agrawal

More Than a Quick Glance: Overcoming the Greedy Bias in KV-Cache Compression

Feb 02, 2026

Aryan Sood, Tanvi Sharma, Vansh Agrawal

Abstract: While Large Language Models (LLMs) can theoretically support extensive context windows, their actual deployment is constrained by the linear growth of Key-Value (KV) cache memory. Prevailing compression strategies mitigate this through various pruning mechanisms, yet trade off semantic recall for memory efficiency. In this work, we present LASER-KV (Layer Accumulated Selection with Exact-LSH Recall), a framework designed to test the limits of KV compression under a strict accumulative budgeting policy. We deviate from the standard fixed-summary-size approach by implementing a block-wise accumulation strategy governed by a protection divisor (n). This allows us to isolate the effects of compression from sliding-window artifacts. Our experiments on the Babilong benchmark reveal that previous compression methods degrade by 15-30% on various long-context tasks. LASER-KV maintains stable performance, achieving higher accuracy by a margin of up to 10% at 128k. These findings challenge the prevailing assumption that attention scores alone are a sufficient proxy for token utility.


Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning

Oct 08, 2024

Ayush Singh, Mansi Gupta, Shivank Garg, Abhinav Kumar, Vansh Agrawal


Abstract: Vision-Language Models (VLMs) have transformed tasks requiring visual and reasoning abilities, such as image retrieval and Visual Question Answering (VQA). Despite their success, VLMs face significant challenges with tasks involving geometric reasoning, algebraic problem-solving, and counting. These limitations stem from difficulties in effectively integrating multiple modalities and accurately interpreting geometry-related tasks. Various works claim that introducing a captioning pipeline before VQA tasks enhances performance. We incorporated this pipeline for tasks involving geometry, algebra, and counting. We found that the gains from captioning do not generalize: in particular, larger VLMs trained primarily on downstream QnA tasks show random performance on math-related challenges. However, we present a promising alternative: task-based prompting, which enriches the prompt with task-specific guidance. This approach proves more effective than direct captioning methods for math-heavy problems.


Give me a hint: Can LLMs take a hint to solve math problems?

Oct 08, 2024

Vansh Agrawal, Pratham Singla, Amitoj Singh Miglani, Shivank Garg, Ayush Mangal


Abstract: While many state-of-the-art LLMs have shown poor logical and basic mathematical reasoning, recent works try to improve their problem-solving abilities using prompting techniques. We propose giving "hints" to improve the language model's performance on advanced mathematical problems, taking inspiration from how humans approach math pedagogically. We also test the model's adversarial robustness to wrong hints. We demonstrate the effectiveness of our approach by evaluating various LLMs, presenting them with a diverse set of problems of different difficulties and topics from the MATH dataset, and comparing against techniques such as one-shot, few-shot, and chain-of-thought prompting.
