Deepak Kumar

Rethinking Passive RIS: Finite Blocklength Reliability Analysis Under Thermal Noise

May 20, 2026

Farjam Karim, Deepak Kumar, Prathapasinghe Dharmawansa, Nurul Huda Mahmood, Arthur Sousa de Sena, Matti Latva-aho

Abstract:Short-packet communication alters the fundamental performance limits of reconfigurable intelligent surface (RIS)-assisted systems, making conventional analyses based on the infinite blocklength regime insufficient. This work investigates RIS-assisted transmission in the finite blocklength (FBL) regime while explicitly incorporating thermal noise generated by passive RIS elements, an effect commonly neglected in existing models. A unified analytical framework is developed to characterize the block-error rate (BLER), its asymptotic behavior, and the resulting goodput under both uniform and non-uniform RIS reflection coefficients. Our results show that ignoring RIS thermal noise leads to a pronounced overestimation of reliability with the mismatch increasing as the number of reflecting elements grows. Furthermore, increasing the RIS size does not always improve performance, particularly in the low transmit power regime where accumulated noise becomes dominant. Overall, the results highlight fundamental limitations of idealized RIS models and demonstrate the need for incorporating thermal noise for accurate system evaluation.
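
The effect described in this abstract can be illustrated with the widely used normal approximation for the finite blocklength regime. The sketch below is not the paper's framework: it assumes an AWGN-style link, treats the aggregated RIS thermal noise as one extra additive noise term, and defines goodput as (k/n)(1 - BLER). All function names and numeric values are hypothetical.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def effective_snr(signal_power: float, rx_noise: float, ris_noise: float = 0.0) -> float:
    """Assumed model: RIS thermal noise simply adds to the receiver noise power."""
    return signal_power / (rx_noise + ris_noise)


def bler_normal_approx(snr: float, n: int, k: int) -> float:
    """Normal approximation of the BLER for k bits sent over n channel uses."""
    if snr <= 0:
        raise ValueError("snr must be positive")
    capacity = math.log2(1.0 + snr)  # bits per channel use
    dispersion = (1.0 - (1.0 + snr) ** -2) * math.log2(math.e) ** 2
    return q_function((n * capacity - k) / math.sqrt(n * dispersion))


def goodput(snr: float, n: int, k: int) -> float:
    """Assumed definition: coding rate scaled by the block success probability."""
    return (k / n) * (1.0 - bler_normal_approx(snr, n, k))


# Hypothetical numbers: unit signal power, unit receiver noise.
ideal = bler_normal_approx(effective_snr(1.0, 1.0), n=200, k=100)
noisy = bler_normal_approx(effective_snr(1.0, 1.0, ris_noise=1.0), n=200, k=100)
print(f"BLER ignoring RIS noise: {ideal:.3e}, including it: {noisy:.3e}")
```

Under these assumptions, leaving out the RIS noise term always yields the lower (more optimistic) BLER, which is the direction of mismatch the abstract reports.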

Locale-Conditioned Few-Shot Prompting Mitigates Demonstration Regurgitation in On-Device PII Substitution with Small Language Models

May 13, 2026

Anuj Sadani, Deepak Kumar

Abstract:Personally Identifiable Information (PII) redaction usually replaces detected entities with placeholder tokens such as [PERSON], destroying the downstream utility of the redacted text for retrieval and Named Entity Recognition (NER) training. We propose a fully on-device pipeline that substitutes PII with consistent, type-preserving fake values: a 1.5B mixture-of-experts token classifier (openai/privacy-filter) detects spans, a 1-bit Bonsai-1.7B Small Language Model (SLM) proposes contextual surrogates for names, addresses, and dates, and a rule-based generator (faker) handles patterned fields. We report a prompting finding more important than the quantization choice: with naive fixed three-shot demonstrations, the 1-bit SLM regurgitates demonstration outputs verbatim regardless of input; 1.58-bit Ternary-Bonsai-1.7B reproduces byte-identical failures, ruling out quantization as the cause. We fix this with locale-conditioned rotating few-shot demonstrations: a character-range heuristic picks a locale-pure pool and a per-input MD5 hash samples three demonstrations. With the fix, 482/482 unique Bonsai-1.7B calls succeed (no echoes) and produce locale-correct surrogates, although the SLM still copies from a small same-locale demonstration pool, a residual narrowness that we quantify. On a 2000-document multilingual corpus, hybrid perplexity (PPL) beats faker in all six locales under a multilingual evaluator (XGLM-564M); length preservation is best-of-three in 4 of 6 locales. On downstream NER (400 train / 100 test, English), redact yields F1=0.000, faker 0.656, original 0.960; on a matched 160/40 subset including hybrid, faker (0.506) outperforms hybrid (0.346) at p < 0.001. We report this as an honest negative finding: SLM surrogates produce more natural text but a less varied training distribution, and downstream NER benefits more from variety than from naturalness.
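
A minimal sketch of what locale-conditioned rotating few-shot selection could look like, assuming a simple Unicode-range vote for the locale and an MD5-seeded sampler. The pools, range table, and function names below are hypothetical illustrations and are not taken from the paper's code.

```python
import hashlib
import random

# Hypothetical per-script demonstration pools (input name -> surrogate name).
DEMO_POOLS = {
    "latin": [("John Smith", "Mark Allen"), ("Ana Lopez", "Eva Ruiz"),
              ("Tom Brown", "Paul Green"), ("Lisa Ray", "Nina Cole")],
    "cyrillic": [("Иван Петров", "Олег Смирнов"), ("Анна Котова", "Мария Белова"),
                 ("Пётр Орлов", "Юрий Зайцев"), ("Ольга Лисина", "Вера Сомова")],
    "devanagari": [("राज शर्मा", "अमित वर्मा"), ("नेहा गुप्ता", "पूजा सिंह"),
                   ("रवि कुमार", "अजय मेहता"), ("सीमा जोशी", "रीना नायर")],
}

# Unicode blocks used by the character-range vote; anything else counts as Latin.
SCRIPT_RANGES = {"cyrillic": (0x0400, 0x04FF), "devanagari": (0x0900, 0x097F)}


def detect_locale(text: str) -> str:
    """Return the script with the most characters in its range, else 'latin'."""
    counts = {name: 0 for name in SCRIPT_RANGES}
    for ch in text:
        for name, (lo, hi) in SCRIPT_RANGES.items():
            if lo <= ord(ch) <= hi:
                counts[name] += 1
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "latin"


def pick_demonstrations(text: str, k: int = 3) -> list:
    """Sample k demonstrations from the locale pool, seeded by the input's MD5."""
    pool = DEMO_POOLS[detect_locale(text)]
    seed = int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)
    return random.Random(seed).sample(pool, k)


print(detect_locale("Привет, меня зовут Иван Петров"))
print(pick_demonstrations("Привет, меня зовут Иван Петров"))
```

Seeding from the input hash keeps the choice reproducible for a given input while varying the demonstrations across inputs, so no single fixed triple is always available to echo.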

* 15 pages

Mind the Pause: Disfluency-Aware Objective Tuning for Multilingual Speech Correction with LLMs

May 12, 2026

Deepak Kumar, Baban Gain, Asif Ekbal

Abstract:Automatic Speech Recognition (ASR) transcripts often contain disfluencies, such as fillers, repetitions, and false starts, which reduce readability and hinder downstream applications like chatbots and voice assistants. If left unaddressed, such disfluencies can significantly degrade the reliability of downstream systems. Most existing approaches rely on classical models that focus on identifying disfluent tokens for removal. While this strategy is effective to some extent, it often disrupts grammatical structure and semantic coherence, leading to incomplete or unnatural sentences. Recent literature has explored the use of large language models (LLMs); however, these efforts have primarily focused on disfluency detection or data augmentation, rather than performing comprehensive correction. We propose a multilingual correction pipeline where a sequence tagger first marks disfluent tokens, and these signals guide instruction fine-tuning of an LLM to rewrite transcripts into fluent text. To further improve reliability, we add a contrastive learning objective that penalizes the reproduction of disfluent tokens, encouraging the model to preserve grammar and meaning while removing disfluent artifacts. Our experiments across three Indian languages, namely Hindi, Bengali, and Marathi, show consistent improvements over strong baselines, including multilingual sequence-to-sequence models. These results highlight that detection-only strategies are insufficient. Combining token-level cues with instruction tuning and contrastive learning provides a practical and scalable solution for multilingual disfluency correction in speech-driven NLP systems. We make the code publicly available at https://github.com/deepak-kumar-98/Mind-the-Pause.

* Accepted to ACL 2026 (Main)

Evaluating CUDA Tile for AI Workloads on Hopper and Blackwell GPUs

Apr 25, 2026

Divakar Kumar Yadav, Tian Zhao, Deepak Kumar

Abstract:NVIDIA's CUDA Tile (CuTile) introduces a Python-based, tile-centric abstraction for GPU kernel development that aims to simplify programming while retaining Tensor Core and Tensor Memory Accelerator (TMA) efficiency on modern GPUs. We present the first independent, cross-architecture evaluation of CuTile against established approaches such as cuBLAS, Triton, WMMA, and raw SIMT on three NVIDIA GPUs spanning Hopper and Blackwell: H100 NVL, B200, and RTX PRO 6000 Blackwell Server Edition. We benchmark representative AI workloads, including GEMM, fused multi-head attention, and end-to-end LLM inference in BF16/FP16 precision, to assess both performance and portability. Our results show that CuTile effectiveness is strongly workload- and architecture-dependent. On datacenter-class Blackwell (B200), CuTile achieves up to 1007 TFLOP/s for fused attention, outperforming FlashAttention-2 by 2.5x while requiring only 60 lines of Python kernel code. For GEMM, CuTile reaches 52-79% of cuBLAS performance in 22 lines of code (versus 123 for WMMA), making it a practical replacement for hand-written CUDA kernels but not yet for vendor-optimized libraries. However, the same CuTile attention kernel achieves only 53% of FlashAttention-2 throughput on RTX PRO 6000 (sm_120), exposing significant cross-architecture optimization gaps. In contrast, Triton sustains 62-101% of cuBLAS performance across all tested platforms without architecture-specific tuning, demonstrating substantially stronger portability.

Tool Attention Is All You Need: Dynamic Tool Gating and Lazy Schema Loading for Eliminating the MCP/Tools Tax in Scalable Agentic Workflows

Apr 23, 2026

Anuj Sadani, Deepak Kumar

Abstract:The Model Context Protocol (MCP) has become a common interface for connecting large language model (LLM) agents to external tools, but its reliance on stateless, eager schema injection imposes a hidden per-turn overhead, the MCP Tax or Tools Tax, that practitioner reports place between roughly 10k and 60k tokens in typical multi-server deployments. This payload inflates the key-value cache, is associated with reasoning degradation as context utilization approaches published fracture points around 70%, and turns token budgets into a recurring operational cost. We introduce Tool Attention, a middleware-layer mechanism that generalizes the "Attention Is All You Need" paradigm from self-attention over tokens to gated attention over tools. Tool Attention combines (i) an Intent Schema Overlap (ISO) score from sentence embeddings, (ii) a state-aware gating function enforcing preconditions and access scopes, and (iii) a two-phase lazy schema loader that keeps a compact summary pool in context and promotes full JSON schemas only for top-k gated tools. We evaluate on a simulated 120-tool, six-server benchmark whose per-server token counts are calibrated to public audits of real MCP deployments. In this simulation, Tool Attention directly reduces measured per-turn tool tokens by 95.0% (47.3k -> 2.4k) and raises effective context utilization (a token-ratio quantity) from 24% to 91%. End-to-end figures for task success, latency, cost, and reasoning quality are reported as projections derived from the measured token counts combined with published deployment telemetry; they are not measured on live LLM agents, and we mark projected values explicitly throughout. Taken together, the results support a simple thesis: protocol-level efficiency, not raw context length, is a binding constraint on scalable agentic systems. The code for this work is accessible at https://github.com/asadani/tool-attention
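
The three components listed in this abstract (overlap score, gating, lazy schema promotion) can be sketched roughly as follows. This is an illustration under strong assumptions, not the released implementation: a toy bag-of-words vector stands in for sentence embeddings, a granted-scope set stands in for the state-aware gate, and every class, function, and tool name is hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    summary: str          # compact summary that always stays in context
    required_scope: str   # access scope checked by the gate
    full_schema: dict = field(default_factory=dict)  # loaded only when promoted


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_tools(intent: str, tools: list, granted_scopes: set, k: int = 2) -> list:
    """Gate tools by scope, rank by intent/summary overlap, and keep the top k."""
    query = embed(intent)
    allowed = [t for t in tools if t.required_scope in granted_scopes]
    ranked = sorted(allowed, key=lambda t: cosine(query, embed(t.summary)), reverse=True)
    return ranked[:k]  # only these tools would have full_schema injected


tools = [
    Tool("search_issues", "search issues in a repository", "repo:read", {"type": "object"}),
    Tool("delete_repo", "delete a repository permanently", "repo:admin", {"type": "object"}),
    Tool("send_email", "send an email message", "mail:send", {"type": "object"}),
]
chosen = select_tools("search open issues in the repository", tools, {"repo:read", "mail:send"}, k=1)
print([t.name for t in chosen])
```

Only the summaries are scored on every turn; the full schemas are touched for the returned tools alone, which is where the token saving in a scheme of this kind would come from.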

* 21 pages

Infection-Reasoner: A Compact Vision-Language Model for Wound Infection Classification with Evidence-Grounded Clinical Reasoning

Apr 21, 2026

Palawat Busaranuvong, Reza Saadati Fard, Emmanuel Agu, Deepak Kumar, Shefalika Gautam, Bengisu Tulu, Diane Strong

Abstract:Assessing chronic wound infection from photographs is challenging because visual appearance varies across wound etiologies, anatomical locations, and imaging conditions. Prior image-based deep learning methods have mainly focused on classification with limited interpretability, despite the need for evidence-grounded explanations to support point-of-care decision making. We present Infection-Reasoner, a compact 4B-parameter reasoning vision-language model for chronic wound infection classification and rationale generation. To address the scarcity of expert-labeled wound images with reasoning annotations, Infection-Reasoner is trained using a two-stage pipeline: (1) reasoning distillation, in which GPT-5.1 generates chain-of-thought rationales for unlabeled wound images to initialize wound-specific reasoning in a smaller student model (Qwen3-VL-4B-Thinking), and (2) reinforcement learning post-training with Group Relative Policy Optimization on a small labeled infection dataset to refine classification reasoning. On a held-out heterogeneous wound dataset, Infection-Reasoner achieved 86.8% accuracy, 86.4% sensitivity, and 87.1% specificity, outperforming several strong baselines, including GPT-5.1. Rationale quality was further evaluated using both multimodal large language model (MLLM) judges and wound expert review. Across four MLLM judges, visual-support agreement scores ranged from 0.722 to 0.903, while expert review rated 61.8% of rationales as Correct and 32.4% as Partially Correct.

Impact of CSIR, SIC, and Hardware Impairments on the Ergodic Rate of Downlink RSMA

Apr 20, 2026

Farjam Karim, Deepak Kumar, Prathapasinghe Dharmawansa, Nurul Huda Mahmood, Arthur Sousa de Sena, Matti Latva-aho

Abstract:This work investigates the ergodic rate performance analysis of rate-splitting multiple access (RSMA) in a downlink communication system under practical impairments. Closed-form expressions are derived for key performance metrics such as ergodic rate, energy efficiency, sum-rate, and Jain's fairness index, capturing the joint effects of imperfect channel state information at the receiver (CSIR), imperfect successive interference cancellation (SIC), and hardware impairments. Numerical simulations validate the accuracy of the analytical expressions and reveal several insightful trends. At low transmit powers, imperfect CSIR is the dominant performance-limiting factor, followed by hardware impairments and imperfect SIC. However, as the transmit power increases, hardware impairments become the primary bottleneck, with the impact of imperfect CSIR gradually diminishing, and imperfect SIC becoming a more prominent bottleneck. Moreover, RSMA consistently outperforms non-orthogonal multiple access (NOMA) in terms of ergodic rate, fairness, and sum-rate, even under severe non-idealities. These findings underscore the importance of incorporating fairness as a core design objective alongside rate and energy efficiency, positioning RSMA as a robust and strong multiple access candidate for next-generation wireless networks.
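
For readers unfamiliar with the fairness metric named in this abstract, Jain's fairness index over per-user rates x_1..x_n is (sum x_i)^2 / (n * sum x_i^2). It equals 1 when all users get the same rate and 1/n when a single user gets everything. The helper below is a generic illustration with made-up rates, not the paper's closed-form expression.

```python
def jain_index(rates: list) -> float:
    """Jain's fairness index: (sum of rates)^2 / (n * sum of squared rates)."""
    n = len(rates)
    sum_sq = sum(r * r for r in rates)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one non-zero rate")
    total = sum(rates)
    return (total * total) / (n * sum_sq)


# Hypothetical per-user rates in bits/s/Hz.
print(jain_index([2.0, 2.0, 2.0]))  # perfectly fair allocation
print(jain_index([3.0, 0.0, 0.0]))  # one user takes everything
```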

Passive RIS Is Not Silent: Revisiting Performance Limits Under Thermal Noise

Apr 20, 2026

Farjam Karim, Deepak Kumar, Prathapasinghe Dharmawansa, Nurul Huda Mahmood, Arthur Sousa de Sena, Matti Latva-aho

Abstract:Reconfigurable intelligent surfaces (RISs) have emerged as a promising solution for enabling energy-efficient and flexible spectrum usage in wireless communication, particularly in the context of sixth-generation (6G) networks. While passive RIS architectures are widely regarded as virtually noiseless due to the lack of active components, this idealized assumption can lead to misleading performance evaluations. In this paper, we revisit this assumption and demonstrate that the thermal noise generated by passive RIS elements, though often neglected, can significantly affect system performance. We propose a tractable approximated analytical framework that incorporates RIS-induced thermal noise into the system and derive closed-form expressions for key performance metrics, such as outage probability and throughput. Simulation results validate our approximated analysis and highlight the substantial performance discrepancies that arise when RIS thermal noise is ignored. Our results offer valuable insights into the trade-offs between receiver and RIS noise, guiding the development of robust and efficient 6G communication systems.

RSMA-Aided Full-Duplex Networks Under Imperfect CSI and SIC: Performance Evaluation

Apr 20, 2026

Farjam Karim, Nurul Huda Mahmood, Deepak Kumar, Arthur Sousa de Sena, Matti Latva-aho

Abstract:This work investigates a full-duplex (FD)-enhanced Rate-Splitting Multiple Access (RSMA) system under practical constraints, including imperfect channel state information (CSI) and successive interference cancellation (SIC). We derive closed-form expressions for key performance metrics, such as outage probability and throughput, for both uplink and downlink users. The analysis considers co-channel interference (CCI) from uplink to downlink users and models the self-interference (SI) channel as a random variable. Monte Carlo simulations validate the analytical results and highlight the impact of system imperfections on RSMA-FD performance. At low transmit power, imperfect CSI significantly affects the system, though this effect weakens as power increases. In contrast, imperfect SIC becomes more detrimental at high transmit power, causing severe degradation. Additionally, neglecting CCI and assuming perfect SI cancellation leads to substantial overestimation of performance. Lastly, we demonstrate that the SI cancellation factor must be carefully selected to suppress interference effectively. Otherwise, a poor choice limits the full potential of FD technology.

SWE-PRBench: Benchmarking AI Code Review Quality Against Pull Request Feedback

Mar 27, 2026

Deepak Kumar

Abstract:We introduce SWE-PRBench, a benchmark of 350 pull requests with human-annotated ground truth for evaluating AI code review quality. Evaluated against an LLM-as-judge framework validated at kappa=0.75, 8 frontier models detect only 15-31% of human-flagged issues on the diff-only configuration, demonstrating that AI code review remains far below human expert performance despite strong results on code generation benchmarks. Pull requests are drawn from active open-source repositories, filtered from 700 candidates using a Repository Quality Score, and evaluated under three frozen context configurations: diff only (config_A), diff with file content (config_B), and full context (config_C), enabling systematic ablation of context provision strategies. All 8 models degrade monotonically from config_A to config_C, even when context is provided via structured semantic layers including AST-extracted function context and import graph resolution. The dominant mechanism is a collapse of Type2_Contextual issue detection at config_B, consistent with attention dilution in long contexts: a structured 2,000-token diff-with-summary prompt outperforms a 2,500-token full-context prompt enriched with execution context, behaviour mapping, and test signatures across all 8 models. The top four models are statistically indistinguishable (mean score 0.147-0.153) while a clear tier gap separates them from the remaining four (mean score <= 0.113). Dataset, contexts, annotations, and evaluation harness are released publicly.
