
Saurabh Kumar

An Extensible and Lightweight Unified Architecture for Demosaicing Pixel-bin Image Sensors

Jun 11, 2026

Saurabh Kumar, Nutan Sairam Yenneti

Abstract: Pixel-bin image sensors are becoming the default choice for smartphone cameras due to their resolution versus light-gathering trade-off. However, their larger inter-color separation compared to the Bayer color filter array (CFA) makes them challenging to demosaic. Furthermore, existing deep learning-based demosaicing methods are CFA-specific, requiring multiple individual models that take up precious onboard resources and demand larger development and maintenance efforts. In this work, we propose a modular unified architecture for demosaicing various pixel-bin sensors that provides higher image quality while being extensible and lightweight. Additionally, to enable plug-and-play operation, we introduce a learning-free CFA-identification module to detect the CFA type of raw data accurately.


VAANI: Capturing the language landscape for an inclusive digital India

Mar 31, 2026

Sujith Pulikodan, Abhayjeet Singh, Agneedh Basu, Nihar Desai, Pavan Kumar J, Pranav D Bhat, Raghu Dharmaraju, Ritika Gupta, Sathvik Udupa, Saurabh Kumar(+11 more)

Abstract: Project VAANI is an initiative to create an India-representative multi-modal dataset that comprehensively maps India's linguistic diversity, starting with 165 districts across the country in its first two phases. Speech data is collected through a carefully structured process that uses image-based prompts to encourage spontaneous responses. Images are captured through a separate process that encompasses a broad range of topics, gathered from both within and across districts. The collected data undergoes a rigorous multi-stage quality evaluation, including both automated and manual checks, to ensure the highest possible standards in audio quality and transcription accuracy. Following this thorough validation, we have open-sourced around 289K images, approximately 31,270 hours of audio recordings, and around 2,067 hours of transcribed speech, encompassing 112 languages from 165 districts across 31 States and Union territories. Notably, a significant number of these languages are being represented for the first time in a dataset of this scale, making the VAANI project a groundbreaking effort in preserving and promoting linguistic inclusivity. This data can be instrumental in building inclusive speech models for India, and in advancing research and development across speech, image, and multimodal applications.


Path-Following Guidance for Unmanned Aerial Vehicle with Bounded Lateral Acceleration

Mar 28, 2026

Vinay Kathiriya, Saurabh Kumar, Shashi Ranjan Kumar

Abstract: This paper addresses the three-dimensional path-following guidance problem for unmanned aerial vehicles under explicit actuator constraints. Unlike conventional approaches that assume unbounded control inputs or handle saturation heuristically, the proposed method incorporates bounded lateral acceleration directly into the guidance design. A nonlinear guidance framework is developed employing a nested saturation-based control technique. The proposed guidance strategy guarantees bounded control inputs while ensuring exponential convergence of cross-track errors to zero. The formulation is applicable to general smooth paths and is systematically extended from planar to three-dimensional scenarios using a path-tangent coordinate framework. Rigorous stability analysis based on Lyapunov theory establishes convergence and feasibility properties of the closed-loop system. Numerical simulations on representative paths, including straight-line, circular, and sinusoidal paths, demonstrate that the proposed method achieves superior tracking performance, reduced control effort, and robustness against disturbances compared to existing guidance laws. The simplicity of the design and its compatibility with practical actuator limits make it suitable for real-world UAV applications.
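The nested saturation idea mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the paper's guidance law: it assumes a simplified double-integrator model of the cross-track error (error acceleration equals the commanded lateral acceleration), and the gains, limits, and function names are all hypothetical choices made for illustration. It shows only the general pattern of wrapping one saturation inside another so that the command can never exceed the actuator bound.

```python
def sat(x, limit):
    """Clip x to the interval [-limit, limit]."""
    return max(-limit, min(limit, x))


def simulate_nested_saturation(e0, e_dot0, u_max=1.0, inner_limit=0.5,
                               k_p=1.0, k_d=2.0, dt=0.01, t_end=80.0):
    """Euler simulation of e'' = u under a nested saturation command.

    Assumed toy model (not from the paper): the cross-track error e is a
    double integrator driven by the lateral acceleration command u.
    The inner saturation limits how hard the position error can pull;
    the outer saturation enforces |u| <= u_max at every step.
    """
    e, e_dot = e0, e_dot0
    peak_command = 0.0
    for _ in range(int(t_end / dt)):
        inner = sat(k_p * e, inner_limit)          # bounded position term
        u = -sat(k_d * e_dot + inner, u_max)       # bounded total command
        peak_command = max(peak_command, abs(u))
        e += dt * e_dot                            # integrate error
        e_dot += dt * u                            # integrate error rate
    return e, e_dot, peak_command


# Start 10 units off the path with zero error rate.
final_error, final_rate, peak = simulate_nested_saturation(10.0, 0.0)
print(f"final error {final_error:.2e}, peak |u| {peak:.3f}")
```

With these assumed gains, the error first shrinks at a limited rate while the inner term is saturated, then decays like a linear critically damped system once it is small. The command stays within `u_max` by construction.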


State-of-the-art Small Language Coder Model: Mify-Coder

Dec 26, 2025

Abhinav Parmar, Abhisek Panigrahi, Abhishek Kumar Dwivedi, Abhishek Bhattacharya, Adarsh Ramachandra, Aditya Choudhary, Aditya Garg, Aditya Raj, Alankrit Bhatt, Alpesh Yadav(+86 more)

Abstract: We present Mify-Coder, a 2.5B-parameter code model trained on 4.2T tokens using a compute-optimal strategy built on the Mify-2.5B foundation model. Mify-Coder achieves comparable accuracy and safety while significantly outperforming much larger baseline models on standard coding and function-calling benchmarks, demonstrating that compact models can match frontier-grade models in code generation and agent-driven workflows. Our training pipeline combines high-quality curated sources with synthetic data generated through agentically designed prompts, refined iteratively using enterprise-grade evaluation datasets. LLM-based quality filtering further enhances data density, enabling frugal yet effective training. Through disciplined exploration of CPT-SFT objectives, data mixtures, and sampling dynamics, we deliver frontier-grade code intelligence within a single continuous training trajectory. Empirical evidence shows that principled data and compute discipline allow smaller models to achieve competitive accuracy, efficiency, and safety compliance. Quantized variants of Mify-Coder enable deployment on standard desktop environments without requiring specialized hardware.


Hierarchical Text Classification Using Contrastive Learning Informed Path Guided Hierarchy

Jun 04, 2025

Neeraj Agrawal, Saurabh Kumar, Priyanka Bhatt, Tanishka Agarwal


Abstract: Hierarchical Text Classification (HTC) has recently gained traction given its ability to handle complex label hierarchies. It has found applications in domains such as e-commerce, customer care, and medicine, among other real-world applications. Existing HTC models either encode the label hierarchy separately and mix it with the text encoding or guide the label hierarchy structure in the text encoder. Both approaches capture different characteristics of the label hierarchy and are complementary to each other. In this paper, we propose Hierarchical Text Classification using Contrastive Learning Informed Path guided hierarchy (HTC-CLIP), which learns a hierarchy-aware text representation and a text-informed, path-guided hierarchy representation using contrastive learning. During the training of HTC-CLIP, we learn two different sets of class probability distributions, and during inference, we use the pooled output of both probabilities for each class to get the best of both representations. Our results show that the two previous approaches can be effectively combined into one architecture to achieve improved performance. Tests on two public benchmark datasets showed an improvement of 0.99% to 2.37% in Macro F1 score using HTC-CLIP over the existing state-of-the-art models.

* ECAI 2023, pp. 19-26. IOS Press, 2023
* arXiv admin note: text overlap with arXiv:2203.03825 by other authors
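The inference step described in the abstract, pooling two sets of per-class probabilities, can be sketched as follows. The abstract does not say which pooling operator is used, so the element-wise mean here is an assumption, as are the function names, the 0.5 decision threshold, and the example numbers.

```python
def pool_probabilities(p_text, p_path):
    """Element-wise mean of two per-class probability lists.

    Assumption: mean pooling. The abstract only says the two outputs
    are 'pooled'; other operators (max, weighted mean) are possible.
    """
    if len(p_text) != len(p_path):
        raise ValueError("both heads must score the same set of classes")
    return [(a + b) / 2.0 for a, b in zip(p_text, p_path)]


def predict_labels(pooled, threshold=0.5):
    """Return indices of classes whose pooled probability meets the threshold."""
    return [i for i, p in enumerate(pooled) if p >= threshold]


# Hypothetical scores for three classes from the two representations.
hierarchy_aware_text = [0.9, 0.2, 0.6]
path_guided_hierarchy = [0.7, 0.4, 0.3]

pooled = pool_probabilities(hierarchy_aware_text, path_guided_hierarchy)
print(predict_labels(pooled))
```

In this made-up example, the third class is accepted by one head (0.6) but rejected after pooling, which shows how the second representation can temper the first.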


Building a Few-Shot Cross-Domain Multilingual NLU Model for Customer Care

Jun 04, 2025

Saurabh Kumar, Sourav Bansal, Neeraj Agrawal, Priyanka Bhatt


Abstract: Customer care is an essential pillar of the e-commerce shopping experience, with companies spending millions of dollars each year, employing automation and human agents, across geographies (like US, Canada, Mexico, Chile), channels (like Chat, Interactive Voice Response (IVR)), and languages (like English, Spanish). SOTA pre-trained models like multilingual-BERT, fine-tuned on annotated data, have shown good performance in downstream tasks relevant to Customer Care. However, model performance is largely subject to the availability of sufficient annotated domain-specific data. Cross-domain availability of data remains a bottleneck; thus, building an intent classifier that generalizes across domains (defined by channel, geography, and language) with only a few annotations is of great practical value. In this paper, we propose an embedder-cum-classifier model architecture which extends state-of-the-art domain-specific models to other domains with only a few labeled samples. We adopt a supervised fine-tuning approach with isotropic regularizers to train a domain-specific sentence embedder and a multilingual knowledge distillation strategy to generalize this embedder across multiple domains. The trained embedder, further augmented with a simple linear classifier, can be deployed for new domains. Experiments on Canada and Mexico e-commerce Customer Care datasets with few-shot intent detection show an increase in accuracy of 20-23% against the existing state-of-the-art pre-trained models.

* ECAI 2023. IOS Press, 2023. 3212-3217


Conformal Transformations for Symmetric Power Transformers

Mar 05, 2025

Saurabh Kumar, Jacob Buckman, Carles Gelada, Sean Zhang

Abstract: Transformers with linear attention offer significant computational advantages over softmax-based transformers but often suffer from degraded performance. The symmetric power (sympow) transformer, a particular type of linear transformer, addresses some of this performance gap by leveraging symmetric tensor embeddings, achieving comparable performance to softmax transformers. However, the finite capacity of the recurrent state in sympow transformers limits their ability to retain information, leading to performance degradation when scaling the training or evaluation context length. To address this issue, we propose the conformal-sympow transformer, which dynamically frees up capacity using data-dependent multiplicative gating and adaptively stores information using data-dependent rotary embeddings. Preliminary experiments on the LongCrawl64 dataset demonstrate that conformal-sympow overcomes the limitations of sympow transformers, achieving robust performance across scaled training and evaluation contexts.

* SCOPE Workshop at ICLR 2025
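The role of multiplicative gating in a recurrent linear-attention state can be shown with a small sketch. This is a generic gated linear attention recurrence, not the conformal-sympow architecture: it omits the symmetric power embedding and the rotary embeddings, and in the paper the gates are data-dependent, whereas here they are simply passed in as numbers. All names are hypothetical.

```python
def gated_linear_attention(queries, keys, values, gates):
    """Run a gated linear-attention recurrence over a sequence.

    State update (generic form, assumed for illustration):
        S_t = g_t * S_{t-1} + k_t v_t^T
        y_t = q_t^T S_t
    A gate g_t near 1 keeps old information; a gate near 0 discards it,
    freeing the fixed-size state for new tokens.
    """
    d_k, d_v = len(keys[0]), len(values[0])
    state = [[0.0] * d_v for _ in range(d_k)]      # fixed-size recurrent state
    outputs = []
    for q, k, v, g in zip(queries, keys, values, gates):
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] = g * state[i][j] + k[i] * v[j]
        outputs.append([sum(q[i] * state[i][j] for i in range(d_k))
                        for j in range(d_v)])
    return outputs


# Two tokens with identical keys and queries but different values.
qs = [[1.0, 0.0], [1.0, 0.0]]
ks = [[1.0, 0.0], [1.0, 0.0]]
vs = [[2.0], [3.0]]

print(gated_linear_attention(qs, ks, vs, gates=[1.0, 1.0]))  # [[2.0], [5.0]]
print(gated_linear_attention(qs, ks, vs, gates=[1.0, 0.5]))  # [[2.0], [4.0]]
print(gated_linear_attention(qs, ks, vs, gates=[1.0, 0.0]))  # [[2.0], [3.0]]
```

With the gate fixed at 1, the second output accumulates both values; lowering the gate progressively forgets the first token, which is the capacity-freeing behavior the abstract attributes to gating.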


Three-dimensional Nonlinear Path-following Guidance with Bounded Input Constraints

Sep 13, 2024

Saurabh Kumar, Shashi Ranjan Kumar, Abhinav Sinha

Abstract: In this paper, we consider the tracking of arbitrary curvilinear geometric paths in three-dimensional output spaces of unmanned aerial vehicles (UAVs) without pre-specified timing requirements, commonly referred to as path-following problems, subject to bounded inputs. Specifically, we propose a novel nonlinear path-following guidance law for a UAV that enables it to follow any smooth curvilinear path in three dimensions while accounting for the bounded control authority in the design. The proposed solution offers a general treatment of the path-following problem by removing the dependency on the path's geometry, which makes it applicable to paths with varying levels of complexity and smooth curvatures. Additionally, the proposed strategy draws inspiration from the pursuit guidance approach, which is known for its simplicity and ease of implementation. Theoretical analysis guarantees that the UAV converges to its desired path within a fixed time and remains on it irrespective of its initial configuration with respect to the path. Finally, the simulations demonstrate the merits and effectiveness of the proposed guidance strategy through a wide range of engagement scenarios, showcasing the UAV's ability to follow diverse curvilinear paths accurately.


The Need for a Big World Simulator: A Scientific Challenge for Continual Learning

Aug 06, 2024

Saurabh Kumar, Hong Jun Jeon, Alex Lewandowski, Benjamin Van Roy


Abstract: The "small agent, big world" frame offers a conceptual view that motivates the need for continual learning. The idea is that a small agent operating in a much bigger world cannot store all information that the world has to offer. To perform well, the agent must be carefully designed to ingest, retain, and eject the right information. To enable the development of performant continual learning agents, a number of synthetic environments have been proposed. However, these benchmarks suffer from limitations, including unnatural distribution shifts and a lack of fidelity to the "small agent, big world" framing. This paper aims to formalize two desiderata for the design of future simulated environments. These two criteria aim to reflect the objectives and complexity of continual learning in practical settings while enabling rapid prototyping of algorithms on a smaller scale.

* Accepted to the Finding the Frame Workshop at RLC 2024


Satisficing Exploration for Deep Reinforcement Learning

Jul 16, 2024

Dilip Arumugam, Saurabh Kumar, Ramki Gummadi, Benjamin Van Roy


Abstract: A default assumption in the design of reinforcement-learning algorithms is that a decision-making agent always explores to learn optimal behavior. In sufficiently complex environments that approach the vastness and scale of the real world, however, attaining optimal performance may in fact be an entirely intractable endeavor and an agent may seldom find itself in a position to complete the requisite exploration for identifying an optimal policy. Recent work has leveraged tools from information theory to design agents that deliberately forgo optimal solutions in favor of sufficiently-satisfying or satisficing solutions, obtained through lossy compression. Notably, such agents may employ fundamentally different exploratory decisions to learn satisficing behaviors more efficiently than optimal ones that are more data intensive. While supported by a rigorous corroborating theory, the underlying algorithm relies on model-based planning, drastically limiting the compatibility of these ideas with function approximation and high-dimensional observations. In this work, we remedy this issue by extending an agent that directly represents uncertainty over the optimal value function, allowing it to both bypass the need for model-based planning and to learn satisficing policies. We provide simple yet illustrative experiments that demonstrate how our algorithm enables deep reinforcement-learning agents to achieve satisficing behaviors. In keeping with previous work on this setting for multi-armed bandits, we additionally find that our algorithm is capable of synthesizing optimal behaviors, when feasible, more efficiently than its non-information-theoretic counterpart.

* Accepted to the Finding the Frame Workshop at RLC 2024
