Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Senthil Kumar

Decoupling Identity from Utility: Privacy-by-Design Frameworks for Financial Ecosystems

Apr 16, 2026

Ifayoyinsola Ibikunle, Tyler Farnan, Senthil Kumar, Mayana Pereira

Abstract:Financial institutions face tension between maximizing data utility and mitigating the re-identification risks inherent in traditional anonymization methods. This paper explores Differentially Private (DP) synthetic data as a robust "Privacy by Design" framework to resolve this conflict, ensuring output privacy while satisfying stringent regulatory obligations. We examine two distinct generative paradigms: Direct Tabular Synthesis, which reconstructs high-fidelity joint distributions from raw data, and DP-Seeded Agent-Based Modeling (ABM), which uses DP-protected aggregates to parameterize complex, stateful simulations. While tabular synthesis excels at reflecting static historical correlations for QA testing and business analytics, the DP-Seeded ABM offers a forward-looking "counterfactual laboratory" capable of modeling dynamic market behaviors and black swan events. By decoupling individual identities from data utility, these methodologies eliminate traditional data-clearing bottlenecks, enabling seamless cross-institutional research and compliant decision-making in an evolving regulatory landscape.

Via

Access Paper or Ask Questions

TimeSqueeze: Dynamic Patching for Efficient Time Series Forecasting

Mar 11, 2026

Sravan Kumar Ankireddy, Nikita Seleznev, Nam H. Nguyen, Yulun Wu, Senthil Kumar, Furong Huang, C. Bayan Bruss

Abstract:Transformer-based time series foundation models face a fundamental trade-off in choice of tokenization: point-wise embeddings preserve temporal fidelity but scale poorly with sequence length, whereas fixed-length patching improves efficiency by imposing uniform boundaries that may disrupt natural transitions and blur informative local dynamics. In order to address these limitations, we introduce TimeSqueeze, a dynamic patching mechanism that adaptively selects patch boundaries within each sequence based on local signal complexity. TimeSqueeze first applies a lightweight state-space encoder to extract full-resolution point-wise features, then performs content-aware segmentation by allocating short patches to information-dense regions and long patches to smooth or redundant segments. This variable-resolution compression preserves critical temporal structure while substantially reducing the token sequence presented to the Transformer backbone. Specifically for large-scale pretraining, TimeSqueeze attains up to 20x faster convergence and 8x higher data efficiency compared to equivalent point-token baselines. Experiments across long-horizon forecasting benchmarks show that TimeSqueeze consistently outperforms comparable architectures that use either point-wise tokenization or fixed-size patching.

* 21 pages, 14 figures

Via

Access Paper or Ask Questions

Revisiting RAG Retrievers: An Information Theoretic Benchmark

Feb 25, 2026

Wenqing Zheng, Dmitri Kalaev, Noah Fatsi, Daniel Barcklow, Owen Reinert, Igor Melnyk, Senthil Kumar, C. Bayan Bruss

Abstract:Retrieval-Augmented Generation (RAG) systems rely critically on the retriever module to surface relevant context for large language models. Although numerous retrievers have recently been proposed, each built on different ranking principles such as lexical matching, dense embeddings, or graph citations, there remains a lack of systematic understanding of how these mechanisms differ and overlap. Existing benchmarks primarily compare entire RAG pipelines or introduce new datasets, providing little guidance on selecting or combining retrievers themselves. Those that do compare retrievers directly use a limited set of evaluation tools which fail to capture complementary and overlapping strengths. This work presents MIGRASCOPE, a Mutual Information based RAG Retriever Analysis Scope. We revisit state-of-the-art retrievers and introduce principled metrics grounded in information and statistical estimation theory to quantify retrieval quality, redundancy, synergy, and marginal contribution. We further show that if chosen carefully, an ensemble of retrievers outperforms any single retriever. We leverage the developed tools over major RAG corpora to provide unique insights on contribution levels of the state-of-the-art retrievers. Our findings provide a fresh perspective on the structure of modern retrieval techniques and actionable guidance for designing robust and efficient RAG systems.

Via

Access Paper or Ask Questions

A Simple Baseline for Predicting Events with Auto-Regressive Tabular Transformers

Oct 14, 2024

Alex Stein, Samuel Sharpe, Doron Bergman, Senthil Kumar, Bayan Bruss, John Dickerson, Tom Goldstein, Micah Goldblum

Abstract:Many real-world applications of tabular data involve using historic events to predict properties of new ones, for example whether a credit card transaction is fraudulent or what rating a customer will assign a product on a retail platform. Existing approaches to event prediction include costly, brittle, and application-dependent techniques such as time-aware positional embeddings, learned row and field encodings, and oversampling methods for addressing class imbalance. Moreover, these approaches often assume specific use-cases, for example that we know the labels of all historic events or that we only predict a pre-specified label and not the data's features themselves. In this work, we propose a simple but flexible baseline using standard autoregressive LLM-style transformers with elementary positional embeddings and a causal language modeling objective. Our baseline outperforms existing approaches across popular datasets and can be employed for various use-cases. We demonstrate that the same model can predict labels, impute missing values, or model event sequences.

* 10 pages, 6 pages of references+appendix

Via

Access Paper or Ask Questions

Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning

May 26, 2024

Neha Kalibhat, Priyatham Kattakinda, Arman Zarei, Nikita Seleznev, Samuel Sharpe, Senthil Kumar, Soheil Feizi

Figure 1 for Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning

Figure 2 for Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning

Figure 3 for Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning

Figure 4 for Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning

Abstract:Vision transformers have established a precedent of patchifying images into uniformly-sized chunks before processing. We hypothesize that this design choice may limit models in learning comprehensive and compositional representations from visual data. This paper explores the notion of providing semantically-meaningful visual tokens to transformer encoders within a vision-language pre-training framework. Leveraging off-the-shelf segmentation and scene-graph models, we extract representations of instance segmentation masks (referred to as tangible tokens) and relationships and actions (referred to as intangible tokens). Subsequently, we pre-train a vision-side transformer by incorporating these newly extracted tokens and aligning the resultant embeddings with caption embeddings from a text-side encoder. To capture the structural and semantic relationships among visual tokens, we introduce additive attention weights, which are used to compute self-attention scores. Our experiments on COCO demonstrate notable improvements over ViTs in learned representation quality across text-to-image (+47%) and image-to-text retrieval (+44%) tasks. Furthermore, we showcase the advantages on compositionality benchmarks such as ARO (+18%) and Winoground (+10%).

Via

Access Paper or Ask Questions

From Explanation to Action: An End-to-End Human-in-the-loop Framework for Anomaly Reasoning and Management

Apr 06, 2023

Xueying Ding, Nikita Seleznev, Senthil Kumar, C. Bayan Bruss, Leman Akoglu

Figure 1 for From Explanation to Action: An End-to-End Human-in-the-loop Framework for Anomaly Reasoning and Management

Figure 2 for From Explanation to Action: An End-to-End Human-in-the-loop Framework for Anomaly Reasoning and Management

Figure 3 for From Explanation to Action: An End-to-End Human-in-the-loop Framework for Anomaly Reasoning and Management

Figure 4 for From Explanation to Action: An End-to-End Human-in-the-loop Framework for Anomaly Reasoning and Management

Abstract:Anomalies are often indicators of malfunction or inefficiency in various systems such as manufacturing, healthcare, finance, surveillance, to name a few. While the literature is abundant in effective detection algorithms due to this practical relevance, autonomous anomaly detection is rarely used in real-world scenarios. Especially in high-stakes applications, a human-in-the-loop is often involved in processes beyond detection such as verification and troubleshooting. In this work, we introduce ALARM (for Analyst-in-the-Loop Anomaly Reasoning and Management); an end-to-end framework that supports the anomaly mining cycle comprehensively, from detection to action. Besides unsupervised detection of emerging anomalies, it offers anomaly explanations and an interactive GUI for human-in-the-loop processes -- visual exploration, sense-making, and ultimately action-taking via designing new detection rules -- that help close ``the loop'' as the new rules complement rule-based supervised detection, typical of many deployed systems in practice. We demonstrate \method's efficacy through a series of case studies with fraud analysts from the financial industry.

Via

Access Paper or Ask Questions