Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Harsh Patel

Everyday Robots

A Verifiable Search Is Not a Learnable Chain-of-Thought

Jun 20, 2026

Harsh Patel

Abstract:It is tempting to assume any task solvable by a short program can be taught to a model as its chain-of-thought: write the steps out, fine-tune, and the model follows. This paper shows the assumption fails for an identifiable class of procedures. The testbed is nine reasoning tasks, each from a deterministic generator; public and hidden splits share generators, so held-out data proxies test accuracy. I reverse-engineer the generators into Python solvers, render them as chain-of-thought, and distill into a rank-<= 32 LoRA over a 30B (3.5B-active) Nemotron model. Forward-computable tasks install readily: lookup/arithmetic and an 8-bit boolean task transfer (>= 0.99 and 0.68). Cryptarithm does not: distilling its backtracking search holds at 0.01-0.07 across eleven chain-of-thought designs, RL from verifiable rewards, and self-training, even though a search solver answers 71% of instances. This is not a capability gap. The model does the arithmetic on 97-100% of lines and ranks the correct cipher in its top eight on 71%; it cannot carry the search forward as a left-to-right derivation. Fine-tuning learns the shape of a verifiable elimination step while its verdicts become unconditional templates, correct only 16-57% of the time ("verdict-as-token"). The ceiling holds across backbones from 3B to 671B and across fine-tuning and prompting; a controlled intervention isolates the cause: revealing the cipher key, which turns the derivation forward, lifts the same instances from 0.03 to 0.57. When a procedure's only solution is search over information-free structure, no faithful forward chain-of-thought exists to imitate. The task becomes learnable only by removing the search, precomputing its combinatorial core into a catalog and reducing the trace to recall plus verification; the 1st-place solution reaches Private LB 0.92 this way. What distills is memorization and verification, not search.

* 31 pages, 6 figures, 16 tables; Interactive walkthrough: https://nemotron.harshpatel.live ; Code, solvers, and per-row eval data: https://github.com/harshpatel1692/search-not-learnable

Via

Access Paper or Ask Questions

AnyTask: an Automated Task and Data Generation Framework for Advancing Sim-to-Real Policy Learning

Dec 19, 2025

Ran Gong, Xiaohan Zhang, Jinghuan Shang, Maria Vittoria Minniti, Jigarkumar Patel, Valerio Pepe, Riedana Yan, Ahmet Gundogdu, Ivan Kapelyukh, Ali Abbas(+4 more)

Abstract:Generalist robot learning remains constrained by data: large-scale, diverse, and high-quality interaction data are expensive to collect in the real world. While simulation has become a promising way for scaling up data collection, the related tasks, including simulation task design, task-aware scene generation, expert demonstration synthesis, and sim-to-real transfer, still demand substantial human effort. We present AnyTask, an automated framework that pairs massively parallel GPU simulation with foundation models to design diverse manipulation tasks and synthesize robot data. We introduce three AnyTask agents for generating expert demonstrations aiming to solve as many tasks as possible: 1) ViPR, a novel task and motion planning agent with VLM-in-the-loop Parallel Refinement; 2) ViPR-Eureka, a reinforcement learning agent with generated dense rewards and LLM-guided contact sampling; 3) ViPR-RL, a hybrid planning and learning approach that jointly produces high-quality demonstrations with only sparse rewards. We train behavior cloning policies on generated data, validate them in simulation, and deploy them directly on real robot hardware. The policies generalize to novel object poses, achieving 44% average success across a suite of real-world pick-and-place, drawer opening, contact-rich pushing, and long-horizon manipulation tasks. Our project website is at https://anytask.rai-inst.com .

* 28 pages, 25 figures. The first four authors contributed equally

Via

Access Paper or Ask Questions

Sceniris: A Fast Procedural Scene Generation Framework

Dec 18, 2025

Jinghuan Shang, Harsh Patel, Ran Gong, Karl Schmeckpeper

Abstract:Synthetic 3D scenes are essential for developing Physical AI and generative models. Existing procedural generation methods often have low output throughput, creating a significant bottleneck in scaling up dataset creation. In this work, we introduce Sceniris, a highly efficient procedural scene generation framework for rapidly generating large-scale, collision-free scene variations. Sceniris also provides an optional robot reachability check, providing manipulation-feasible scenes for robot tasks. Sceniris is designed for maximum efficiency by addressing the primary performance limitations of the prior method, Scene Synthesizer. Leveraging batch sampling and faster collision checking in cuRobo, Sceniris achieves at least 234x speed-up over Scene Synthesizer. Sceniris also expands the object-wise spatial relationships available in prior work to support diverse scene requirements. Our code is available at https://github.com/rai-inst/sceniris

* Code is available at https://github.com/rai-inst/sceniris

Via

Access Paper or Ask Questions

Feature Selection via GANs (GANFS): Enhancing Machine Learning Models for DDoS Mitigation

Apr 21, 2025

Harsh Patel

Figure 1 for Feature Selection via GANs (GANFS): Enhancing Machine Learning Models for DDoS Mitigation

Figure 2 for Feature Selection via GANs (GANFS): Enhancing Machine Learning Models for DDoS Mitigation

Figure 3 for Feature Selection via GANs (GANFS): Enhancing Machine Learning Models for DDoS Mitigation

Figure 4 for Feature Selection via GANs (GANFS): Enhancing Machine Learning Models for DDoS Mitigation

Abstract:Distributed Denial of Service (DDoS) attacks represent a persistent and evolving threat to modern networked systems, capable of causing large-scale service disruptions. The complexity of such attacks, often hidden within high-dimensional and redundant network traffic data, necessitates robust and intelligent feature selection techniques for effective detection. Traditional methods such as filter-based, wrapper-based, and embedded approaches, each offer strengths but struggle with scalability or adaptability in complex attack environments. In this study, we explore these existing techniques through a detailed comparative analysis and highlight their limitations when applied to large-scale DDoS detection tasks. Building upon these insights, we introduce a novel Generative Adversarial Network-based Feature Selection (GANFS) method that leverages adversarial learning dynamics to identify the most informative features. By training a GAN exclusively on attack traffic and employing a perturbation-based sensitivity analysis on the Discriminator, GANFS effectively ranks feature importance without relying on full supervision. Experimental evaluations using the CIC-DDoS2019 dataset demonstrate that GANFS not only improves the accuracy of downstream classifiers but also enhances computational efficiency by significantly reducing feature dimensionality. These results point to the potential of integrating generative learning models into cybersecurity pipelines to build more adaptive and scalable detection systems.

Via

Access Paper or Ask Questions

VADER: Visual Affordance Detection and Error Recovery for Multi Robot Human Collaboration

May 25, 2024

Michael Ahn, Montserrat Gonzalez Arenas, Matthew Bennice, Noah Brown, Christine Chan, Byron David, Anthony Francis, Gavin Gonzalez, Rainer Hessmer, Tomas Jackson(+15 more)

Figure 1 for VADER: Visual Affordance Detection and Error Recovery for Multi Robot Human Collaboration

Figure 2 for VADER: Visual Affordance Detection and Error Recovery for Multi Robot Human Collaboration

Figure 3 for VADER: Visual Affordance Detection and Error Recovery for Multi Robot Human Collaboration

Figure 4 for VADER: Visual Affordance Detection and Error Recovery for Multi Robot Human Collaboration

Abstract:Robots today can exploit the rich world knowledge of large language models to chain simple behavioral skills into long-horizon tasks. However, robots often get interrupted during long-horizon tasks due to primitive skill failures and dynamic environments. We propose VADER, a plan, execute, detect framework with seeking help as a new skill that enables robots to recover and complete long-horizon tasks with the help of humans or other robots. VADER leverages visual question answering (VQA) modules to detect visual affordances and recognize execution errors. It then generates prompts for a language model planner (LMP) which decides when to seek help from another robot or human to recover from errors in long-horizon task execution. We show the effectiveness of VADER with two long-horizon robotic tasks. Our pilot study showed that VADER is capable of performing complex long-horizon tasks by asking for help from another robot to clear a table. Our user study showed that VADER is capable of performing complex long-horizon tasks by asking for help from a human to clear a path. We gathered feedback from people (N=19) about the performance of the VADER performance vs. a robot that did not ask for help. https://google-vader.github.io/

* 9 pages, 4 figures

Via

Access Paper or Ask Questions

Automating Code Adaptation for MLOps -- A Benchmarking Study on LLMs

May 10, 2024

Harsh Patel, Buvaneswari A. Ramanan, Manzoor A. Khan, Thomas Williams, Brian Friedman, Lawrence Drabeck

Figure 1 for Automating Code Adaptation for MLOps -- A Benchmarking Study on LLMs

Figure 2 for Automating Code Adaptation for MLOps -- A Benchmarking Study on LLMs

Figure 3 for Automating Code Adaptation for MLOps -- A Benchmarking Study on LLMs

Figure 4 for Automating Code Adaptation for MLOps -- A Benchmarking Study on LLMs

Abstract:This paper explores the possibilities of the current generation of Large Language Models for incorporating Machine Learning Operations (MLOps) functionalities into ML training code bases. We evaluate the performance of OpenAI (gpt-3.5-turbo) and WizardCoder (open-source, 15B parameters) models on the automated accomplishment of various MLOps functionalities in different settings. We perform a benchmarking study that assesses the ability of these models to: (1) adapt existing code samples (Inlining) with component-specific MLOps functionality such as MLflow and Weights & Biases for experiment tracking, Optuna for hyperparameter optimization etc., and (2) perform the task of Translation from one component of an MLOps functionality to another, e.g., translating existing GitPython library based version control code to Data Version Control library based. We also propose three different approaches that involve teaching LLMs to comprehend the API documentation of the components as a reference while accomplishing the Translation tasks. In our evaluations, the gpt-3.5-turbo model significantly outperforms WizardCoder by achieving impressive Pass@3 accuracy in model optimization (55% compared to 0% by WizardCoder), experiment tracking (100%, compared to 62.5% by WizardCoder), model registration (92% compared to 42% by WizardCoder) and hyperparameter optimization (83% compared to 58% by WizardCoder) on average, in their best possible settings, showcasing its superior code adaptability performance in complex MLOps tasks.

* The work was completed during 2Q, 3Q of Year 2023, when WizardCoder was the top performing Open source LLM for coding. Newer and better models have emerged since then. The processes and methodologies utilized for this benchmarking can still be utilized for evaluating the current SoTA models

Via

Access Paper or Ask Questions

A State-of-the-practice Release-readiness Checklist for Generative AI-based Software Products

Mar 27, 2024

Harsh Patel, Dominique Boucher, Emad Fallahzadeh, Ahmed E. Hassan, Bram Adams

Abstract:This paper investigates the complexities of integrating Large Language Models (LLMs) into software products, with a focus on the challenges encountered for determining their readiness for release. Our systematic review of grey literature identifies common challenges in deploying LLMs, ranging from pre-training and fine-tuning to user experience considerations. The study introduces a comprehensive checklist designed to guide practitioners in evaluating key release readiness aspects such as performance, monitoring, and deployment strategies, aiming to enhance the reliability and effectiveness of LLM-based applications in real-world settings.

Via

Access Paper or Ask Questions

Hybrid Reinforcement Learning for Optimizing Pump Sustainability in Real-World Water Distribution Networks

Oct 13, 2023

Harsh Patel, Yuan Zhou, Alexander P Lamb, Shu Wang, Jieliang Luo

Figure 1 for Hybrid Reinforcement Learning for Optimizing Pump Sustainability in Real-World Water Distribution Networks

Figure 2 for Hybrid Reinforcement Learning for Optimizing Pump Sustainability in Real-World Water Distribution Networks

Figure 3 for Hybrid Reinforcement Learning for Optimizing Pump Sustainability in Real-World Water Distribution Networks

Figure 4 for Hybrid Reinforcement Learning for Optimizing Pump Sustainability in Real-World Water Distribution Networks

Abstract:This article addresses the pump-scheduling optimization problem to enhance real-time control of real-world water distribution networks (WDNs). Our primary objectives are to adhere to physical operational constraints while reducing energy consumption and operational costs. Traditional optimization techniques, such as evolution-based and genetic algorithms, often fall short due to their lack of convergence guarantees. Conversely, reinforcement learning (RL) stands out for its adaptability to uncertainties and reduced inference time, enabling real-time responsiveness. However, the effective implementation of RL is contingent on building accurate simulation models for WDNs, and prior applications have been limited by errors in simulation training data. These errors can potentially cause the RL agent to learn misleading patterns and actions and recommend suboptimal operational strategies. To overcome these challenges, we present an improved "hybrid RL" methodology. This method integrates the benefits of RL while anchoring it in historical data, which serves as a baseline to incrementally introduce optimal control recommendations. By leveraging operational data as a foundation for the agent's actions, we enhance the explainability of the agent's actions, foster more robust recommendations, and minimize error. Our findings demonstrate that the hybrid RL agent can significantly improve sustainability, operational efficiency, and dynamically adapt to emerging scenarios in real-world WDNs.

Via

Access Paper or Ask Questions

Exploring Multilingual Text Data Distillation

Aug 09, 2023

Shivam Sahni, Harsh Patel

Figure 1 for Exploring Multilingual Text Data Distillation

Figure 2 for Exploring Multilingual Text Data Distillation

Figure 3 for Exploring Multilingual Text Data Distillation

Figure 4 for Exploring Multilingual Text Data Distillation

Abstract:With the rise of deep learning, large datasets and complex models have become common, requiring significant computing power. To address this, data distillation has emerged as a technique to quickly train models with lower memory and time requirements. However, data distillation on text-based datasets hasn't been explored much because of the challenges rising due to its discrete nature. Additionally, existing dataset distillation methods often struggle to generalize to new architectures. In the paper, we propose several data distillation techniques for multilingual text classification datasets using language-model-based learning methods. We conduct experiments to analyze their performance in terms of classification strength, and cross-architecture generalization. Furthermore, we investigate the language-specific fairness of the data summaries generated by these methods. Our approach builds upon existing techniques, enhancing cross-architecture generalization in the text data distillation domain.

Via

Access Paper or Ask Questions

Using dynamic circles and squares to visualize spatio-temporal variation

Nov 11, 2022

Harsh Patel, Nicole Schneider, Hanan Samet

Figure 1 for Using dynamic circles and squares to visualize spatio-temporal variation

Figure 2 for Using dynamic circles and squares to visualize spatio-temporal variation

Figure 3 for Using dynamic circles and squares to visualize spatio-temporal variation

Figure 4 for Using dynamic circles and squares to visualize spatio-temporal variation

Abstract:Visualizations such as bar charts, scatter plots, and objects on geographical maps often convey critical information, including exact and relative numeric values, using shapes. The choice of shape and method of encoding information is often arbitrarily, or based on convention. However, past studies have shown that the human eye can be fooled by visual representations. The Ebbinghaus illusion demonstrates that the perceived relative sizes of shapes depends on their configuration, which in turn can affect judgements, especially in visualizations like proportional symbol maps. In this study we evaluate the effects of varying the type of shapes and metrics for encoding data in visual representations on a spatio-temporal map interface. We find that some combinations of shape and metric are more conducive to accurate human judgements than others, and provide recommendations for applying these findings in future visualization designs.

Via

Access Paper or Ask Questions