Alex Sheng

Retrieval, Refinement, and Ranking for Text-to-Video Generation via Prompt Optimization and Test-Time Scaling

Mar 02, 2026

Zillur Rahman, Alex Sheng, Cristian Meo

Abstract: While large-scale datasets have driven significant progress in Text-to-Video (T2V) generative models, these models remain highly sensitive to input prompts, which makes prompt design critical to generation quality. Current methods for improving video output often fall short: they either depend on complex post-editing models, which risk introducing artifacts, or require expensive fine-tuning of the core generator, which severely limits both scalability and accessibility. In this work, we introduce 3R, a novel RAG-based prompt optimization framework. 3R builds on current state-of-the-art T2V diffusion models and vision-language models, and it can be used with any T2V model without any model training. The framework leverages three key strategies: RAG-based modifier extraction for enriched contextual grounding, diffusion-based preference optimization for aligning outputs with human preferences, and temporal frame interpolation for producing temporally consistent visual content. Together, these components enable more accurate, efficient, and contextually aligned text-to-video generation. Experimental results demonstrate the efficacy of 3R in enhancing the static fidelity and dynamic coherence of generated videos, underscoring the importance of optimizing user prompts.

* 2026 ICLR TTU Workshop
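
The retrieve-then-rank idea named in the title above can be illustrated with a minimal sketch. This is not the 3R implementation: the toy corpus, the Jaccard word-overlap retrieval, the prefix-based candidate construction, and the `score` callback (standing in for a vision-language model judging generated videos) are all assumptions made for illustration. Picking the best of several candidates is one common form of test-time scaling.

```python
def tokenize(text):
    """Lowercase whitespace tokenization (an assumed, deliberately simple choice)."""
    return set(text.lower().split())


def retrieve_modifiers(prompt, corpus, k=2):
    """Collect modifiers from the k corpus prompts most similar to `prompt`.

    Similarity is Jaccard overlap of word sets; a real system would more
    likely use learned embeddings.
    """
    query = tokenize(prompt)

    def similarity(entry):
        tokens = tokenize(entry["prompt"])
        union = query | tokens
        return len(query & tokens) / len(union) if union else 0.0

    nearest = sorted(corpus, key=similarity, reverse=True)[:k]
    modifiers = []
    for entry in nearest:
        for modifier in entry["modifiers"]:
            if modifier not in modifiers:  # keep order, drop duplicates
                modifiers.append(modifier)
    return modifiers


def candidate_prompts(prompt, modifiers):
    """Build one candidate per prefix of the modifier list (0..n modifiers)."""
    return [", ".join([prompt] + modifiers[:i]) for i in range(len(modifiers) + 1)]


def best_candidate(candidates, score):
    """Best-of-N selection: return the candidate with the highest score."""
    return max(candidates, key=score)


# Hypothetical toy corpus of prompts paired with style modifiers.
TOY_CORPUS = [
    {"prompt": "a dog running on the beach", "modifiers": ["golden hour", "slow motion"]},
    {"prompt": "city skyline at night", "modifiers": ["neon lights"]},
    {"prompt": "a cat sleeping on a sofa", "modifiers": ["soft lighting"]},
]
```

In practice, `score` would generate a video for each candidate prompt and rate it, which is where the extra test-time compute is spent; the sketch leaves that step to the caller.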

From Language Models to Practical Self-Improving Computer Agents

Apr 18, 2024

Alex Sheng

Abstract: We develop a simple and straightforward methodology to create AI computer agents that can carry out diverse computer tasks and self-improve by developing tools and augmentations to enable themselves to solve increasingly complex tasks. As large language models (LLMs) have been shown to benefit from non-parametric augmentations, a significant body of recent work has focused on developing software that augments LLMs with various capabilities. Rather than manually developing static software to augment LLMs through human engineering effort, we propose that an LLM agent can systematically generate software to augment itself. We show, through a few case studies, that a minimal querying loop with appropriate prompt engineering allows an LLM to generate and use various augmentations, freely extending its own capabilities to carry out real-world computer tasks. Starting with only terminal access, we prompt an LLM agent to augment itself with retrieval, internet search, web navigation, and text editor capabilities. The agent effectively uses these various tools to solve problems including automated software development and web-based tasks.
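
The "minimal querying loop" mentioned in the abstract might look roughly like the sketch below. It is an assumption-laden illustration, not the paper's code: `query_loop`, the `DONE` stop convention, and the `model` and `execute` callables are hypothetical names, and the scripted model replaces a real LLM so that the example runs offline.

```python
def query_loop(model, execute, task, max_steps=10):
    """Alternate between asking the model for a command and running it.

    model:   callable taking the transcript so far (str) and returning the
             next terminal command, or a line starting with "DONE" to stop.
    execute: callable taking a command (str) and returning its output (str).
    Both are hypothetical stand-ins for an LLM API and a terminal.
    """
    transcript = [f"TASK: {task}"]
    for _ in range(max_steps):
        action = model("\n".join(transcript))
        if action.startswith("DONE"):
            transcript.append(action)
            break
        output = execute(action)
        # Feed the command and its output back so the next query sees them.
        transcript.append(f"$ {action}")
        transcript.append(output)
    return transcript


def scripted_model(responses):
    """Return a fake model that replays fixed responses, for offline testing."""
    remaining = iter(responses)
    return lambda _transcript: next(remaining)
```

Under this sketch, self-augmentation would amount to the agent writing scripts to disk through `execute` and invoking them in later steps. Running model-generated commands on a real terminal should be sandboxed.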

Task Transfer and Domain Adaptation for Zero-Shot Question Answering

Jun 14, 2022

Xiang Pan, Alex Sheng, David Shimshoni, Aditya Singhal, Sara Rosenthal, Avirup Sil

Abstract: Pretrained language models have shown success in various areas of natural language processing, including reading comprehension tasks. However, when applying machine learning methods to new domains, labeled data may not always be available. To address this, we use supervised pretraining on source-domain data to reduce sample complexity on domain-specific downstream tasks. We evaluate zero-shot performance on domain-specific reading comprehension tasks by combining task transfer with domain adaptation to fine-tune a pretrained model with no labeled data from the target task. Our approach outperforms Domain-Adaptive Pretraining on downstream domain-specific reading comprehension tasks in 3 out of 4 domains.

* NAACL 2022 Deep Learning for Low-Resource NLP Workshop Paper

An Initial Look at Self-Reprogramming Artificial Intelligence

Apr 30, 2022

Alex Sheng

Abstract: Rapid progress in deep learning research has greatly extended the capabilities of artificial intelligence technology. Conventional AI models are constrained to explicit human-designed algorithms, although a growing body of work in meta-learning, neural architecture search, and related approaches has explored algorithms that self-modify to some extent. In this paper, we develop and experimentally validate the first fully self-reprogramming AI system. Applying AI-based computer code generation to AI itself, we implement an algorithm with the ability to continuously modify and rewrite its own neural network source code.

Distributed Evolution Strategies Using TPUs for Meta-Learning

Jan 01, 2022

Alex Sheng, Derek He

Abstract: Meta-learning traditionally relies on backpropagation through entire tasks to iteratively improve a model's learning dynamics. However, this approach is computationally intractable when scaled to complex tasks. We propose a distributed evolutionary meta-learning strategy using Tensor Processing Units (TPUs) that is highly parallel and scalable to arbitrarily long tasks with no increase in memory cost. Using a Prototypical Network trained with evolution strategies on the Omniglot dataset, we achieved an accuracy of 98.4% on a 5-shot classification problem. Our algorithm used as much as 40 times less memory than automatic differentiation to compute the gradient, with the resulting model achieving accuracy within 1.3% of a backpropagation-trained equivalent (99.6%). With larger population configurations, we observed higher classification accuracy, up to 99.1%. We further experimentally validate the stability and performance of ES-ProtoNet across a variety of training conditions (varying population size, model size, number of workers, shot, way, ES hyperparameters, etc.). Our contributions are twofold: we provide the first assessment of evolutionary meta-learning in a supervised setting, and create a general framework for distributed evolution strategies on TPUs.

* 2020 IEEE Symposium Series on Computational Intelligence (SSCI), 2020, pp. 721-728
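
As a rough illustration of why evolution strategies avoid the memory cost of backpropagation, the sketch below estimates a gradient purely from scalar fitness evaluations of perturbed parameters, so no intermediate activations need to be stored. It is a generic single-process estimator with antithetic (mirrored) sampling, not the paper's distributed TPU framework or ES-ProtoNet; `toy_fitness`, the hyperparameter values, and all function names are assumptions chosen for the example.

```python
import random


def es_gradient(fitness, theta, sigma=0.1, population=50, rng=None):
    """Estimate the gradient of `fitness` at `theta` from function values only.

    Each sample draws Gaussian noise eps and evaluates the fitness at
    theta + sigma*eps and theta - sigma*eps (antithetic sampling).
    """
    rng = rng or random.Random(0)
    grad = [0.0] * len(theta)
    for _ in range(population):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = fitness([t + sigma * e for t, e in zip(theta, eps)])
        minus = fitness([t - sigma * e for t, e in zip(theta, eps)])
        scale = (plus - minus) / (2.0 * sigma * population)
        for i, e in enumerate(eps):
            grad[i] += scale * e
    return grad


def toy_fitness(theta):
    """Hypothetical objective with its maximum at theta = (3, 3)."""
    return -sum((t - 3.0) ** 2 for t in theta)


def train(steps=200, lr=0.05, seed=0):
    """Gradient ascent on toy_fitness using only the ES estimate."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(steps):
        grad = es_gradient(toy_fitness, theta, rng=rng)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta
```

Each population member only needs the shared parameters and its own noise sample, so the evaluations are independent and can be spread across workers, with only scalar fitness values communicated back.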
