Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Laks V. S. Lakshmanan

The University of British Columbia

DRFLOW: A Deep Research Benchmark for Personalized Workflow Prediction

Jun 17, 2026

Md Tawkat Islam Khondaker, Raymond Li, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Issam H. Laradji

Abstract:Deep research (DR) systems are increasingly used for complex information-seeking tasks, but existing works mainly focus on generating reports and summaries. In contrast, many enterprise tasks instead require an agent to identify concrete workflows which is a sequence of action-steps. For example, rather than summarizing budgeting policies, an agent should be able to determine the steps needed to answer a question such as: "How do I request new headcount given a fixed budget?". Therefore, we introduce DRFLOW, a benchmark for evaluating personalized workflows predicted by agents from heterogeneous sources. Each task requires the agent to identify relevant evidence from scattered sources, then use that evidence to predict the correct action-step sequence for the user's task. DRFLOW contains 100 tasks across five domains, with 1,246 reference workflow steps grounded in more than 3,900 sources. We define seven diagnostic metrics covering factual grounding, step recovery, structural ordering, condition resolution, and personalization. We further present DRFLOW-Agent (DRFA), a workflow-oriented reference agent to predict personalized workflow. We show that although DRFA improves over strong baseline agents (upto 10.02% average F1 score), there is substantial room for improvement remains across these workflow metrics, indicating that predicting complete and correct personalized workflows remains a challenging frontier for deep research.

Via

Access Paper or Ask Questions

Reflection Pretraining Enables Token-Level Self-Correction in Biological Sequence Models

Dec 24, 2025

Xiang Zhang, Jiaqi Wei, Yuejin Yang, Zijie Qiu, Yuhan Chen, Zhiqiang Gao, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Wanli Ouyang, Chenyu You(+1 more)

Figure 1 for Reflection Pretraining Enables Token-Level Self-Correction in Biological Sequence Models

Figure 2 for Reflection Pretraining Enables Token-Level Self-Correction in Biological Sequence Models

Figure 3 for Reflection Pretraining Enables Token-Level Self-Correction in Biological Sequence Models

Figure 4 for Reflection Pretraining Enables Token-Level Self-Correction in Biological Sequence Models

Abstract:Chain-of-Thought (CoT) prompting has significantly advanced task-solving capabilities in natural language processing with large language models. Unlike standard prompting, CoT encourages the model to generate intermediate reasoning steps, non-answer tokens, that help guide the model toward more accurate final outputs. These intermediate steps enable more complex reasoning processes such as error correction, memory management, future planning, and self-reflection. However, applying CoT to non-natural language domains, such as protein and RNA language models, is not yet possible, primarily due to the limited expressiveness of their token spaces (e.g., amino acid tokens). In this work, we propose and define the concept of language expressiveness: the ability of a given language, using its tokens and grammar, to encode information. We show that the limited expressiveness of protein language severely restricts the applicability of CoT-style reasoning. To overcome this, we introduce reflection pretraining, for the first time in a biological sequence model, which enables the model to engage in intermediate reasoning through the generation of auxiliary "thinking tokens" beyond simple answer tokens. Theoretically, we demonstrate that our augmented token set significantly enhances biological language expressiveness, thereby improving the overall reasoning capacity of the model. Experimentally, our pretraining approach teaches protein models to self-correct and leads to substantial performance gains compared to standard pretraining.

Via

Access Paper or Ask Questions

NAMeGEn: Creative Name Generation via A Novel Agent-based Multiple Personalized Goal Enhancement Framework

Nov 19, 2025

Shanlin Zhou, Xinpeng Wang, Jianxun Lian, Zhenghao Liu, Laks V. S. Lakshmanan, Xiaoyuan Yi, Yongtao Hao

Abstract:Trained on diverse human-authored texts, Large Language Models (LLMs) unlocked the potential for Creative Natural Language Generation (CNLG), benefiting various applications like advertising and storytelling. Nevertheless, CNLG still remains difficult due to two main challenges. (1) Multi-objective flexibility: user requirements are often personalized, fine-grained, and pluralistic, which LLMs struggle to satisfy simultaneously; (2) Interpretive complexity: beyond generation, creativity also involves understanding and interpreting implicit meaning to enhance users' perception. These challenges significantly limit current methods, especially in short-form text generation, in generating creative and insightful content. To address this, we focus on Chinese baby naming, a representative short-form CNLG task requiring adherence to explicit user constraints (e.g., length, semantics, anthroponymy) while offering meaningful aesthetic explanations. We propose NAMeGEn, a novel multi-agent optimization framework that iteratively alternates between objective extraction, name generation, and evaluation to meet diverse requirements and generate accurate explanations. To support this task, we further construct a classical Chinese poetry corpus with 17k+ poems to enhance aesthetics, and introduce CBNames, a new benchmark with tailored metrics. Extensive experiments demonstrate that NAMeGEn effectively generates creative names that meet diverse, personalized requirements while providing meaningful explanations, outperforming six baseline methods spanning various LLM backbones without any training.

* 13 pages,9 figures. This work has been submitted to the IEEE for possible publication

Via

Access Paper or Ask Questions

KRAFT: A Knowledge Graph-Based Framework for Automated Map Conflation

Sep 04, 2025

Farnoosh Hashemi, Laks V. S. Lakshmanan

Figure 1 for KRAFT: A Knowledge Graph-Based Framework for Automated Map Conflation

Figure 2 for KRAFT: A Knowledge Graph-Based Framework for Automated Map Conflation

Figure 3 for KRAFT: A Knowledge Graph-Based Framework for Automated Map Conflation

Figure 4 for KRAFT: A Knowledge Graph-Based Framework for Automated Map Conflation

Abstract:Digital maps play a crucial role in various applications such as navigation, fleet management, and ride-sharing, necessitating their accuracy and currency, which require timely updates. While the majority of geospatial databases (GDBs) provide high-quality information, their data is (i) limited to specific regions and/or (ii) missing some entities, even in their covered areas. Map conflation is the process of augmentation of a GDB using another GDB to conflate missing spatial features. Existing map conflation methods suffer from two main limitations: (1) They are designed for the conflation of linear objects (e.g., road networks) and cannot simply be extended to non-linear objects, thus missing information about most entities in the map. (2) They are heuristic algorithmic approaches that are based on pre-defined rules, unable to learn entities matching in a data-driven manner. To address these limitations, we design KRAFT, a learning based approach consisting of three parts: (1) Knowledge Graph Construction - where each GDB is represented by a knowledge graph, (2) Map Matching - where we use a knowledge graph alignment method as well as a geospatial feature encoder to match entities in obtained knowledge graphs, and (3) Map Merging - where we merge matched entities in the previous modules in a consistent manner, using a mixed integer linear programming formulation that fully merges the GDBs without adding any inconsistencies. Our experimental evaluation shows that not only does KRAFT achieve outstanding performance compared to state-of-the-art and baseline methods in map conflation tasks, but each of its modules (e.g., Map Matching and Map Merging) also separately outperforms traditional matching and merging methods.

Via

Access Paper or Ask Questions

BEACON: A Benchmark for Efficient and Accurate Counting of Subgraphs

Apr 15, 2025

Mohammad Matin Najafi, Xianju Zhu, Chrysanthi Kosyfaki, Laks V. S. Lakshmanan, Reynold Cheng

Figure 1 for BEACON: A Benchmark for Efficient and Accurate Counting of Subgraphs

Figure 2 for BEACON: A Benchmark for Efficient and Accurate Counting of Subgraphs

Figure 3 for BEACON: A Benchmark for Efficient and Accurate Counting of Subgraphs

Figure 4 for BEACON: A Benchmark for Efficient and Accurate Counting of Subgraphs

Abstract:Subgraph counting the task of determining the number of instances of a query pattern within a large graph lies at the heart of many critical applications, from analyzing financial networks and transportation systems to understanding biological interactions. Despite decades of work yielding efficient algorithmic (AL) solutions and, more recently, machine learning (ML) approaches, a clear comparative understanding is elusive. This gap stems from the absence of a unified evaluation framework, standardized datasets, and accessible ground truths, all of which hinder systematic analysis and fair benchmarking. To overcome these barriers, we introduce BEACON: a comprehensive benchmark designed to rigorously evaluate both AL and ML-based subgraph counting methods. BEACON provides a standardized dataset with verified ground truths, an integrated evaluation environment, and a public leaderboard, enabling reproducible and transparent comparisons across diverse approaches. Our extensive experiments reveal that while AL methods excel in efficiently counting subgraphs on very large graphs, they struggle with complex patterns (e.g., those exceeding six nodes). In contrast, ML methods are capable of handling larger patterns but demand massive graph data inputs and often yield suboptimal accuracy on small, dense graphs. These insights not only highlight the unique strengths and limitations of each approach but also pave the way for future advancements in subgraph counting techniques. Overall, BEACON represents a significant step towards unifying and accelerating research in subgraph counting, encouraging innovative solutions and fostering a deeper understanding of the trade-offs between algorithmic and machine learning paradigms.

Via

Access Paper or Ask Questions

TRACE: Intra-visit Clinical Event Nowcasting via Effective Patient Trajectory Encoding

Mar 29, 2025

Yuyang Liang, Yankai Chen, Yixiang Fang, Laks V. S. Lakshmanan, Chenhao Ma

Abstract:Electronic Health Records (EHR) have become a valuable resource for a wide range of predictive tasks in healthcare. However, existing approaches have largely focused on inter-visit event predictions, overlooking the importance of intra-visit nowcasting, which provides prompt clinical insights during an ongoing patient visit. To address this gap, we introduce the task of laboratory measurement prediction within a hospital visit. We study the laboratory data that, however, remained underexplored in previous work. We propose TRACE, a Transformer-based model designed for clinical event nowcasting by encoding patient trajectories. TRACE effectively handles long sequences and captures temporal dependencies through a novel timestamp embedding that integrates decay properties and periodic patterns of data. Additionally, we introduce a smoothed mask for denoising, improving the robustness of the model. Experiments on two large-scale electronic health record datasets demonstrate that the proposed model significantly outperforms previous methods, highlighting its potential for improving patient care through more accurate laboratory measurement nowcasting. The code is available at https://github.com/Amehi/TRACE.

* Accepted by WWW'25 short paper track

Via

Access Paper or Ask Questions

On Aggregation Queries over Predicted Nearest Neighbors

Feb 26, 2025

Carrie Wang, Sihem Amer-Yahia, Laks V. S. Lakshmanan, Reynold Cheng

Abstract:We introduce Aggregation Queries over Nearest Neighbors (AQNNs), a novel type of aggregation queries over the predicted neighborhood of a designated object. AQNNs are prevalent in modern applications where, for instance, a medical professional may want to compute "the average systolic blood pressure of patients whose predicted condition is similar to a given insomnia patient". Since prediction typically involves an expensive deep learning model or a human expert, we formulate query processing as the problem of returning an approximate aggregate by combining an expensive oracle and a cheaper model (e.g, a simple ML model) to compute the predictions. We design the Sampler with Precision-Recall in Target (SPRinT) framework for answering AQNNs. SPRinT consists of sampling, nearest neighbor refinement, and aggregation, and is tailored for various aggregation functions. It enjoys provable theoretical guarantees, including bounds on sample size and on error in approximate aggregates. Our extensive experiments on medical, e-commerce, and video datasets demonstrate that SPRinT consistently achieves the lowest aggregation error with minimal computation cost compared to its baselines. Scalability results show that SPRinT's execution time and aggregation error remain stable as the dataset size increases, confirming its suitability for large-scale applications.

* 14 pages, 11 figures, 9 tables

Via

Access Paper or Ask Questions

Cross-Modal Consistency in Multimodal Large Language Models

Nov 14, 2024

Xiang Zhang, Senyu Li, Ning Shi, Bradley Hauer, Zijun Wu, Grzegorz Kondrak, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan

Figure 1 for Cross-Modal Consistency in Multimodal Large Language Models

Figure 2 for Cross-Modal Consistency in Multimodal Large Language Models

Figure 3 for Cross-Modal Consistency in Multimodal Large Language Models

Figure 4 for Cross-Modal Consistency in Multimodal Large Language Models

Abstract:Recent developments in multimodal methodologies have marked the beginning of an exciting era for models adept at processing diverse data types, encompassing text, audio, and visual content. Models like GPT-4V, which merge computer vision with advanced language processing, exhibit extraordinary proficiency in handling intricate tasks that require a simultaneous understanding of both textual and visual information. Prior research efforts have meticulously evaluated the efficacy of these Vision Large Language Models (VLLMs) in various domains, including object detection, image captioning, and other related fields. However, existing analyses have often suffered from limitations, primarily centering on the isolated evaluation of each modality's performance while neglecting to explore their intricate cross-modal interactions. Specifically, the question of whether these models achieve the same level of accuracy when confronted with identical task instances across different modalities remains unanswered. In this study, we take the initiative to delve into the interaction and comparison among these modalities of interest by introducing a novel concept termed cross-modal consistency. Furthermore, we propose a quantitative evaluation framework founded on this concept. Our experimental findings, drawn from a curated collection of parallel vision-language datasets developed by us, unveil a pronounced inconsistency between the vision and language modalities within GPT-4V, despite its portrayal as a unified multimodal model. Our research yields insights into the appropriate utilization of such models and hints at potential avenues for enhancing their design.

Via

Access Paper or Ask Questions

Autoregressive + Chain of Thought = Recurrent: Recurrence's Role in Language Models' Computability and a Revisit of Recurrent Transformer

Sep 20, 2024

Xiang Zhang, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan

Figure 1 for Autoregressive + Chain of Thought = Recurrent: Recurrence's Role in Language Models' Computability and a Revisit of Recurrent Transformer

Figure 2 for Autoregressive + Chain of Thought = Recurrent: Recurrence's Role in Language Models' Computability and a Revisit of Recurrent Transformer

Figure 3 for Autoregressive + Chain of Thought = Recurrent: Recurrence's Role in Language Models' Computability and a Revisit of Recurrent Transformer

Figure 4 for Autoregressive + Chain of Thought = Recurrent: Recurrence's Role in Language Models' Computability and a Revisit of Recurrent Transformer

Abstract:The Transformer architecture excels in a variety of language modeling tasks, outperforming traditional neural architectures such as RNN and LSTM. This is partially due to its elimination of recurrent connections, which allows for parallel training and a smoother flow of gradients. However, this move away from recurrent structures places the Transformer model at the lower end of Chomsky's computational hierarchy, imposing limitations on its computational abilities. Consequently, even advanced Transformer-based models face considerable difficulties in tasks like counting, string reversal, and multiplication. These tasks, though seemingly elementary, require a level of computational complexity that exceeds the capabilities of the Transformer architecture. Concurrently, the emergence of ``Chain of Thought" (CoT) prompting has enabled Transformer-based language models to tackle tasks that were previously impossible or poorly executed. In this work, we thoroughly investigate the influence of recurrent structures in neural models on their reasoning abilities and computability, contrasting the role autoregression plays in the neural models' computational power. We then shed light on how the CoT approach can mimic recurrent computation and act as a bridge between autoregression and recurrence in the context of language models. It is this approximated recurrence that notably improves the model's performance and computational capacity. Moreover, we revisit recent recurrent-based Transformer model designs, focusing on their computational abilities through our proposed concept of ``recurrence-completeness" and identify key theoretical limitations in models like Linear Transformer and RWKV. Through this, we aim to provide insight into the neural model architectures and prompt better model design.

Via

Access Paper or Ask Questions

Autoregressive + Chain of Thought (CoT) $\simeq$ Recurrent: Recurrence's Role in Language Models and a Revist of Recurrent Transformer

Sep 14, 2024

Xiang Zhang, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan

$Figure 1 for Autoregressive + Chain of Thought (CoT) $\simeq$ Recurrent: Recurrence's Role in Language Models and a Revist of Recurrent Transformer$

$Figure 2 for Autoregressive + Chain of Thought (CoT) $\simeq$ Recurrent: Recurrence's Role in Language Models and a Revist of Recurrent Transformer$

$Figure 3 for Autoregressive + Chain of Thought (CoT) $\simeq$ Recurrent: Recurrence's Role in Language Models and a Revist of Recurrent Transformer$

$Figure 4 for Autoregressive + Chain of Thought (CoT) $\simeq$ Recurrent: Recurrence's Role in Language Models and a Revist of Recurrent Transformer$

Abstract:The Transformer architecture excels in a variety of language modeling tasks, outperforming traditional neural architectures such as RNN and LSTM. This is partially due to its elimination of recurrent connections, which allows for parallel training and a smoother flow of gradients. However, this move away from recurrent structures places the Transformer model at the lower end of Chomsky's computational hierarchy, imposing limitations on its computational abilities. Consequently, even advanced Transformer-based models face considerable difficulties in tasks like counting, string reversal, bracket pairing, and multiplication. These tasks, though seemingly elementary, require a level of computational complexity that exceeds the capabilities of the Transformer architecture. Concurrently, the emergence of ``Chain of Thought" (CoT) prompting has enabled Transformer-based language models to tackle tasks that were previously impossible or poorly executed. Despite some previous research primarily interpreting CoT from a psychological perspective, a comprehensive understanding of \textit{why} CoT proves so effective in the reasoning process remains elusive. In this work, we thoroughly investigate the influence of recurrent structures in language models on their reasoning abilities, shedding light on how the CoT approach can mimic recurrent computation and act as a bridge between autoregression and recurrence. It is this approximated recurrence that notably improves the model's performance and computational capacity. Moreover, we revisit recent recurrent-based Transformer model designs, focusing on their computational abilities through our proposed concept of ``recurrence-completeness" and identify key theoretical limitations in models like Linear Transformer and RWKV. Through this, we aim to provide insight into the neural model architectures and prompt better model design.

Via

Access Paper or Ask Questions