Abstract:Search in e-Commerce is powered at the core by a structured representation of the inventory, often formulated as a category taxonomy. An important capability in e-Commerce with hierarchical taxonomies is to select a set of relevant leaf categories that are semantically aligned with a given user query. In this scope, we address a fundamental problem of search query categorization in real-world e-Commerce taxonomies. A correct categorization of a query not only provides a way to zoom into the correct inventory space, but opens the door to multiple intent understanding capabilities for a query. A practical and accurate solution to this problem has many applications in e-commerce, including constraining retrieved items and improving the relevance of the search results. For this task, we explore a novel Chain-of-Thought (CoT) paradigm that combines simple tree-search with LLM semantic scoring. Assessing its classification performance on human-judged query-category pairs, relevance tests, and LLM-based reference methods, we find that the CoT approach performs better than a benchmark that uses embedding-based query category predictions. We show how the CoT approach can detect problems within a hierarchical taxonomy. Finally, we also propose LLM-based approaches for query-categorization of the same spirit, but which scale better at the range of millions of queries.
Abstract:Understanding and modeling buyer intent is a foundational challenge in optimizing search query reformulation within the dynamic landscape of e-commerce search systems. This work introduces a robust data pipeline designed to mine and analyze large-scale buyer query logs, with a focus on extracting fine-grained intent signals from both explicit interactions and implicit behavioral cues. Leveraging advanced sequence mining techniques and supervised learning models, the pipeline systematically captures patterns indicative of latent purchase intent, enabling the construction of a high-fidelity, intent-rich dataset. The proposed framework facilitates the development of adaptive query rewrite strategies by grounding reformulations in inferred user intent rather than surface-level lexical signals. This alignment between query rewriting and underlying user objectives enhances both retrieval relevance and downstream engagement metrics. Empirical evaluations across multiple product verticals demonstrate measurable gains in precision-oriented relevance metrics, underscoring the efficacy of intent-aware reformulation. Our findings highlight the value of intent-centric modeling in bridging the gap between sparse user inputs and complex product discovery goals, and establish a scalable foundation for future research in user-aligned neural retrieval and ranking systems.