Abstract: Despite the widespread exploration of Retrieval-Augmented Generation (RAG), its deployment in enterprises on domain-specific datasets remains limited because of poor answer accuracy. These corpora are often shielded behind firewalls in private enterprise knowledge bases and contain complex, domain-specific terminology rarely seen by LLMs during pre-training. They also exhibit significant semantic variability across domains (such as networking, military, or legal) and even within a single domain such as medicine, which results in poor context precision for RAG systems. In such situations, fine-tuning, or RAG combined with fine-tuning, is currently attempted, but these approaches are slow and expensive, and their accuracy does not generalize as new domain-specific data emerges. We propose an approach for Enterprise Search that focuses on enhancing the retriever for a domain-specific corpus through hybrid query indexes and metadata enrichment. This 'MetaGen Blended RAG' method constructs a metadata generation pipeline using key concepts, topics, and acronyms, and then creates a metadata-enriched hybrid index with boosted search queries. This approach avoids overfitting and generalizes effectively across domains. On the PubMedQA benchmark for the biomedical domain, the proposed method achieves 82% retrieval accuracy and 77% RAG accuracy, surpassing all previous RAG accuracy results without fine-tuning, setting a new benchmark for zero-shot results, and outperforming much larger models such as GPT-3.5. The results are even comparable to those of the best fine-tuned models on this dataset, and we further demonstrate the robustness and scalability of the approach by evaluating it on other Q&A datasets such as SQuAD and NQ.
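The abstract does not spell out the metadata pipeline or the query format, so the following is only a minimal sketch of the idea under stated assumptions. A naive regex stands in for acronym extraction (the paper's pipeline also generates key concepts and topics, which are not reproduced here), and the boosted query is written in the shape of an Elasticsearch `multi_match` query with `field^weight` boosts. Only the lexical half of a hybrid index is shown; the field names `content` and `acronyms`, the weights, and the helper functions are all hypothetical.

```python
import re
from typing import Any

# Naive acronym candidates: a capital letter followed by one or more capitals/digits.
# This is a hypothetical stand-in for the paper's metadata generation step.
ACRONYM_RE = re.compile(r"\b[A-Z][A-Z0-9]+\b")


def extract_acronyms(text: str) -> list[str]:
    """Return sorted, de-duplicated acronym candidates found in the text."""
    return sorted(set(ACRONYM_RE.findall(text)))


def enrich(doc_id: str, text: str) -> dict[str, Any]:
    """Build an index document with a hypothetical 'acronyms' metadata field."""
    return {"id": doc_id, "content": text, "acronyms": extract_acronyms(text)}


def boosted_query(question: str, boosts: dict[str, float]) -> dict[str, Any]:
    """Build an Elasticsearch-style multi_match query with per-field boosts."""
    fields = [f"{name}^{weight:g}" for name, weight in boosts.items()]
    return {"query": {"multi_match": {"query": question, "type": "best_fields", "fields": fields}}}


# Hypothetical weights: matches on generated metadata count more than body-text matches.
BOOSTS = {"content": 1.0, "acronyms": 3.0}

doc = enrich("pmid-1", "Metformin lowers HbA1c in T2DM patients.")
query = boosted_query("Does metformin help T2DM?", BOOSTS)
print(doc["acronyms"], query["query"]["multi_match"]["fields"])
```

Boosting the metadata fields is one plausible way to let domain terms that the generation step surfaced outweigh incidental word overlap in the body text; the actual weights would need tuning per corpus.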
Abstract: For a successful business, engaging in an effective campaign is a key task for marketers. Most previous studies used various mathematical models to segment customers without considering the correlation between customer segmentation and a campaign. This work presents a conceptual model by studying the significant campaign-dependent variables of customer targeting in a customer segmentation context. In this way, the processes of customer segmentation and targeting can be linked and solved together, so the customer segmentation outcomes of this study could be more meaningful and relevant for marketers. This investigation applies a customer lifetime value (LTV) model to assess the fitness between targeted customer groups and marketing strategies. To integrate customer segmentation and customer targeting, this work uses a genetic algorithm (GA) to determine the optimized marketing strategy. We further suggest using C&RT (Classification and Regression Tree) in SPSS PASW Modeler as a replacement for the genetic algorithm to accomplish these results. We also suggest using Lossy Counting and a Counting Bloom Filter to dynamically design the right, up-to-date offer for the right customer.
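The abstract names Lossy Counting without describing how it is applied. As a minimal sketch, assuming the classic formulation by Manku and Motwani (not necessarily the variant the authors used), the class below tracks approximately frequent items in a purchase stream. The class name `LossyCounter`, the error bound `epsilon`, and the example stream are hypothetical.

```python
import math
from typing import Hashable


class LossyCounter:
    """Approximate frequency counts over a stream (Lossy Counting, Manku and Motwani)."""

    def __init__(self, epsilon: float) -> None:
        if not 0 < epsilon < 1:
            raise ValueError("epsilon must be in (0, 1)")
        self.epsilon = epsilon
        self.width = math.ceil(1 / epsilon)  # bucket width w
        self.n = 0  # number of items seen so far
        # item -> (observed count f, maximum possible undercount delta)
        self.entries: dict[Hashable, tuple[int, int]] = {}

    def add(self, item: Hashable) -> None:
        self.n += 1
        bucket = (self.n + self.width - 1) // self.width  # current bucket id, ceil(n / w)
        # New items start with delta = bucket - 1: they may have been pruned earlier.
        count, delta = self.entries.get(item, (0, bucket - 1))
        self.entries[item] = (count + 1, delta)
        if self.n % self.width == 0:
            # At a bucket boundary, drop entries whose upper-bound count f + delta <= bucket.
            self.entries = {k: (f, d) for k, (f, d) in self.entries.items() if f + d > bucket}

    def frequent(self, support: float) -> dict[Hashable, int]:
        """Return items whose estimated count is at least (support - epsilon) * n."""
        threshold = (support - self.epsilon) * self.n
        return {k: f for k, (f, _) in self.entries.items() if f >= threshold}


# Hypothetical purchase stream for one customer segment.
stream = ["milk", "bread", "milk", "eggs", "milk", "tea", "milk", "jam", "milk", "soap"] * 3
counter = LossyCounter(epsilon=0.1)
for purchase in stream:
    counter.add(purchase)
print(counter.frequent(support=0.3))
```

Estimated counts never exceed true counts and undercount by at most `epsilon * n`, so every item whose true share of the stream is at least `support` is reported, while memory stays bounded. That property is what would make the technique suitable for keeping an offer "up to date" on a live stream.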
Abstract: The ML data curation process typically involves heterogeneous and federated source systems with varied schema structures, requiring the curation process to standardize metadata from different schemas into an interoperable schema. This manual process of metadata harmonization and cataloging reduces the efficiency of the MLOps lifecycle. We demonstrate automation of this step with the help of entity resolution methods and also by using the Cognitive Database's Db2Vec embedding approach to capture hidden inter-column and intra-column relationships, which are used to detect metadata similarity and then to predict the mapping of metadata columns from source schemas to any standardized schema. Apart from matching schemas, we demonstrate that the approach can also infer the correct ontological structure of the target data model.
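Db2Vec itself is not reproduced here. As a rough sketch of embedding-based schema matching, the block below swaps in character-trigram count vectors as a crude stand-in for learned column embeddings. It maps each source column to the most cosine-similar target column. Only column names are compared, whereas the abstract describes capturing inter-column and intra-column relationships from the data. The function names, the trigram choice, the 0.2 threshold, and the example schemas are all hypothetical.

```python
import math
from collections import Counter
from typing import Optional


def char_ngrams(name: str, n: int = 3) -> Counter:
    """Bag of character n-grams: a crude stand-in for a learned column embedding."""
    s = name.lower()
    return Counter(s[i : i + n] for i in range(len(s) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def match_columns(
    source: list[str], target: list[str], threshold: float = 0.2
) -> dict[str, Optional[str]]:
    """Map each source column to its most similar target column, or None below threshold.

    Assumes `target` is non-empty.
    """
    target_vecs = {col: char_ngrams(col) for col in target}
    mapping: dict[str, Optional[str]] = {}
    for col in source:
        vec = char_ngrams(col)
        best, score = max(
            ((t, cosine(vec, tv)) for t, tv in target_vecs.items()),
            key=lambda pair: pair[1],
        )
        mapping[col] = best if score >= threshold else None
    return mapping


print(match_columns(
    ["cust_name", "cust_email", "zip_code"],
    ["customer_name", "email_address", "postal_zip_code"],
))
```

A real system would replace `char_ngrams` with embeddings trained on column values and co-occurrence. The threshold lets unmatched columns surface for manual review instead of being forced onto a wrong target.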