Knowledge-based dialogue systems with internet retrieval have recently attracted considerable attention from researchers. The dialogue systems overcome a major limitation of traditional knowledge dialogue systems, where the timeliness of knowledge cannot be assured, hence providing greater practical application value. Knowledge-based dialogue systems with internet retrieval can be typically segmented into three tasks: Retrieval Decision, Query Generation, and Response Generation. However, many of studies assumed that all conversations require external knowledge to continue, neglecting the critical step of determining when retrieval is necessary. This assumption often leads to an over-dependence on external knowledge, even when it may not be required. Our work addresses this oversight by employing a single unified model facilitated by prompt and multi-task learning approaches. This model not only decides whether retrieval is necessary but also generates retrieval queries and responses. By integrating these functions, our system leverages the full potential of pre-trained models and reduces the complexity and costs associated with deploying multiple models. We conducted extensive experiments to investigate the mutual enhancement among the three tasks in our system. What is more, the experiment results on the Wizint and Dusinc datasets not only demonstrate that our unified model surpasses the baseline performance for individual tasks, but also reveal that it achieves comparable results when contrasted with SOTA systems that deploy separate, specialized models for each task.
Studies on semi-supervised medical image segmentation (SSMIS) have seen fast progress recently. Due to the limited labelled data, SSMIS methods mainly focus on effectively leveraging unlabeled data to enhance the segmentation performance. However, despite their promising performance, current state-of-the-art methods often prioritize integrating complex techniques and loss terms rather than addressing the core challenges of semi-supervised scenarios directly. We argue that the key to SSMIS lies in generating substantial and appropriate prediction disagreement on unlabeled data. To this end, we emphasize the crutiality of data perturbation and model stabilization in semi-supervised segmentation, and propose a simple yet effective approach to boost SSMIS performance significantly, dubbed DPMS. Specifically, we first revisit SSMIS from three distinct perspectives: the data, the model, and the loss, and conduct a comprehensive study of corresponding strategies to examine their effectiveness. Based on these examinations, we then propose DPMS, which adopts a plain teacher-student framework with a standard supervised loss and unsupervised consistency loss. To produce appropriate prediction disagreements, DPMS perturbs the unlabeled data via strong augmentations to enlarge prediction disagreements considerably. On the other hand, using EMA teacher when strong augmentation is applied does not necessarily improve performance. DPMS further utilizes a forwarding-twice and momentum updating strategies for normalization statistics to stabilize the training on unlabeled data effectively. Despite its simplicity, DPMS can obtain new state-of-the-art performance on the public 2D ACDC and 3D LA datasets across various semi-supervised settings, e.g. obtaining a remarkable 22.62% improvement against previous SOTA on ACDC with 5% labels.
Background: To develop an artificial intelligence system that can accurately identify acute non-traumatic intracranial hemorrhage (ICH) etiology based on non-contrast CT (NCCT) scans and investigate whether clinicians can benefit from it in a diagnostic setting. Materials and Methods: The deep learning model was developed with 1868 eligible NCCT scans with non-traumatic ICH collected between January 2011 and April 2018. We tested the model on two independent datasets (TT200 and SD 98) collected after April 2018. The model's diagnostic performance was compared with clinicians's performance. We further designed a simulated study to compare the clinicians's performance with and without the deep learning system augmentation. Results: The proposed deep learning system achieved area under the receiver operating curve of 0.986 (95% CI 0.967-1.000) on aneurysms, 0.952 (0.917-0.987) on hypertensive hemorrhage, 0.950 (0.860-1.000) on arteriovenous malformation (AVM), 0.749 (0.586-0.912) on Moyamoya disease (MMD), 0.837 (0.704-0.969) on cavernous malformation (CM), and 0.839 (0.722-0.959) on other causes in TT200 dataset. Given a 90% specificity level, the sensitivities of our model were 97.1% and 90.9% for aneurysm and AVM diagnosis, respectively. The model also shows an impressive generalizability in an independent dataset SD98. The clinicians achieve significant improvements in the sensitivity, specificity, and accuracy of diagnoses of certain hemorrhage etiologies with proposed system augmentation. Conclusions: The proposed deep learning algorithms can be an effective tool for early identification of hemorrhage etiologies based on NCCT scans. It may also provide more information for clinicians for triage and further imaging examination selection.
Objective: Reliable tools to predict moyamoya disease (MMD) patients at risk for hemorrhage could have significant value. The aim of this paper is to develop three machine learning classification algorithms to predict hemorrhage in moyamoya disease. Methods: Clinical data of consecutive MMD patients who were admitted to our hospital between 2009 and 2015 were reviewed. Demographics, clinical, radiographic data were analyzed to develop artificial neural network (ANN), support vector machine (SVM), and random forest models. Results: We extracted 33 parameters, including 11 demographic and 22 radiographic features as input for model development. Of all compared classification results, ANN achieved the highest overall accuracy of 75.7% (95% CI, 68.6%-82.8%), followed by SVM with 69.2% (95% CI, 56.9%-81.5%) and random forest with 70.0% (95% CI, 57.0%-83.0%). Conclusions: The proposed ANN framework can be a potential effective tool to predict the possibility of hemorrhage among adult MMD patients based on clinical information and radiographic features.
In many industrial applications like online advertising and recommendation systems, diverse and accurate user profiles can greatly help improve personalization. For building user profiles, deep learning is widely used to mine expressive tags to describe users' preferences from their historical actions. For example, tags mined from users' click-action history can represent the categories of ads that users are interested in, and they are likely to continue being clicked in the future. Traditional solutions usually introduce multiple independent Two-Tower models to mine tags from different actions, e.g., click, conversion. However, the models cannot learn complementarily and support effective training for data-sparse actions. Besides, limited by the lack of information fusion between the two towers, the model learning is insufficient to represent users' preferences on various topics well. This paper introduces a novel multi-task model called Mixture of Virtual-Kernel Experts (MVKE) to learn multiple topic-related user preferences based on different actions unitedly. In MVKE, we propose a concept of Virtual-Kernel Expert, which focuses on modeling one particular facet of the user's preference, and all of them learn coordinately. Besides, the gate-based structure used in MVKE builds an information fusion bridge between two towers, improving the model's capability much and maintaining high efficiency. We apply the model in Tencent Advertising System, where both online and offline evaluations show that our method has a significant improvement compared with the existing ones and brings about an obvious lift to actual advertising revenue.
With the transition from people's traditional `brick-and-mortar' shopping to online mobile shopping patterns in web 2.0 $\mathit{era}$, the recommender system plays a critical role in E-Commerce and E-Retails. This is especially true when designing this system for more than $\mathbf{236~million}$ daily active users. Ranking strategy, the key module of the recommender system, needs to be precise, accurate, and responsive for estimating customers' intents. We propose a dynamic ranking paradigm, named as DNN-MAB, that is composed of a pairwise deep neural network (DNN) $\mathit{pre}$-ranker connecting a revised multi-armed bandit (MAB) dynamic $\mathit{post}$-ranker. By taking into account of explicit and implicit user feedbacks such as impressions, clicks, conversions, etc. DNN-MAB is able to adjust DNN $\mathit{pre}$-ranking scores to assist customers locating items they are interested in most so that they can converge quickly and frequently. To the best of our knowledge, frameworks like DNN-MAB have not been discussed in the previous literature to either E-Commerce or machine learning audiences. In practice, DNN-MAB has been deployed to production and it easily outperforms against other state-of-the-art models by significantly lifting the gross merchandise volume (GMV) which is the objective metrics at JD.
Document classification for text, images and other applicable entities has long been a focus of research in academia and also finds application in many industrial settings. Amidst a plethora of approaches to solve such problems, machine-learning techniques have found success in a variety of scenarios. In this paper we discuss the design of a machine learning-based semi-supervised job title classification system for the online job recruitment domain currently in production at CareerBuilder.com and propose enhancements to it. The system leverages a varied collection of classification as well clustering algorithms. These algorithms are encompassed in an architecture that facilitates leveraging existing off-the-shelf machine learning tools and techniques while keeping into consideration the challenges of constructing a scalable classification system for a large taxonomy of categories. As a continuously evolving system that is still under development we first discuss the existing semi-supervised classification system which is composed of both clustering and classification components in a proximity-based classifier setup and results of which are already used across numerous products at CareerBuilder. We then elucidate our long-term goals for job title classification and propose enhancements to the existing system in the form of a two-stage coarse and fine level classifier augmentation to construct a cascade of hierarchical vertical classifiers. Preliminary results are presented using experimental evaluation on real world industrial data.