This work aims to build a text embedder that can capture characteristics of texts specified by user instructions. Despite its tremendous potential to deploy user-oriented embeddings, none of previous approaches provides a concrete solution for it. This paper offers a new viewpoint, which treats the instruction as a question about the input text and encodes the expected answers to obtain the representation accordingly. Intuitively, texts with the same (implicit) semantics would share similar answers following the instruction, thus leading to more similar embeddings. Specifically, we propose InBedder that instantiates this embed-via-answering idea by only fine-tuning language models on abstractive question answering tasks. InBedder demonstrates significantly improved instruction-following capabilities according to our proposed instruction awareness tests and instruction robustness tests, when applied to both large language models (LLMs) (e.g., llama-2-7b) and smaller encoder-based LMs (e.g., roberta-large). Additionally, our qualitative analysis of clustering outcomes, achieved by applying different instructions to the same corpus, demonstrates a high degree of interpretability.
Background: Colonoscopy, a crucial diagnostic tool in gastroenterology, depends heavily on superior bowel preparation. ChatGPT, a large language model with emergent intelligence which also exhibits potential in medical applications. This study aims to assess the accuracy and consistency of ChatGPT in using the Boston Bowel Preparation Scale (BBPS) for colonoscopy assessment. Methods: We retrospectively collected 233 colonoscopy images from 2020 to 2023. These images were evaluated using the BBPS by 3 senior endoscopists and 3 novice endoscopists. Additionally, ChatGPT also assessed these images, having been divided into three groups and undergone specific Fine-tuning. Consistency was evaluated through two rounds of testing. Results: In the initial round, ChatGPT's accuracy varied between 48.93% and 62.66%, trailing the endoscopists' accuracy of 76.68% to 77.83%. Kappa values for ChatGPT was between 0.52 and 0.53, compared to 0.75 to 0.87 for the endoscopists. Conclusion: While ChatGPT shows promise in bowel preparation scoring, it currently does not match the accuracy and consistency of experienced endoscopists. Future research should focus on in-depth Fine-tuning.
Table-based reasoning with large language models (LLMs) is a promising direction to tackle many table understanding tasks, such as table-based question answering and fact verification. Compared with generic reasoning, table-based reasoning requires the extraction of underlying semantics from both free-form questions and semi-structured tabular data. Chain-of-Thought and its similar approaches incorporate the reasoning chain in the form of textual context, but it is still an open question how to effectively leverage tabular data in the reasoning chain. We propose the Chain-of-Table framework, where tabular data is explicitly used in the reasoning chain as a proxy for intermediate thoughts. Specifically, we guide LLMs using in-context learning to iteratively generate operations and update the table to represent a tabular reasoning chain. LLMs can therefore dynamically plan the next operation based on the results of the previous ones. This continuous evolution of the table forms a chain, showing the reasoning process for a given tabular problem. The chain carries structured information of the intermediate results, enabling more accurate and reliable predictions. Chain-of-Table achieves new state-of-the-art performance on WikiTQ, FeTaQA, and TabFact benchmarks across multiple LLM choices.
Detecting anomalies in fundus images through unsupervised methods is a challenging task due to the similarity between normal and abnormal tissues, as well as their indistinct boundaries. The current methods have limitations in accurately detecting subtle anomalies while avoiding false positives. To address these challenges, we propose the ReSynthDetect network which utilizes a reconstruction network for modeling normal images, and an anomaly generator that produces synthetic anomalies consistent with the appearance of fundus images. By combining the features of consistent anomaly generation and image reconstruction, our method is suited for detecting fundus abnormalities. The proposed approach has been extensively tested on benchmark datasets such as EyeQ and IDRiD, demonstrating state-of-the-art performance in both image-level and pixel-level anomaly detection. Our experiments indicate a substantial 9% improvement in AUROC on EyeQ and a significant 17.1% improvement in AUPR on IDRiD.
The COVID-19 pandemic has exerted a profound impact on the global economy and continues to exact a significant toll on human lives. The COVID-19 case growth rate stands as a key epidemiological parameter to estimate and monitor for effective detection and containment of the resurgence of outbreaks. A fundamental challenge in growth rate estimation and hence outbreak detection is balancing the accuracy-speed tradeoff, where accuracy typically degrades with shorter fitting windows. In this paper, we develop a machine learning (ML) algorithm, which we call Transfer Learning Generalized Random Forest (TLGRF), that balances this accuracy-speed tradeoff. Specifically, we estimate the instantaneous COVID-19 exponential growth rate for each U.S. county by using TLGRF that chooses an adaptive fitting window size based on relevant day-level and county-level features affecting the disease spread. Through transfer learning, TLGRF can accurately estimate case growth rates for counties with small sample sizes. Out-of-sample prediction analysis shows that TLGRF outperforms established growth rate estimation methods. Furthermore, we conducted a case study based on outbreak case data from the state of Colorado and showed that the timely detection of outbreaks could have been improved by up to 224% using TLGRF when compared to the decisions made by Colorado's Department of Health and Environment (CDPHE). To facilitate implementation, we have developed a publicly available outbreak detection tool for timely detection of COVID-19 outbreaks in each U.S. county, which received substantial attention from policymakers.
* Equal contributions by co-first authors Zhaowei She, Zilong Wang (in
In the realm of e-commerce search, the significance of semantic matching cannot be overstated, as it directly impacts both user experience and company revenue. Along this line, query rewriting, serving as an important technique to bridge the semantic gaps inherent in the semantic matching process, has attached wide attention from the industry and academia. However, existing query rewriting methods often struggle to effectively optimize long-tail queries and alleviate the phenomenon of "few-recall" caused by semantic gap. In this paper, we present BEQUE, a comprehensive framework that Bridges the sEmantic gap for long-tail QUEries. In detail, BEQUE comprises three stages: multi-instruction supervised fine tuning (SFT), offline feedback, and objective alignment. We first construct a rewriting dataset based on rejection sampling and auxiliary tasks mixing to fine-tune our large language model (LLM) in a supervised fashion. Subsequently, with the well-trained LLM, we employ beam search to generate multiple candidate rewrites, and feed them into Taobao offline system to obtain the partial order. Leveraging the partial order of rewrites, we introduce a contrastive learning method to highlight the distinctions between rewrites, and align the model with the Taobao online objectives. Offline experiments prove the effectiveness of our method in bridging semantic gap. Online A/B tests reveal that our method can significantly boost gross merchandise volume (GMV), number of transaction (#Trans) and unique visitor (UV) for long-tail queries. BEQUE has been deployed on Taobao, one of most popular online shopping platforms in China, since October 2023.
With the rapid development of the internet, online social media welcomes people with different backgrounds through its diverse content. The increasing usage of emoji becomes a noticeable trend thanks to emoji's rich information beyond cultural or linguistic borders. However, the current study on emojis is limited to single emoji prediction and there are limited data resources available for further study of the interesting linguistic phenomenon. To this end, we synthesize a large text-emoji parallel corpus, Text2Emoji, from a large language model. Based on the parallel corpus, we distill a sequence-to-sequence model, EmojiLM, which is specialized in the text-emoji bidirectional translation. Extensive experiments on public benchmarks and human evaluation demonstrate that our proposed model outperforms strong baselines and the parallel corpus benefits emoji-related downstream tasks.
Federated learning (FL) is becoming a major driving force behind machine learning as a service, where customers (clients) collaboratively benefit from shared local updates under the orchestration of the service provider (server). Representing clients' current demands and the server's future demand, local model personalization and global model generalization are separately investigated, as the ill-effects of data heterogeneity enforce the community to focus on one over the other. However, these two seemingly competing goals are of equal importance rather than black and white issues, and should be achieved simultaneously. In this paper, we propose the first algorithm to balance personalization and generalization on top of game theory, dubbed PAGE, which reshapes FL as a co-opetition game between clients and the server. To explore the equilibrium, PAGE further formulates the game as Markov decision processes, and leverages the reinforcement learning algorithm, which simplifies the solving complexity. Extensive experiments on four widespread datasets show that PAGE outperforms state-of-the-art FL baselines in terms of global and local prediction accuracy simultaneously, and the accuracy can be improved by up to 35.20% and 39.91%, respectively. In addition, biased variants of PAGE imply promising adaptiveness to demand shifts in practice.