Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wei Liu

Peter

Radiology-GPT: A Large Language Model for Radiology

Jun 14, 2023

Zhengliang Liu, Aoxiao Zhong, Yiwei Li, Longtao Yang, Chao Ju, Zihao Wu, Chong Ma, Peng Shu, Cheng Chen, Sekeun Kim(+9 more)

Figure 1 for Radiology-GPT: A Large Language Model for Radiology

Figure 2 for Radiology-GPT: A Large Language Model for Radiology

Figure 3 for Radiology-GPT: A Large Language Model for Radiology

Figure 4 for Radiology-GPT: A Large Language Model for Radiology

Abstract:We introduce Radiology-GPT, a large language model for radiology. Using an instruction tuning approach on an extensive dataset of radiology domain knowledge, Radiology-GPT demonstrates superior performance compared to general language models such as StableLM, Dolly and LLaMA. It exhibits significant versatility in radiological diagnosis, research, and communication. This work serves as a catalyst for future developments in clinical NLP. The successful implementation of Radiology-GPT is indicative of the potential of localizing generative large language models, specifically tailored for distinctive medical specialties, while ensuring adherence to privacy standards such as HIPAA. The prospect of developing individualized, large-scale language models that cater to specific needs of various hospitals presents a promising direction. The fusion of conversational competence and domain-specific knowledge in these models is set to foster future development in healthcare AI. A demo of Radiology-GPT is available at https://huggingface.co/spaces/allen-eric/radiology-gpt.

Via

Access Paper or Ask Questions

Deep learning radiomics for assessment of gastroesophageal varices in people with compensated advanced chronic liver disease

Jun 13, 2023

Lan Wang, Ruiling He, Lili Zhao, Jia Wang, Zhengzi Geng, Tao Ren, Guo Zhang, Peng Zhang, Kaiqiang Tang, Chaofei Gao(+37 more)

Figure 1 for Deep learning radiomics for assessment of gastroesophageal varices in people with compensated advanced chronic liver disease

Figure 2 for Deep learning radiomics for assessment of gastroesophageal varices in people with compensated advanced chronic liver disease

Figure 3 for Deep learning radiomics for assessment of gastroesophageal varices in people with compensated advanced chronic liver disease

Figure 4 for Deep learning radiomics for assessment of gastroesophageal varices in people with compensated advanced chronic liver disease

Abstract:Objective: Bleeding from gastroesophageal varices (GEV) is a medical emergency associated with high mortality. We aim to construct an artificial intelligence-based model of two-dimensional shear wave elastography (2D-SWE) of the liver and spleen to precisely assess the risk of GEV and high-risk gastroesophageal varices (HRV). Design: A prospective multicenter study was conducted in patients with compensated advanced chronic liver disease. 305 patients were enrolled from 12 hospitals, and finally 265 patients were included, with 1136 liver stiffness measurement (LSM) images and 1042 spleen stiffness measurement (SSM) images generated by 2D-SWE. We leveraged deep learning methods to uncover associations between image features and patient risk, and thus conducted models to predict GEV and HRV. Results: A multi-modality Deep Learning Risk Prediction model (DLRP) was constructed to assess GEV and HRV, based on LSM and SSM images, and clinical information. Validation analysis revealed that the AUCs of DLRP were 0.91 for GEV (95% CI 0.90 to 0.93, p < 0.05) and 0.88 for HRV (95% CI 0.86 to 0.89, p < 0.01), which were significantly and robustly better than canonical risk indicators, including the value of LSM and SSM. Moreover, DLPR was better than the model using individual parameters, including LSM and SSM images. In HRV prediction, the 2D-SWE images of SSM outperform LSM (p < 0.01). Conclusion: DLRP shows excellent performance in predicting GEV and HRV over canonical risk indicators LSM and SSM. Additionally, the 2D-SWE images of SSM provided more information for better accuracy in predicting HRV than the LSM.

Via

Access Paper or Ask Questions

Global and Local Semantic Completion Learning for Vision-Language Pre-training

Jun 12, 2023

Rong-Cheng Tu, Yatai Ji, Jie Jiang, Weijie Kong, Chengfei Cai, Wenzhe Zhao, Hongfa Wang, Yujiu Yang, Wei Liu

Figure 1 for Global and Local Semantic Completion Learning for Vision-Language Pre-training

Figure 2 for Global and Local Semantic Completion Learning for Vision-Language Pre-training

Figure 3 for Global and Local Semantic Completion Learning for Vision-Language Pre-training

Figure 4 for Global and Local Semantic Completion Learning for Vision-Language Pre-training

Abstract:Cross-modal alignment plays a crucial role in vision-language pre-training (VLP) models, enabling them to capture meaningful associations across different modalities. For this purpose, inspired by the success of masked language modeling (MLM) tasks in the NLP pre-training area, numerous masked modeling tasks have been proposed for VLP to further promote cross-modal interactions. The core idea of previous masked modeling tasks is to focus on reconstructing the masked tokens based on visible context for learning local-local alignment, i.e., associations between image patches and text tokens. However, most of them pay little attention to the global semantic features generated for the masked data, resulting in a limited cross-modal alignment ability of global representations to local features of the other modality. Therefore, in this paper, we propose a novel Global and Local Semantic Completion Learning (GLSCL) task to facilitate global-local alignment and local-local alignment simultaneously. Specifically, the GLSCL task complements the missing semantics of masked data and recovers global and local features by cross-modal interactions. Our GLSCL consists of masked global semantic completion (MGSC) and masked local token completion (MLTC). MGSC promotes learning more representative global features which have a great impact on the performance of downstream tasks, and MLTC can further enhance accurate comprehension on multimodal data. Moreover, we present a flexible vision encoder, enabling our model to simultaneously perform image-text and video-text multimodal tasks. Experimental results show that our proposed method obtains state-of-the-art performance on various vision-language benchmarks, such as visual question answering, image-text retrieval, and video-text retrieval.

* arXiv admin note: substantial text overlap with arXiv:2211.13437

Via

Access Paper or Ask Questions

Modeling Structural Similarities between Documents for Coherence Assessment with Graph Convolutional Networks

Jun 10, 2023

Wei Liu, Xiyan Fu, Michael Strube

Abstract:Coherence is an important aspect of text quality, and various approaches have been applied to coherence modeling. However, existing methods solely focus on a single document's coherence patterns, ignoring the underlying correlation between documents. We investigate a GCN-based coherence model that is capable of capturing structural similarities between documents. Our model first creates a graph structure for each document, from where we mine different subgraph patterns. We then construct a heterogeneous graph for the training corpus, connecting documents based on their shared subgraphs. Finally, a GCN is applied to the heterogeneous graph to model the connectivity relationships. We evaluate our method on two tasks, assessing discourse coherence and automated essay scoring. Results show that our GCN-based model outperforms all baselines, achieving a new state-of-the-art on both tasks.

Via

Access Paper or Ask Questions

Annotation-Inspired Implicit Discourse Relation Classification with Auxiliary Discourse Connective Generation

Jun 10, 2023

Wei Liu, Michael Strube

Figure 1 for Annotation-Inspired Implicit Discourse Relation Classification with Auxiliary Discourse Connective Generation

Figure 2 for Annotation-Inspired Implicit Discourse Relation Classification with Auxiliary Discourse Connective Generation

Figure 3 for Annotation-Inspired Implicit Discourse Relation Classification with Auxiliary Discourse Connective Generation

Figure 4 for Annotation-Inspired Implicit Discourse Relation Classification with Auxiliary Discourse Connective Generation

Abstract:Implicit discourse relation classification is a challenging task due to the absence of discourse connectives. To overcome this issue, we design an end-to-end neural model to explicitly generate discourse connectives for the task, inspired by the annotation process of PDTB. Specifically, our model jointly learns to generate discourse connectives between arguments and predict discourse relations based on the arguments and the generated connectives. To prevent our relation classifier from being misled by poor connectives generated at the early stage of training while alleviating the discrepancy between training and inference, we adopt Scheduled Sampling to the joint learning. We evaluate our method on three benchmarks, PDTB 2.0, PDTB 3.0, and PCC. Results show that our joint model significantly outperforms various baselines on three datasets, demonstrating its superiority for the task.

Via

Access Paper or Ask Questions

Exploring the Compositional Generalization in Context Dependent Text-to-SQL Parsing

May 29, 2023

Aiwei Liu, Wei Liu, Xuming Hu, Shuang Li, Fukun Ma, Yawen Yang, Lijie Wen

Figure 1 for Exploring the Compositional Generalization in Context Dependent Text-to-SQL Parsing

Figure 2 for Exploring the Compositional Generalization in Context Dependent Text-to-SQL Parsing

Figure 3 for Exploring the Compositional Generalization in Context Dependent Text-to-SQL Parsing

Figure 4 for Exploring the Compositional Generalization in Context Dependent Text-to-SQL Parsing

Abstract:In the context-dependent Text-to-SQL task, the generated SQL statements are refined iteratively based on the user input utterance from each interaction. The input text from each interaction can be viewed as component modifications to the previous SQL statements, which could be further extracted as the modification patterns. Since these modification patterns could also be combined with other SQL statements, the models are supposed to have the compositional generalization to these novel combinations. This work is the first exploration of compositional generalization in context-dependent Text-to-SQL scenarios. To facilitate related studies, we constructed two challenging benchmarks named \textsc{CoSQL-CG} and \textsc{SParC-CG} by recombining the modification patterns and existing SQL statements. The following experiments show that all current models struggle on our proposed benchmarks. Furthermore, we found that better aligning the previous SQL statements with the input utterance could give models better compositional generalization ability. Based on these observations, we propose a method named \texttt{p-align} to improve the compositional generalization of Text-to-SQL models. Further experiments validate the effectiveness of our method. Source code and data are available.

* Accepted to ACL 2023 (Findings), Long Paper, 11 pages. arXiv admin note: substantial text overlap with arXiv:2205.07686, arXiv:2210.11888 by other authors

Via

Access Paper or Ask Questions

GridFormer: Residual Dense Transformer with Grid Structure for Image Restoration in Adverse Weather Conditions

May 29, 2023

Tao Wang, Kaihao Zhang, Ziqian Shao, Wenhan Luo, Bjorn Stenger, Tong Lu, Tae-Kyun Kim, Wei Liu, Hongdong Li

Figure 1 for GridFormer: Residual Dense Transformer with Grid Structure for Image Restoration in Adverse Weather Conditions

Figure 2 for GridFormer: Residual Dense Transformer with Grid Structure for Image Restoration in Adverse Weather Conditions

Figure 3 for GridFormer: Residual Dense Transformer with Grid Structure for Image Restoration in Adverse Weather Conditions

Figure 4 for GridFormer: Residual Dense Transformer with Grid Structure for Image Restoration in Adverse Weather Conditions

Abstract:Image restoration in adverse weather conditions is a difficult task in computer vision. In this paper, we propose a novel transformer-based framework called GridFormer which serves as a backbone for image restoration under adverse weather conditions. GridFormer is designed in a grid structure using a residual dense transformer block, and it introduces two core designs. First, it uses an enhanced attention mechanism in the transformer layer. The mechanism includes stages of the sampler and compact self-attention to improve efficiency, and a local enhancement stage to strengthen local information. Second, we introduce a residual dense transformer block (RDTB) as the final GridFormer layer. This design further improves the network's ability to learn effective features from both preceding and current local features. The GridFormer framework achieves state-of-the-art results on five diverse image restoration tasks in adverse weather conditions, including image deraining, dehazing, deraining & dehazing, desnowing, and multi-weather restoration. The source code and pre-trained models will be released.

* 17 pages, 12 figures

Via

Access Paper or Ask Questions

Decoupled Rationalization with Asymmetric Learning Rates: A Flexible Lipschitz Restraint

May 26, 2023

Wei Liu, Jun Wang, Haozhao Wang, Ruixuan Li, Yang Qiu, YuanKai Zhang, Jie Han, Yixiong Zou

Abstract:A self-explaining rationalization model is generally constructed by a cooperative game where a generator selects the most human-intelligible pieces from the input text as rationales, followed by a predictor that makes predictions based on the selected rationales. However, such a cooperative game may incur the degeneration problem where the predictor overfits to the uninformative pieces generated by a not yet well-trained generator and in turn, leads the generator to converge to a sub-optimal model that tends to select senseless pieces. In this paper, we theoretically bridge degeneration with the predictor's Lipschitz continuity. Then, we empirically propose a simple but effective method named DR, which can naturally and flexibly restrain the Lipschitz constant of the predictor, to address the problem of degeneration. The main idea of DR is to decouple the generator and predictor to allocate them with asymmetric learning rates. A series of experiments conducted on two widely used benchmarks have verified the effectiveness of the proposed method. Codes: \href{https://github.com/jugechengzi/Rationalization-DR}{https://github.com/jugechengzi/Rationalization-DR}.

* KDD 2023 research track

Via

Access Paper or Ask Questions

Spatio-Temporal Transformer-Based Reinforcement Learning for Robot Crowd Navigation

May 26, 2023

Haodong He, Hao Fu, Qiang Wang, Shuai Zhou, Wei Liu

Figure 1 for Spatio-Temporal Transformer-Based Reinforcement Learning for Robot Crowd Navigation

Figure 2 for Spatio-Temporal Transformer-Based Reinforcement Learning for Robot Crowd Navigation

Figure 3 for Spatio-Temporal Transformer-Based Reinforcement Learning for Robot Crowd Navigation

Figure 4 for Spatio-Temporal Transformer-Based Reinforcement Learning for Robot Crowd Navigation

Abstract:The social robot navigation is an open and challenging problem. In existing work, separate modules are used to capture spatial and temporal features, respectively. However, such methods lead to extra difficulties in improving the utilization of spatio-temporal features and reducing the conservative nature of navigation policy. In light of this, we present a spatio-temporal transformer-based policy optimization algorithm to enhance the utilization of spatio-temporal features, thereby facilitating the capture of human-robot interactions. Specifically, this paper introduces a gated embedding mechanism that effectively aligns the spatial and temporal representations by integrating both modalities at the feature level. Then Transformer is leveraged to encode the spatio-temporal semantic information, with hope of finding the optimal navigation policy. Finally, a combination of spatio-temporal Transformer and self-adjusting policy entropy significantly reduces the conservatism of navigation policies. Experimental results demonstrate the effectiveness of the proposed framework, where our method shows superior performance.

Via

Access Paper or Ask Questions

KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration

May 25, 2023

Xu Bao, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Wangmeng Xiang, Jingdong Sun, Hanbing Liu, Wei Liu, Bin Luo, Yifeng Geng(+1 more)

Figure 1 for KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration

Figure 2 for KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration

Figure 3 for KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration

Figure 4 for KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration

Abstract:In the realm of facial analysis, accurate landmark detection is crucial for various applications, ranging from face recognition and expression analysis to animation. Conventional heatmap or coordinate regression-based techniques, however, often face challenges in terms of computational burden and quantization errors. To address these issues, we present the KeyPoint Positioning System (KeyPosS), a groundbreaking facial landmark detection framework that stands out from existing methods. For the first time, KeyPosS employs the True-range Multilateration algorithm, a technique originally used in GPS systems, to achieve rapid and precise facial landmark detection without relying on computationally intensive regression approaches. The framework utilizes a fully convolutional network to predict a distance map, which computes the distance between a Point of Interest (POI) and multiple anchor points. These anchor points are ingeniously harnessed to triangulate the POI's position through the True-range Multilateration algorithm. Notably, the plug-and-play nature of KeyPosS enables seamless integration into any decoding stage, ensuring a versatile and adaptable solution. We conducted a thorough evaluation of KeyPosS's performance by benchmarking it against state-of-the-art models on four different datasets. The results show that KeyPosS substantially outperforms leading methods in low-resolution settings while requiring a minimal time overhead. The code is available at https://github.com/zhiqic/KeyPosS.

Via

Access Paper or Ask Questions