Xudong Lu

School of Biomedical Engineering and Instrumental Science, Zhejiang University, Hangzhou, P.R. China; School of Industrial Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands

ImageBind-LLM: Multi-modality Instruction Tuning

Sep 11, 2023
Jiaming Han, Renrui Zhang, Wenqi Shao, Peng Gao, Peng Xu, Han Xiao, Kaipeng Zhang, Chris Liu, Song Wen, Ziyu Guo, Xudong Lu, Shuai Ren, Yafei Wen, Xiaoxin Chen, Xiangyu Yue, Hongsheng Li, Yu Qiao

We present ImageBind-LLM, a multi-modality instruction tuning method for large language models (LLMs) via ImageBind. Existing works mainly focus on language and image instruction tuning; in contrast, our ImageBind-LLM can respond to multi-modality conditions, including audio, 3D point clouds, video, and their embedding-space arithmetic, with only image-text alignment training. During training, we adopt a learnable bind network to align the embedding space between LLaMA and ImageBind's image encoder. The image features transformed by the bind network are then added to the word tokens of all layers in LLaMA, progressively injecting visual instructions via an attention-free, zero-initialized gating mechanism. Aided by the joint embedding of ImageBind, this simple image-text training enables our model to exhibit superior multi-modality instruction-following capabilities. During inference, the multi-modality inputs are fed into the corresponding ImageBind encoders and processed by a proposed visual cache model for further cross-modal embedding enhancement. The training-free cache model retrieves from three million image features extracted by ImageBind, which effectively mitigates the training-inference modality discrepancy. Notably, with our approach, ImageBind-LLM can respond to instructions of diverse modalities and demonstrates strong language generation quality. Code is released at https://github.com/OpenGVLab/LLaMA-Adapter.

* Code is available at https://github.com/OpenGVLab/LLaMA-Adapter 
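
The attention-free, zero-initialized gating described in the abstract can be sketched as follows. This is a minimal illustration assuming hypothetical dimensions and a simple MLP bind network, not the released implementation (see the linked repository for the actual code).

```python
import torch
import torch.nn as nn

class ZeroGatedVisualInjection(nn.Module):
    """Illustrative sketch (not the authors' code): a bind network projects an
    ImageBind image feature into the LLM's hidden size, and a zero-initialized
    gate adds it to the word tokens, so training starts from the unmodified LLM."""

    def __init__(self, imagebind_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # Hypothetical bind network: a small MLP mapping ImageBind -> LLaMA space.
        self.bind_net = nn.Sequential(
            nn.Linear(imagebind_dim, llm_dim),
            nn.SiLU(),
            nn.Linear(llm_dim, llm_dim),
        )
        # Zero-initialized gate: the visual branch contributes nothing at step 0.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, word_tokens: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
        # word_tokens: (batch, seq_len, llm_dim); image_feat: (batch, imagebind_dim)
        visual = self.bind_net(image_feat).unsqueeze(1)  # (batch, 1, llm_dim)
        return word_tokens + self.gate * visual          # attention-free addition
```

Because the gate starts at zero, the model behaves exactly like the frozen LLaMA at the beginning of training, and the visual signal is blended in gradually as the gate is learned.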

Zero-shot information extraction from radiological reports using ChatGPT

Sep 07, 2023
Danqing Hu, Bing Liu, Xiaofeng Zhu, Xudong Lu, Nan Wu

Electronic health records contain an enormous amount of valuable information, but much of it is recorded as free text. Information extraction is the strategy for transforming such sequences of characters into structured data that can be used for secondary analysis. However, traditional information extraction components, such as named entity recognition and relation extraction, require annotated data to optimize their model parameters, which has become one of the major bottlenecks in building information extraction systems. With large language models achieving good performance on various downstream NLP tasks without parameter tuning, it becomes possible to use them for zero-shot information extraction. In this study, we explore whether the most popular large language model, ChatGPT, can extract useful information from radiological reports. We first design prompt templates for the information of interest in the CT reports. We then generate prompts by combining the templates with the CT reports, feed them to ChatGPT, and obtain its responses. A post-processing module transforms the responses into structured extraction results. We conducted experiments with 847 CT reports collected from Peking University Cancer Hospital. The results indicate that ChatGPT achieves competitive performance on some extraction tasks compared with a baseline information extraction system, although several limitations remain to be addressed.
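
As a rough illustration of the pipeline described above, the sketch below combines a prompt template with a report, queries ChatGPT through the OpenAI SDK, and post-processes the reply into a structured record. The prompt wording and field names are hypothetical, not the templates used in the paper.

```python
import json
from openai import OpenAI  # assumes the OpenAI Python SDK is installed and an API key is configured

client = OpenAI()

# Hypothetical prompt template and fields; the paper's actual templates are not reproduced here.
TEMPLATE = (
    "Extract the following items from the CT report and answer in JSON with keys "
    "{fields}. If an item is not mentioned, answer \"not mentioned\".\n\nReport:\n{report}"
)
FIELDS = ["tumor location", "tumor size", "lymph node status"]

def extract(report_text: str) -> dict:
    prompt = TEMPLATE.format(fields=", ".join(FIELDS), report=report_text)
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output simplifies post-processing
    )
    raw = resp.choices[0].message.content
    # Post-processing: map the free-text response to a structured record.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return {"raw_response": raw}  # fall back to the unparsed text
```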

From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding

Apr 04, 2023
Yong-Lu Li, Xiaoqian Wu, Xinpeng Liu, Yiming Dou, Yikun Ji, Junyi Zhang, Yixing Li, Jingru Tan, Xudong Lu, Cewu Lu

Action understanding matters and attracts attention. It can be formulated as a mapping from the physical action space to the semantic space. Typically, researchers have built action datasets according to idiosyncratic choices of class definitions, each pushing the envelope of its own benchmark. As a result, datasets are mutually incompatible, like "isolated islands", owing to semantic gaps and differing class granularities, e.g., "do housework" in dataset A versus "wash plate" in dataset B. We argue that a more principled semantic space is urgently needed to concentrate community efforts and enable us to use all datasets together in pursuit of generalizable action learning. To this end, we design a Poincaré action semantic space based on a verb taxonomy hierarchy and covering massive actions. By aligning the classes of previous datasets to our semantic space, we gather image, video, skeleton, and MoCap datasets into a unified database under a unified label system, i.e., bridging "isolated islands" into a "Pangea". Accordingly, we propose a bidirectional mapping model between the physical and semantic spaces to make full use of Pangea. In extensive experiments, our system shows significant superiority, especially in transfer learning. Code and data will be made publicly available.

* Project Webpage: https://mvig-rhos.com/pangea 
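
The Poincaré semantic space mentioned in the abstract relies on hyperbolic geometry, where tree-like structures such as a verb taxonomy embed with low distortion. The following sketch shows only the standard Poincaré-ball distance with illustrative 2-D points; it is not the paper's embedding model.

```python
import numpy as np

def poincare_distance(u: np.ndarray, v: np.ndarray, eps: float = 1e-9) -> float:
    """Geodesic distance in the Poincaré ball model of hyperbolic space."""
    sq_u = float(np.dot(u, u))
    sq_v = float(np.dot(v, v))
    sq_diff = float(np.dot(u - v, u - v))
    # Standard Poincaré-ball distance; eps guards points lying near the boundary.
    x = 1.0 + 2.0 * sq_diff / max((1.0 - sq_u) * (1.0 - sq_v), eps)
    return float(np.arccosh(x))

# Illustrative 2-D points: a child verb ("wash plate") placed near its parent ("do housework").
parent = np.array([0.10, 0.05])
child = np.array([0.45, 0.20])
print(poincare_distance(parent, child))
```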

Towards Multi-perspective conformance checking with fuzzy sets

Jan 29, 2020
Sicui Zhang, Laura Genga, Hui Yan, Xudong Lu, Huilong Duan, Uzay Kaymak

Conformance checking techniques are widely adopted to pinpoint possible discrepancies between process models and the execution of the process in reality. However, state-of-the-art approaches adopt a crisp evaluation of deviations, with the result that small violations are treated at the same level as significant ones. This affects the quality of the provided diagnostics, especially when some tolerance exists with respect to reasonably small violations, and it hampers the flexibility of the process. In this work, we propose a novel approach that allows representing actors' tolerance towards violations and accounting for the severity of deviations when assessing the compliance of executions. We argue that, besides improving the quality of the provided diagnostics, allowing some tolerance in the assessment of deviations also enhances the flexibility of conformance checking techniques and, indirectly, paves the way for improving the resilience of the overall process management system.

* 15 pages, 5 figures 
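
One way to read the graded-deviation idea in the abstract is as a fuzzy membership function that discounts small violations instead of charging them the full crisp cost. The sketch below uses an illustrative piecewise-linear membership with hypothetical thresholds; it is not the cost function defined in the paper.

```python
def tolerance(deviation: float, full_tolerance: float = 0.5, zero_tolerance: float = 2.0) -> float:
    """Fuzzy membership of "tolerable deviation": 1 below full_tolerance, 0 above
    zero_tolerance, linear in between. The thresholds are purely illustrative."""
    if deviation <= full_tolerance:
        return 1.0
    if deviation >= zero_tolerance:
        return 0.0
    return (zero_tolerance - deviation) / (zero_tolerance - full_tolerance)

def deviation_cost(deviation: float, base_cost: float = 1.0) -> float:
    """Graded cost: small violations are discounted rather than charged the full crisp cost."""
    return base_cost * (1.0 - tolerance(deviation))

# A crisp check would charge the full cost of 1.0 for any violation; here a 0.6-unit deviation costs:
print(deviation_cost(0.6))  # ~= 0.067
```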