Kin-Man Lam

FactLLaMA: Optimizing Instruction-Following Language Models with External Knowledge for Automated Fact-Checking

Sep 01, 2023
Tsun-Hin Cheung, Kin-Man Lam

Automatic fact-checking plays a crucial role in combating the spread of misinformation. Large Language Models (LLMs) and their instruction-following variants, such as InstructGPT and Alpaca, have shown remarkable performance on various natural language processing tasks. However, their knowledge may not always be up-to-date or sufficient, potentially leading to inaccuracies in fact-checking. To address this limitation, we propose combining the power of instruction-following language models with external evidence retrieval to enhance fact-checking performance. Our approach leverages search engines to retrieve relevant evidence for a given input claim. This external evidence serves as valuable supplementary information to augment the knowledge of the pretrained language model. We then instruction-tune an open-source language model, LLaMA, with this evidence, enabling it to predict the veracity of the input claim more accurately. To evaluate our method, we conducted experiments on two widely used fact-checking datasets: RAWFC and LIAR. The results demonstrate that our approach achieves state-of-the-art performance on fact-checking tasks. By integrating external evidence, we bridge the gap between the model's knowledge and the most up-to-date and sufficient context available, leading to improved fact-checking outcomes. Our findings have implications for combating misinformation and promoting the dissemination of accurate information on online platforms. Our released materials are accessible at: https://thcheung.github.io/factllama.
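To make the retrieval-augmented setup concrete, below is a minimal sketch of how a claim and retrieved evidence can be combined into an instruction-style prompt for veracity prediction. The prompt template and label set are illustrative assumptions, not the exact format used by FactLLaMA.

```python
# Minimal sketch (Python) of evidence-augmented prompting for claim verification.
# The template and label set below are hypothetical, for illustration only.

LABELS = ["true", "half-true", "false"]  # hypothetical label set

def build_prompt(claim: str, evidence_snippets: list[str]) -> str:
    """Combine a claim with retrieved evidence into an instruction-style prompt."""
    evidence_block = "\n".join(f"- {snippet}" for snippet in evidence_snippets)
    return (
        "Instruction: Assess the veracity of the claim using the evidence.\n"
        f"Claim: {claim}\n"
        f"Evidence:\n{evidence_block}\n"
        f"Answer with one of: {', '.join(LABELS)}.\nAnswer:"
    )

if __name__ == "__main__":
    prompt = build_prompt(
        "The Eiffel Tower is located in Berlin.",
        ["The Eiffel Tower is a wrought-iron lattice tower in Paris, France."],
    )
    print(prompt)
```

Prompts of this kind, paired with veracity labels, are the sort of training data an instruction-tuned verifier would consume.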

* Accepted in APSIPA ASC 2023 

AMSP-UOD: When Vortex Convolution and Stochastic Perturbation Meet Underwater Object Detection

Aug 23, 2023
Jingchun Zhou, Zongxin He, Kin-Man Lam, Yudong Wang, Weishi Zhang, ChunLe Guo, Chongyi Li

In this paper, we present a novel Amplitude-Modulated Stochastic Perturbation and Vortex Convolutional Network, AMSP-UOD, designed for underwater object detection. AMSP-UOD specifically addresses the impact of non-ideal imaging factors on detection accuracy in complex underwater environments. To mitigate the influence of noise on object detection performance, we propose AMSP Vortex Convolution (AMSP-VConv), which disrupts the noise distribution, enhances feature extraction capabilities, effectively reduces parameters, and improves network robustness. We design the Feature Association Decoupling Cross Stage Partial (FAD-CSP) module, which strengthens the association between long- and short-range features, improving network performance in complex underwater environments. Additionally, our post-processing method, based on non-maximum suppression with aspect-ratio similarity thresholds, optimizes detection in dense scenes, such as waterweed and schools of fish, improving object detection accuracy. Extensive experiments on the URPC and RUOD datasets demonstrate that our method outperforms existing state-of-the-art methods in terms of accuracy and noise immunity. AMSP-UOD offers an innovative solution with potential for real-world applications. Code will be made publicly available.
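As an illustration of non-maximum suppression that takes aspect-ratio similarity into account, here is a minimal NumPy sketch in which the suppression threshold is relaxed for boxes with dissimilar shapes. The exact weighting and thresholds in AMSP-UOD's post-processing are not reproduced; this only conveys the idea.

```python
# Minimal sketch (NumPy): IoU-based NMS whose threshold grows when two boxes have
# dissimilar aspect ratios, so differently shaped boxes are suppressed less readily.
# Boxes are (N, 4) arrays in (x1, y1, x2, y2) format. Constants are illustrative.
import numpy as np

def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def aspect_ratio_similarity(a, b):
    ra = (a[2] - a[0]) / (a[3] - a[1] + 1e-9)
    rb = (b[2] - b[0]) / (b[3] - b[1] + 1e-9)
    return min(ra, rb) / max(ra, rb)  # in (0, 1]; 1 means identical aspect ratios

def nms_with_aspect_ratio(boxes, scores, base_thresh=0.5):
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(i)
        survivors = []
        for j in order[1:]:
            # Suppress only when boxes overlap strongly AND have similar shapes.
            thresh = base_thresh + 0.2 * (1.0 - aspect_ratio_similarity(boxes[i], boxes[j]))
            if iou(boxes[i], boxes[j]) < thresh:
                survivors.append(j)
        order = np.array(survivors, dtype=int)
    return keep
```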

AGTGAN: Unpaired Image Translation for Photographic Ancient Character Generation

Mar 13, 2023
Hongxiang Huang, Daihui Yang, Gang Dai, Zhen Han, Yuyi Wang, Kin-Man Lam, Fan Yang, Shuangping Huang, Yongge Liu, Mengchao He

The study of ancient writings has great value for archaeology and philology. Photographic characters are an essential form of source material, but manual recognition of photographic characters is extremely time-consuming and expertise-dependent. Automatic classification is therefore highly desirable. However, current performance is limited by the lack of annotated data. Data generation is an inexpensive but useful remedy for this scarcity. Nevertheless, the diverse glyph shapes and complex background textures of photographic ancient characters make the generation task difficult, leading to unsatisfactory results from existing methods. In this paper, we propose an unsupervised generative adversarial network called AGTGAN. Through explicit global and local glyph-shape style modeling, followed by stroke-aware texture transfer and an associated adversarial learning mechanism, our method can generate characters with diverse glyphs and realistic textures. We evaluate our approach on photographic ancient character datasets, e.g., OBC306 and CSDD. Our method outperforms state-of-the-art approaches on various metrics and performs much better in terms of the diversity and authenticity of the generated samples. Experiments on the largest photographic oracle bone character dataset show that augmenting training with our generated images yields a significant increase in classification accuracy, of up to 16.34%.
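For reference, the adversarial objective underlying such unpaired generation builds on the standard GAN loss. The PyTorch sketch below shows only this generic component; AGTGAN's glyph-shape and stroke-aware texture losses are omitted, and the network definitions are left out.

```python
# Minimal sketch (PyTorch) of the standard adversarial objective that unpaired
# image-translation GANs build on. Generic component only, not AGTGAN's full loss.
import torch
import torch.nn.functional as F

def discriminator_loss(d_real_logits: torch.Tensor, d_fake_logits: torch.Tensor) -> torch.Tensor:
    # The discriminator should label real samples 1 and generated samples 0.
    real_loss = F.binary_cross_entropy_with_logits(d_real_logits, torch.ones_like(d_real_logits))
    fake_loss = F.binary_cross_entropy_with_logits(d_fake_logits, torch.zeros_like(d_fake_logits))
    return real_loss + fake_loss

def generator_loss(d_fake_logits: torch.Tensor) -> torch.Tensor:
    # The generator is rewarded when the discriminator labels its output as real.
    return F.binary_cross_entropy_with_logits(d_fake_logits, torch.ones_like(d_fake_logits))
```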

Deep Learning Methods for Calibrated Photometric Stereo and Beyond: A Survey

Dec 16, 2022
Yakun Ju, Kin-Man Lam, Wuyuan Xie, Huiyu Zhou, Junyu Dong, Boxin Shi

Photometric stereo recovers the surface normals of an object from multiple images with varying shading cues, i.e., by modeling the relationship between surface orientation and intensity at each pixel. Photometric stereo excels in per-pixel resolution and fine reconstruction detail. However, it is a complicated problem because of the non-linear relationship caused by non-Lambertian surface reflectance. Recently, various deep learning methods have shown powerful capabilities for photometric stereo on non-Lambertian surfaces. This paper provides a comprehensive review of existing deep learning-based calibrated photometric stereo methods. We first analyze these methods from different perspectives, including input processing, supervision, and network architecture. We then summarize the performance of deep learning photometric stereo models on the most widely used benchmark dataset, demonstrating the advanced performance of deep learning-based photometric stereo methods. Finally, we give suggestions and propose future research trends based on the limitations of existing models.
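To ground the problem statement, the classical calibrated Lambertian case can be solved in closed form: with image formation I = ρ max(0, nᵀl) and known light directions, the scaled normal at each pixel follows from least squares, as in the minimal NumPy sketch below (shadowed observations are ignored for simplicity). The deep methods surveyed in the paper replace this linear solver to cope with non-Lambertian reflectance.

```python
# Minimal sketch (NumPy) of classical calibrated photometric stereo under the
# Lambertian model I = rho * (n . l): with known light directions L (m x 3),
# the scaled normal b = rho * n at a pixel is the least-squares solution of L b = I.
import numpy as np

def lambertian_photometric_stereo(intensities: np.ndarray, light_dirs: np.ndarray):
    """intensities: (m,) observations at one pixel; light_dirs: (m, 3) unit light vectors."""
    b, *_ = np.linalg.lstsq(light_dirs, intensities, rcond=None)  # b = rho * n
    albedo = np.linalg.norm(b)
    normal = b / (albedo + 1e-9)
    return normal, albedo
```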

* 16 pages, 10 figures, 4 tables 

Online Video Super-Resolution with Convolutional Kernel Bypass Graft

Aug 04, 2022
Jun Xiao, Xinyang Jiang, Ningxin Zheng, Huan Yang, Yifan Yang, Yuqing Yang, Dongsheng Li, Kin-Man Lam

Deep learning-based models have achieved remarkable performance in video super-resolution (VSR) in recent years, but most of these models are less applicable to online video applications. These methods consider only distortion quality and ignore crucial requirements of online applications, e.g., low latency and low model complexity. In this paper, we focus on online video transmission, in which VSR algorithms are required to generate high-resolution video sequences frame by frame in real time. To address such challenges, we propose an extremely low-latency VSR algorithm based on a novel kernel knowledge transfer method, named convolutional kernel bypass graft (CKBG). First, we design a lightweight network structure that does not require future frames as inputs, avoiding the extra time cost of caching such frames. Then, our proposed CKBG method enhances this lightweight base model by bypassing the original network with "kernel grafts", which are extra convolutional kernels containing prior knowledge from external pretrained image SR models. In the testing phase, we further accelerate the grafted multi-branch network by converting it into a simple single-path structure. Experimental results show that our proposed method can process online video sequences at up to 110 FPS, with very low model complexity and competitive SR performance.
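The test-time conversion of a multi-branch network into a single path relies on the linearity of convolution: parallel branches with identical kernel configuration and no intervening nonlinearity can be summed into one kernel. The PyTorch sketch below shows this generic re-parameterization step; it is not the full CKBG procedure, and the assumption of matching kernel size, stride, and padding is ours.

```python
# Minimal sketch (PyTorch): merge two parallel convolution branches into one kernel.
# Because convolution is linear, conv_a(x) + conv_b(x) == merged(x) when both share
# the same kernel size, stride, and padding, with no nonlinearity between them.
import torch
import torch.nn as nn

def merge_parallel_convs(conv_a: nn.Conv2d, conv_b: nn.Conv2d) -> nn.Conv2d:
    merged = nn.Conv2d(conv_a.in_channels, conv_a.out_channels,
                       conv_a.kernel_size, conv_a.stride, conv_a.padding, bias=True)
    with torch.no_grad():
        merged.weight.copy_(conv_a.weight + conv_b.weight)
        bias_a = conv_a.bias if conv_a.bias is not None else torch.zeros(conv_a.out_channels)
        bias_b = conv_b.bias if conv_b.bias is not None else torch.zeros(conv_b.out_channels)
        merged.bias.copy_(bias_a + bias_b)
    return merged
```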

Multi-scale Sampling and Aggregation Network For High Dynamic Range Imaging

Aug 04, 2022
Jun Xiao, Qian Ye, Tianshan Liu, Cong Zhang, Kin-Man Lam

High dynamic range (HDR) imaging is a fundamental problem in image processing, which aims to generate well-exposed images, even in the presence of varying illumination in the scene. In recent years, multi-exposure fusion methods have achieved remarkable results, merging multiple low dynamic range (LDR) images, captured with different exposures, to generate corresponding HDR images. However, synthesizing HDR images in dynamic scenes is still challenging and in high demand. There are two challenges in producing HDR images: 1) object motion between LDR images can easily cause undesirable ghosting artifacts in the generated results; 2) under- and over-exposed regions often contain distorted image content, because of insufficient compensation for these regions in the merging stage. In this paper, we propose a multi-scale sampling and aggregation network for HDR imaging in dynamic scenes. To effectively alleviate the problems caused by both small and large motions, our method implicitly aligns LDR images by sampling and aggregating high-correspondence features in a coarse-to-fine manner. Furthermore, we propose a densely connected network based on the discrete wavelet transform for performance improvement, which decomposes the input into several non-overlapping frequency subbands and adaptively performs compensation in the wavelet domain. Experiments show that our proposed method achieves state-of-the-art performance under diverse scenes, compared to other promising HDR imaging methods. In addition, the HDR images generated by our method contain cleaner and more detailed content, with fewer distortions, leading to better visual quality.
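The wavelet-domain component builds on a standard single-level discrete wavelet transform, which splits an image into non-overlapping frequency subbands. A minimal sketch using PyWavelets is shown below; the densely connected network itself is not shown, and the random array simply stands in for one image channel.

```python
# Minimal sketch (Python, PyWavelets): single-level 2-D DWT splits an image into
# LL, LH, HL, HH subbands, and the inverse transform reconstructs it exactly.
import numpy as np
import pywt

image = np.random.rand(256, 256).astype(np.float32)   # stand-in for one image channel
low, (horiz, vert, diag) = pywt.dwt2(image, "haar")    # LL, LH, HL, HH subbands
print(low.shape, horiz.shape)                          # each subband is 128 x 128

reconstructed = pywt.idwt2((low, (horiz, vert, diag)), "haar")
print(np.allclose(image, reconstructed, atol=1e-5))    # the DWT is invertible
```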

Lung-Originated Tumor Segmentation from Computed Tomography Scan (LOTUS) Benchmark

Jan 03, 2022
Parnian Afshar, Arash Mohammadi, Konstantinos N. Plataniotis, Keyvan Farahani, Justin Kirby, Anastasia Oikonomou, Amir Asif, Leonard Wee, Andre Dekker, Xin Wu, Mohammad Ariful Haque, Shahruk Hossain, Md. Kamrul Hasan, Uday Kamal, Winston Hsu, Jhih-Yuan Lin, M. Sohel Rahman, Nabil Ibtehaz, Sh. M. Amir Foisol, Kin-Man Lam, Zhong Guang, Runze Zhang, Sumohana S. Channappayya, Shashank Gupta, Chander Dev

Lung cancer is one of the deadliest cancers, and its effective diagnosis and treatment depend in part on the accurate delineation of the tumor. Human-centered segmentation, which is currently the most common approach, is subject to inter-observer variability and is also time-consuming, considering that only experts are capable of providing annotations. Automatic and semi-automatic tumor segmentation methods have recently shown promising results. However, as different researchers have validated their algorithms using various datasets and performance metrics, reliably evaluating these methods is still an open challenge. The goal of the Lung-Originated Tumor Segmentation from Computed Tomography Scan (LOTUS) Benchmark, created through the 2018 IEEE Video and Image Processing (VIP) Cup competition, is to provide a unique dataset and pre-defined metrics, so that different researchers can develop and evaluate their methods in a unified fashion. The 2018 VIP Cup started with global engagement from 42 countries to access the competition data. At the registration stage, there were 129 members clustered into 28 teams from 10 countries, out of which 9 teams made it to the final stage and 6 teams successfully completed all the required tasks. In a nutshell, all the algorithms proposed during the competition are based on deep learning models combined with a false positive reduction technique. Methods developed by the three finalists show promising results in tumor segmentation; however, more effort should be put into reducing the false positive rate. This competition manuscript presents an overview of the VIP-Cup challenge, along with the proposed algorithms and results.
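For context, tumor segmentation quality is commonly reported with the Dice similarity coefficient; the minimal sketch below shows this standard metric. Whether it exactly matches the benchmark's pre-defined metrics is an assumption, and it is included only as an illustration of unified, metric-based evaluation.

```python
# Minimal sketch (NumPy) of the Dice similarity coefficient between two binary
# segmentation masks; 1.0 means perfect overlap, 0.0 means none.
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```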

NTGAN: Learning Blind Image Denoising without Clean Reference

Sep 09, 2020
Rui Zhao, Daniel P. K. Lun, Kin-Man Lam

Recent studies on learning-based image denoising have achieved promising performance on various noise reduction tasks. Most of these deep denoisers are trained either under the supervision of clean references or unsupervised on synthetic noise. The synthetic-noise assumption leads to poor generalization on real photographs. To address this issue, we propose a novel deep unsupervised image-denoising method that regards the noise reduction task as a special case of the noise transference task. Learning noise transference enables the network to acquire the denoising ability by observing only corrupted samples. Results on real-world denoising benchmarks demonstrate that our proposed method achieves state-of-the-art performance in removing realistic noise, making it a potential solution to practical noise reduction problems.
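As related background on learning to denoise without clean references, the sketch below shows a Noise2Noise-style training step in which a denoiser is supervised by a second, independently corrupted observation of the same scene. This illustrates the general principle of training on corrupted samples only; it is not NTGAN's noise-transference formulation, and the toy network, noise level, and data are assumptions.

```python
# Minimal sketch (PyTorch) of a noisy-target (Noise2Noise-style) training step:
# the loss compares the denoiser output against another noisy observation, so no
# clean image ever appears in the supervision signal.
import torch
import torch.nn as nn

denoiser = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(64, 3, 3, padding=1))          # toy denoiser
optimizer = torch.optim.Adam(denoiser.parameters(), lr=1e-4)

scene = torch.rand(4, 3, 64, 64)                       # stand-in scene, used only to synthesize data
noisy_input = scene + 0.1 * torch.randn_like(scene)    # first corrupted observation
noisy_target = scene + 0.1 * torch.randn_like(scene)   # second, independently corrupted observation

loss = nn.functional.mse_loss(denoiser(noisy_input), noisy_target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```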

* BMVC 2020 

Deep Multi-task Learning for Facial Expression Recognition and Synthesis Based on Selective Feature Sharing

Jul 09, 2020
Rui Zhao, Tianshan Liu, Jun Xiao, Daniel P. K. Lun, Kin-Man Lam

Multi-task learning is an effective learning strategy for deep-learning-based facial expression recognition tasks. However, most existing methods give limited consideration to feature selection when transferring information between different tasks, which may lead to task interference when training multi-task networks. To address this problem, we propose a novel selective feature-sharing method and establish a multi-task network for facial expression recognition and facial expression synthesis. The proposed method can effectively transfer beneficial features between different tasks, while filtering out useless and harmful information. Moreover, we employ the facial expression synthesis task to enlarge and balance the training dataset, further enhancing the generalization ability of the proposed method. Experimental results show that the proposed method achieves state-of-the-art performance on commonly used facial expression recognition benchmarks, which makes it a potential solution to real-world facial expression recognition problems.
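One simple way to realize selective sharing between task branches is a learned gate that modulates the transferred features before they are added to the other task's branch. The PyTorch sketch below shows such a gate as an illustrative assumption, not the paper's exact selective feature-sharing module.

```python
# Minimal sketch (PyTorch) of a gated feature-sharing connection: features from a
# source task are filtered by a learned sigmoid gate before being added to the
# target task's features, so only the useful portion is transferred.
import torch
import torch.nn as nn

class GatedShare(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, kernel_size=1), nn.Sigmoid())

    def forward(self, source_feat: torch.Tensor, target_feat: torch.Tensor) -> torch.Tensor:
        # Pass on only the portion of the source features the gate deems useful.
        return target_feat + self.gate(source_feat) * source_feat
```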

* ICPR 2020 

Enhancement of a CNN-Based Denoiser Based on Spatial and Spectral Analysis

Jun 28, 2020
Rui Zhao, Kin-Man Lam, Daniel P. K. Lun

Convolutional neural network (CNN)-based image denoising methods have been widely studied recently, because of their high-speed processing capability and good visual quality. However, most existing CNN-based denoisers learn the image prior from the spatial domain and suffer from the problem of spatially variant noise, which limits their performance in real-world image denoising tasks. In this paper, we propose a discrete wavelet denoising CNN (WDnCNN), which restores images corrupted by various types of noise with a single model. Since most of the content or energy of natural images resides in the low-frequency spectrum, their transformed coefficients in the frequency domain are highly imbalanced. To address this issue, we present a band normalization module (BNM) to normalize the coefficients from different parts of the frequency spectrum. Moreover, we employ a band discriminative training (BDT) criterion to enhance the model regression. We evaluate the proposed WDnCNN and compare it with other state-of-the-art denoisers. Experimental results show that WDnCNN achieves promising performance in both synthetic and real noise reduction, making it a potential solution to many practical image denoising applications.
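The idea of rebalancing imbalanced wavelet coefficients can be illustrated by normalizing each subband to a comparable scale before it enters the network. The sketch below (NumPy + PyWavelets) shows this as an assumption; it is not the exact operation of WDnCNN's band normalization module.

```python
# Minimal sketch (NumPy + PyWavelets): divide each wavelet subband by its own
# standard deviation so the low-frequency band does not dominate the others; the
# scales are kept so the coefficients can be restored after processing.
import numpy as np
import pywt

def normalized_subbands(image: np.ndarray):
    low, highs = pywt.dwt2(image, "haar")
    bands = [low, *highs]                     # LL, LH, HL, HH coefficients
    return [(b / (b.std() + 1e-8), b.std()) for b in bands]
```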

* ICIP 2019 