Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Lu Wang

Shandong University

Multi-Scale Representation Learning for Image Restoration with State-Space Model

Aug 19, 2024

Yuhong He, Long Peng, Qiaosi Yi, Chen Wu, Lu Wang

Figure 1 for Multi-Scale Representation Learning for Image Restoration with State-Space Model

Figure 2 for Multi-Scale Representation Learning for Image Restoration with State-Space Model

Figure 3 for Multi-Scale Representation Learning for Image Restoration with State-Space Model

Figure 4 for Multi-Scale Representation Learning for Image Restoration with State-Space Model

Abstract:Image restoration endeavors to reconstruct a high-quality, detail-rich image from a degraded counterpart, which is a pivotal process in photography and various computer vision systems. In real-world scenarios, different types of degradation can cause the loss of image details at various scales and degrade image contrast. Existing methods predominantly rely on CNN and Transformer to capture multi-scale representations. However, these methods are often limited by the high computational complexity of Transformers and the constrained receptive field of CNN, which hinder them from achieving superior performance and efficiency in image restoration. To address these challenges, we propose a novel Multi-Scale State-Space Model-based (MS-Mamba) for efficient image restoration that enhances the capacity for multi-scale representation learning through our proposed global and regional SSM modules. Additionally, an Adaptive Gradient Block (AGB) and a Residual Fourier Block (RFB) are proposed to improve the network's detail extraction capabilities by capturing gradients in various directions and facilitating learning details in the frequency domain. Extensive experiments on nine public benchmarks across four classic image restoration tasks, image deraining, dehazing, denoising, and low-light enhancement, demonstrate that our proposed method achieves new state-of-the-art performance while maintaining low computational complexity. The source code will be publicly available.

Via

Access Paper or Ask Questions

Optimizing NOMA Transmissions to Advance Federated Learning in Vehicular Networks

Aug 06, 2024

Ziru Chen, Zhou Ni, Peiyuan Guan, Lu Wang, Lin X. Cai, Morteza Hashemi, Zongzhi Li

Figure 1 for Optimizing NOMA Transmissions to Advance Federated Learning in Vehicular Networks

Figure 2 for Optimizing NOMA Transmissions to Advance Federated Learning in Vehicular Networks

Figure 3 for Optimizing NOMA Transmissions to Advance Federated Learning in Vehicular Networks

Figure 4 for Optimizing NOMA Transmissions to Advance Federated Learning in Vehicular Networks

Abstract:Diverse critical data, such as location information and driving patterns, can be collected by IoT devices in vehicular networks to improve driving experiences and road safety. However, drivers are often reluctant to share their data due to privacy concerns. The Federated Vehicular Network (FVN) is a promising technology that tackles these concerns by transmitting model parameters instead of raw data, thereby protecting the privacy of drivers. Nevertheless, the performance of Federated Learning (FL) in a vehicular network depends on the joining ratio, which is restricted by the limited available wireless resources. To address these challenges, this paper proposes to apply Non-Orthogonal Multiple Access (NOMA) to improve the joining ratio in a FVN. Specifically, a vehicle selection and transmission power control algorithm is developed to exploit the power domain differences in the received signal to ensure the maximum number of vehicles capable of joining the FVN. Our simulation results demonstrate that the proposed NOMA-based strategy increases the joining ratio and significantly enhances the performance of the FVN.

* The paper is accepted by IEEE Globecom 2024

Via

Access Paper or Ask Questions

Gaussian Mixture based Evidential Learning for Stereo Matching

Aug 05, 2024

Weide Liu, Xingxing Wang, Lu Wang, Jun Cheng, Fayao Liu, Xulei Yang

Figure 1 for Gaussian Mixture based Evidential Learning for Stereo Matching

Figure 2 for Gaussian Mixture based Evidential Learning for Stereo Matching

Figure 3 for Gaussian Mixture based Evidential Learning for Stereo Matching

Figure 4 for Gaussian Mixture based Evidential Learning for Stereo Matching

Abstract:In this paper, we introduce a novel Gaussian mixture based evidential learning solution for robust stereo matching. Diverging from previous evidential deep learning approaches that rely on a single Gaussian distribution, our framework posits that individual image data adheres to a mixture-of-Gaussian distribution in stereo matching. This assumption yields more precise pixel-level predictions and more accurately mirrors the real-world image distribution. By further employing the inverse-Gamma distribution as an intermediary prior for each mixture component, our probabilistic model achieves improved depth estimation compared to its counterpart with the single Gaussian and effectively captures the model uncertainty, which enables a strong cross-domain generation ability. We evaluated our method for stereo matching by training the model using the Scene Flow dataset and testing it on KITTI 2015 and Middlebury 2014. The experiment results consistently show that our method brings improvements over the baseline methods in a trustworthy manner. Notably, our approach achieved new state-of-the-art results on both the in-domain validated data and the cross-domain datasets, demonstrating its effectiveness and robustness in stereo matching tasks.

Via

Access Paper or Ask Questions

Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation

Jul 31, 2024

Ziya Zhou, Yuhang Wu, Zhiyue Wu, Xinyue Zhang, Ruibin Yuan, Yinghao Ma, Lu Wang, Emmanouil Benetos, Wei Xue, Yike Guo

Figure 1 for Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation

Figure 2 for Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation

Figure 3 for Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation

Figure 4 for Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation

Abstract:Symbolic Music, akin to language, can be encoded in discrete symbols. Recent research has extended the application of large language models (LLMs) such as GPT-4 and Llama2 to the symbolic music domain including understanding and generation. Yet scant research explores the details of how these LLMs perform on advanced music understanding and conditioned generation, especially from the multi-step reasoning perspective, which is a critical aspect in the conditioned, editable, and interactive human-computer co-creation process. This study conducts a thorough investigation of LLMs' capability and limitations in symbolic music processing. We identify that current LLMs exhibit poor performance in song-level multi-step music reasoning, and typically fail to leverage learned music knowledge when addressing complex musical tasks. An analysis of LLMs' responses highlights distinctly their pros and cons. Our findings suggest achieving advanced musical capability is not intrinsically obtained by LLMs, and future research should focus more on bridging the gap between music knowledge and reasoning, to improve the co-creation experience for musicians.

* Accepted by ISMIR2024

Via

Access Paper or Ask Questions

Micro-Expression Recognition by Motion Feature Extraction based on Pre-training

Jul 10, 2024

Ruolin Li, Lu Wang, Tingting Yang, Lisheng Xu, Bingyang Ma, Yongchun Li, Hongchao Wei

Figure 1 for Micro-Expression Recognition by Motion Feature Extraction based on Pre-training

Figure 2 for Micro-Expression Recognition by Motion Feature Extraction based on Pre-training

Figure 3 for Micro-Expression Recognition by Motion Feature Extraction based on Pre-training

Figure 4 for Micro-Expression Recognition by Motion Feature Extraction based on Pre-training

Abstract:Micro-expressions (MEs) are spontaneous, unconscious facial expressions that have promising applications in various fields such as psychotherapy and national security. Thus, micro-expression recognition (MER) has attracted more and more attention from researchers. Although various MER methods have emerged especially with the development of deep learning techniques, the task still faces several challenges, e.g. subtle motion and limited training data. To address these problems, we propose a novel motion extraction strategy (MoExt) for the MER task and use additional macro-expression data in the pre-training process. We primarily pretrain the feature separator and motion extractor using the contrastive loss, thus enabling them to extract representative motion features. In MoExt, shape features and texture features are first extracted separately from onset and apex frames, and then motion features related to MEs are extracted based on the shape features of both frames. To enable the model to more effectively separate features, we utilize the extracted motion features and the texture features from the onset frame to reconstruct the apex frame. Through pre-training, the module is enabled to extract inter-frame motion features of facial expressions while excluding irrelevant information. The feature separator and motion extractor are ultimately integrated into the MER network, which is then fine-tuned using the target ME data. The effectiveness of proposed method is validated on three commonly used datasets, i.e., CASME II, SMIC, SAMM, and CAS(ME)3 dataset. The results show that our method performs favorably against state-of-the-art methods.

Via

Access Paper or Ask Questions

AutoRAG-HP: Automatic Online Hyper-Parameter Tuning for Retrieval-Augmented Generation

Jun 27, 2024

Jia Fu, Xiaoting Qin, Fangkai Yang, Lu Wang, Jue Zhang, Qingwei Lin, Yubo Chen, Dongmei Zhang, Saravan Rajmohan, Qi Zhang

Figure 1 for AutoRAG-HP: Automatic Online Hyper-Parameter Tuning for Retrieval-Augmented Generation

Figure 2 for AutoRAG-HP: Automatic Online Hyper-Parameter Tuning for Retrieval-Augmented Generation

Figure 3 for AutoRAG-HP: Automatic Online Hyper-Parameter Tuning for Retrieval-Augmented Generation

Figure 4 for AutoRAG-HP: Automatic Online Hyper-Parameter Tuning for Retrieval-Augmented Generation

Abstract:Recent advancements in Large Language Models have transformed ML/AI development, necessitating a reevaluation of AutoML principles for the Retrieval-Augmented Generation (RAG) systems. To address the challenges of hyper-parameter optimization and online adaptation in RAG, we propose the AutoRAG-HP framework, which formulates the hyper-parameter tuning as an online multi-armed bandit (MAB) problem and introduces a novel two-level Hierarchical MAB (Hier-MAB) method for efficient exploration of large search spaces. We conduct extensive experiments on tuning hyper-parameters, such as top-k retrieved documents, prompt compression ratio, and embedding methods, using the ALCE-ASQA and Natural Questions datasets. Our evaluation from jointly optimization all three hyper-parameters demonstrate that MAB-based online learning methods can achieve Recall@5 $\approx 0.8$ for scenarios with prominent gradients in search space, using only $\sim20\%$ of the LLM API calls required by the Grid Search approach. Additionally, the proposed Hier-MAB approach outperforms other baselines in more challenging optimization scenarios. The code will be made available at https://aka.ms/autorag.

Via

Access Paper or Ask Questions

A microwave photonic prototype for concurrent radar detection and spectrum sensing over an 8 to 40 GHz bandwidth

Jun 20, 2024

Taixia Shi, Dingding Liang, Lu Wang, Lin Li, Shaogang Guo, Jiawei Gao, Xiaowei Li, Chulun Lin, Lei Shi, Baogang Ding(+4 more)

Figure 1 for A microwave photonic prototype for concurrent radar detection and spectrum sensing over an 8 to 40 GHz bandwidth

Figure 2 for A microwave photonic prototype for concurrent radar detection and spectrum sensing over an 8 to 40 GHz bandwidth

Figure 3 for A microwave photonic prototype for concurrent radar detection and spectrum sensing over an 8 to 40 GHz bandwidth

Figure 4 for A microwave photonic prototype for concurrent radar detection and spectrum sensing over an 8 to 40 GHz bandwidth

Abstract:In this work, a microwave photonic prototype for concurrent radar detection and spectrum sensing is proposed, designed, built, and investigated. A direct digital synthesizer and an analog electronic circuit are integrated to generate an intermediate frequency (IF) linearly frequency-modulated (LFM) signal with a tunable center frequency from 2.5 to 9.5 GHz and an instantaneous bandwidth of 1 GHz. The IF LFM signal is converted to the optical domain via an intensity modulator and then filtered by a fiber Bragg grating (FBG) to generate only two 2nd-order optical LFM sidebands. In radar detection, the two optical LFM sidebands beat with each other to generate a frequency-and-bandwidth-quadrupled LFM signal, which is used for ranging, radial velocity measurement, and imaging. By changing the center frequency of the IF LFM signal, the radar function can be operated within 8 to 40 GHz. In spectrum sensing, one 2nd-order optical LFM sideband is selected by another FBG, which then works in conjunction with the stimulated Brillouin scattering gain spectrum to map the frequency of the signal under test to time with an instantaneous measurement bandwidth of 2 GHz. By using a frequency shift module to adjust the pump frequency, the frequency measurement range can be adjusted from 0 to 40 GHz. The prototype is comprehensively studied and tested, which is capable of achieving a range resolution of 3.75 cm, a range error of less than $\pm$ 2 cm, a radial velocity error within $\pm$ 1 cm/s, delivering clear imaging of multiple small targets, and maintaining a frequency measurement error of less than $\pm$ 7 MHz and a frequency resolution of better than 20 MHz.

* 18 pages, 12 figures, 1 table

Via

Access Paper or Ask Questions

Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation

Jun 19, 2024

Kaikai An, Fangkai Yang, Liqun Li, Junting Lu, Sitao Cheng, Lu Wang, Pu Zhao, Lele Cao, Qingwei Lin, Saravan Rajmohan(+2 more)

Figure 1 for Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation

Figure 2 for Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation

Figure 3 for Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation

Figure 4 for Thread: A Logic-Based Data Organization Paradigm for How-To Question Answering with Retrieval Augmented Generation

Abstract:Current question answering systems leveraging retrieval augmented generation perform well in answering factoid questions but face challenges with non-factoid questions, particularly how-to queries requiring detailed step-by-step instructions and explanations. In this paper, we introduce Thread, a novel data organization paradigm that transforms documents into logic units based on their inter-connectivity. Extensive experiments across open-domain and industrial scenarios demonstrate that Thread outperforms existing data organization paradigms in RAG-based QA systems, significantly improving the handling of how-to questions.

* 21 pages, 4 figures

Via

Access Paper or Ask Questions

Enhancing Language Model Factuality via Activation-Based Confidence Calibration and Guided Decoding

Jun 19, 2024

Xin Liu, Farima Fatahi Bayat, Lu Wang

Abstract:Calibrating language models (LMs) aligns their generation confidence with the actual likelihood of answer correctness, which can inform users about LMs' reliability and mitigate hallucinated content. However, prior calibration methods, such as self-consistency-based and logit-based approaches, are either limited in inference-time efficiency or fall short of providing informative signals. Moreover, simply filtering out low-confidence responses reduces the LM's helpfulness when the answers are correct. Therefore, effectively using calibration techniques to enhance an LM's factuality remains an unsolved challenge. In this paper, we first propose an activation-based calibration method, ActCab, which trains a linear layer on top of the LM's last-layer activations that can better capture the representations of knowledge. Built on top of ActCab, we further propose CoDec, a confidence-guided decoding strategy to elicit truthful answers with high confidence from LMs. By evaluating on five popular QA benchmarks, ActCab achieves superior calibration performance than all competitive baselines, e.g., by reducing the average expected calibration error (ECE) score by up to 39%. Further experiments on CoDec show consistent improvements in several LMs' factuality on challenging QA datasets, such as TruthfulQA, highlighting the value of confidence signals in enhancing factuality.

Via

Access Paper or Ask Questions

Verifiable Generation with Subsentence-Level Fine-Grained Citations

Jun 10, 2024

Shuyang Cao, Lu Wang

Figure 1 for Verifiable Generation with Subsentence-Level Fine-Grained Citations

Figure 2 for Verifiable Generation with Subsentence-Level Fine-Grained Citations

Figure 3 for Verifiable Generation with Subsentence-Level Fine-Grained Citations

Figure 4 for Verifiable Generation with Subsentence-Level Fine-Grained Citations

Abstract:Verifiable generation requires large language models (LLMs) to cite source documents supporting their outputs, thereby improve output transparency and trustworthiness. Yet, previous work mainly targets the generation of sentence-level citations, lacking specificity about which parts of a sentence are backed by the cited sources. This work studies verifiable generation with subsentence-level fine-grained citations for more precise location of generated content supported by the cited sources. We first present a dataset, SCiFi, comprising 10K Wikipedia paragraphs with subsentence-level citations. Each paragraph is paired with a set of candidate source documents for citation and a query that triggers the generation of the paragraph content. On SCiFi, we evaluate the performance of state-of-the-art LLMs and strategies for processing long documents designed for these models. Our experiment results reveals key factors that could enhance the quality of citations, including the expansion of the source documents' context accessible to the models and the implementation of specialized model tuning.

* NAACL 2024 Findings

Via

Access Paper or Ask Questions