Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Hong Liu

School of Electrical and Computer Engineering, University of Oklahoma, Norman, OK, USA

Semi-supervised Sound Event Detection with Local and Global Consistency Regularization

Sep 15, 2023

Yiming Li, Xiangdong Wang, Hong Liu, Rui Tao, Long Yan, Kazushige Ouchi

Figure 1 for Semi-supervised Sound Event Detection with Local and Global Consistency Regularization

Figure 2 for Semi-supervised Sound Event Detection with Local and Global Consistency Regularization

Figure 3 for Semi-supervised Sound Event Detection with Local and Global Consistency Regularization

Figure 4 for Semi-supervised Sound Event Detection with Local and Global Consistency Regularization

Abstract:Learning meaningful frame-wise features on a partially labeled dataset is crucial to semi-supervised sound event detection. Prior works either maintain consistency on frame-level predictions or seek feature-level similarity among neighboring frames, which cannot exploit the potential of unlabeled data. In this work, we design a Local and Global Consistency (LGC) regularization scheme to enhance the model on both label- and feature-level. The audio CutMix is introduced to change the contextual information of clips. Then, the local consistency is adopted to encourage the model to leverage local features for frame-level predictions, and the global consistency is applied to force features to align with global prototypes through a specially designed contrastive loss. Experiments on the DESED dataset indicate the superiority of LGC, surpassing its respective competitors largely with the same settings as the baseline system. Besides, combining LGC with existing methods can obtain further improvements. The code will be released soon.

* submitted to ICASSP 2024

Via

Access Paper or Ask Questions

Audio-free Prompt Tuning for Language-Audio Models

Sep 15, 2023

Yiming Li, Xiangdong Wang, Hong Liu

Figure 1 for Audio-free Prompt Tuning for Language-Audio Models

Figure 2 for Audio-free Prompt Tuning for Language-Audio Models

Figure 3 for Audio-free Prompt Tuning for Language-Audio Models

Figure 4 for Audio-free Prompt Tuning for Language-Audio Models

Abstract:Contrastive Language-Audio Pretraining (CLAP) is pre-trained to associate audio features with human language, making it a natural zero-shot classifier to recognize unseen sound categories. To adapt CLAP to downstream tasks, prior works inevitably require labeled domain audios, which limits their scalability under data scarcity and deprives them of the capability to detect novel classes as the original CLAP. In this work, by leveraging the modality alignment in CLAP, we propose an efficient audio-free prompt tuning scheme aimed at optimizing a few prompt tokens from texts instead of audios, which regularizes the model space to avoid overfitting the seen classes as well. Based on this, a multi-grained prompt design is further explored to fuse global and local information. Experiments on several tasks demonstrate that our approach can boost the CLAP and outperform other training methods on model performance and training efficiency. While conducting zero-shot inference on unseen categories, it still shows better transferability than the vanilla CLAP. Moreover, our method is flexible enough even if only knowing the downstream class names. The code will be released soon.

* submitted to ICASSP 2024

Via

Access Paper or Ask Questions

An Efficient Trajectory Planner for Car-like Robots on Uneven Terrain

Sep 12, 2023

Long Xu, Kaixin Chai, Zhichao Han, Hong Liu, Chao Xu, Yanjun Cao, Fei Gao

Figure 1 for An Efficient Trajectory Planner for Car-like Robots on Uneven Terrain

Figure 2 for An Efficient Trajectory Planner for Car-like Robots on Uneven Terrain

Figure 3 for An Efficient Trajectory Planner for Car-like Robots on Uneven Terrain

Figure 4 for An Efficient Trajectory Planner for Car-like Robots on Uneven Terrain

Abstract:Autonomous navigation of ground robots on uneven terrain is being considered in more and more tasks. However, uneven terrain will bring two problems to motion planning: how to assess the traversability of the terrain and how to cope with the dynamics model of the robot associated with the terrain. The trajectories generated by existing methods are often too conservative or cannot be tracked well by the controller since the second problem is not well solved. In this paper, we propose terrain pose mapping to describe the impact of terrain on the robot. With this mapping, we can obtain the SE(3) state of the robot on uneven terrain for a given state in SE(2). Then, based on it, we present a trajectory optimization framework for car-like robots on uneven terrain that can consider both of the above problems. The trajectories generated by our method conform to the dynamics model of the system without being overly conservative and yet able to be tracked well by the controller. We perform simulations and real-world experiments to validate the efficiency and trajectory quality of our algorithm.

Via

Access Paper or Ask Questions

Semantic-aware Consistency Network for Cloth-changing Person Re-Identification

Aug 27, 2023

Peini Guo, Hong Liu, Jianbing Wu, Guoquan Wang, Tao Wang

Figure 1 for Semantic-aware Consistency Network for Cloth-changing Person Re-Identification

Figure 2 for Semantic-aware Consistency Network for Cloth-changing Person Re-Identification

Figure 3 for Semantic-aware Consistency Network for Cloth-changing Person Re-Identification

Figure 4 for Semantic-aware Consistency Network for Cloth-changing Person Re-Identification

Abstract:Cloth-changing Person Re-Identification (CC-ReID) is a challenging task that aims to retrieve the target person across multiple surveillance cameras when clothing changes might happen. Despite recent progress in CC-ReID, existing approaches are still hindered by the interference of clothing variations since they lack effective constraints to keep the model consistently focused on clothing-irrelevant regions. To address this issue, we present a Semantic-aware Consistency Network (SCNet) to learn identity-related semantic features by proposing effective consistency constraints. Specifically, we generate the black-clothing image by erasing pixels in the clothing area, which explicitly mitigates the interference from clothing variations. In addition, to fully exploit the fine-grained identity information, a head-enhanced attention module is introduced, which learns soft attention maps by utilizing the proposed part-based matching loss to highlight head information. We further design a semantic consistency loss to facilitate the learning of high-level identity-related semantic features, forcing the model to focus on semantically consistent cloth-irrelevant regions. By using the consistency constraint, our model does not require any extra auxiliary segmentation module to generate the black-clothing image or locate the head region during the inference stage. Extensive experiments on four cloth-changing person Re-ID datasets (LTCC, PRCC, Vc-Clothes, and DeepChange) demonstrate that our proposed SCNet makes significant improvements over prior state-of-the-art approaches. Our code is available at: https://github.com/Gpn-star/SCNet.

* Accepted by ACM MM 2023

Via

Access Paper or Ask Questions

Audio Generation with Multiple Conditional Diffusion Model

Aug 23, 2023

Zhifang Guo, Jianguo Mao, Rui Tao, Long Yan, Kazushige Ouchi, Hong Liu, Xiangdong Wang

Abstract:Text-based audio generation models have limitations as they cannot encompass all the information in audio, leading to restricted controllability when relying solely on text. To address this issue, we propose a novel model that enhances the controllability of existing pre-trained text-to-audio models by incorporating additional conditions including content (timestamp) and style (pitch contour and energy contour) as supplements to the text. This approach achieves fine-grained control over the temporal order, pitch, and energy of generated audio. To preserve the diversity of generation, we employ a trainable control condition encoder that is enhanced by a large language model and a trainable Fusion-Net to encode and fuse the additional conditions while keeping the weights of the pre-trained text-to-audio model frozen. Due to the lack of suitable datasets and evaluation metrics, we consolidate existing datasets into a new dataset comprising the audio and corresponding conditions and use a series of evaluation metrics to evaluate the controllability performance. Experimental results demonstrate that our model successfully achieves fine-grained control to accomplish controllable audio generation. Audio samples and our dataset are publicly available at https://conditionaudiogen.github.io/conditionaudiogen/

* Submitted to AAAI 2024

Via

Access Paper or Ask Questions

Furnishing Sound Event Detection with Language Model Abilities

Aug 22, 2023

Hualei Wang, Jianguo Mao, Zhifang Guo, Jiarui Wan, Hong Liu, Xiangdong Wang

Abstract:Recently, the ability of language models (LMs) has attracted increasing attention in visual cross-modality. In this paper, we further explore the generation capacity of LMs for sound event detection (SED), beyond the visual domain. Specifically, we propose an elegant method that aligns audio features and text features to accomplish sound event classification and temporal location. The framework consists of an acoustic encoder, a contrastive module that align the corresponding representations of the text and audio, and a decoupled language decoder that generates temporal and event sequences from the audio characteristic. Compared with conventional works that require complicated processing and barely utilize limited audio features, our model is more concise and comprehensive since language model directly leverage its semantic capabilities to generate the sequences. We investigate different decoupling modules to demonstrate the effectiveness for timestamps capture and event classification. Evaluation results show that the proposed method achieves accurate sequences of sound event detection.

* 8 pages,2 figures,published to AAAI

Via

Access Paper or Ask Questions

Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video

Aug 20, 2023

Yingxuan You, Hong Liu, Ti Wang, Wenhao Li, Runwei Ding, Xia Li

Abstract:Despite significant progress in single image-based 3D human mesh recovery, accurately and smoothly recovering 3D human motion from a video remains challenging. Existing video-based methods generally recover human mesh by estimating the complex pose and shape parameters from coupled image features, whose high complexity and low representation ability often result in inconsistent pose motion and limited shape patterns. To alleviate this issue, we introduce 3D pose as the intermediary and propose a Pose and Mesh Co-Evolution network (PMCE) that decouples this task into two parts: 1) video-based 3D human pose estimation and 2) mesh vertices regression from the estimated 3D pose and temporal image feature. Specifically, we propose a two-stream encoder that estimates mid-frame 3D pose and extracts a temporal image feature from the input image sequence. In addition, we design a co-evolution decoder that performs pose and mesh interactions with the image-guided Adaptive Layer Normalization (AdaLN) to make pose and mesh fit the human body shape. Extensive experiments demonstrate that the proposed PMCE outperforms previous state-of-the-art methods in terms of both per-frame accuracy and temporal consistency on three benchmark datasets: 3DPW, Human3.6M, and MPI-INF-3DHP. Our code is available at https://github.com/kasvii/PMCE.

* Accepted by ICCV 2023. Project page: https://kasvii.github.io/PMCE

Via

Access Paper or Ask Questions

GraphDAC: A Graph-Analytic Approach to Dynamic Airspace Configuration

Jul 29, 2023

Ke Feng, Dahai Liu, Yongxin Liu, Hong Liu, Houbing Song

Figure 1 for GraphDAC: A Graph-Analytic Approach to Dynamic Airspace Configuration

Figure 2 for GraphDAC: A Graph-Analytic Approach to Dynamic Airspace Configuration

Figure 3 for GraphDAC: A Graph-Analytic Approach to Dynamic Airspace Configuration

Figure 4 for GraphDAC: A Graph-Analytic Approach to Dynamic Airspace Configuration

Abstract:The current National Airspace System (NAS) is reaching capacity due to increased air traffic, and is based on outdated pre-tactical planning. This study proposes a more dynamic airspace configuration (DAC) approach that could increase throughput and accommodate fluctuating traffic, ideal for emergencies. The proposed approach constructs the airspace as a constraints-embedded graph, compresses its dimensions, and applies a spectral clustering-enabled adaptive algorithm to generate collaborative airport groups and evenly distribute workloads among them. Under various traffic conditions, our experiments demonstrate a 50\% reduction in workload imbalances. This research could ultimately form the basis for a recommendation system for optimized airspace configuration. Code available at https://github.com/KeFenge2022/GraphDAC.git

* Acceptted for publication by IEEE IRI'23

Via

Access Paper or Ask Questions

UniMatch: A Unified User-Item Matching Framework for the Multi-purpose Merchant Marketing

Jul 19, 2023

Qifang Zhao, Tianyu Li, Meng Du, Yu Jiang, Qinghui Sun, Zhongyao Wang, Hong Liu, Huan Xu

Figure 1 for UniMatch: A Unified User-Item Matching Framework for the Multi-purpose Merchant Marketing

Figure 2 for UniMatch: A Unified User-Item Matching Framework for the Multi-purpose Merchant Marketing

Figure 3 for UniMatch: A Unified User-Item Matching Framework for the Multi-purpose Merchant Marketing

Figure 4 for UniMatch: A Unified User-Item Matching Framework for the Multi-purpose Merchant Marketing

Abstract:When doing private domain marketing with cloud services, the merchants usually have to purchase different machine learning models for the multiple marketing purposes, leading to a very high cost. We present a unified user-item matching framework to simultaneously conduct item recommendation and user targeting with just one model. We empirically demonstrate that the above concurrent modeling is viable via modeling the user-item interaction matrix with the multinomial distribution, and propose a bidirectional bias-corrected NCE loss for the implementation. The proposed loss function guides the model to learn the user-item joint probability $p(u,i)$ instead of the conditional probability $p(i|u)$ or $p(u|i)$ through correcting both the users and items' biases caused by the in-batch negative sampling. In addition, our framework is model-agnostic enabling a flexible adaptation of different model architectures. Extensive experiments demonstrate that our framework results in significant performance gains in comparison with the state-of-the-art methods, with greatly reduced cost on computing resources and daily maintenance.

* ICDE2023

Via

Access Paper or Ask Questions

You've Got Two Teachers: Co-evolutionary Image and Report Distillation for Semi-supervised Anatomical Abnormality Detection in Chest X-ray

Jul 18, 2023

Jinghan Sun, Dong Wei, Zhe Xu, Donghuan Lu, Hong Liu, Liansheng Wang, Yefeng Zheng

Abstract:Chest X-ray (CXR) anatomical abnormality detection aims at localizing and characterising cardiopulmonary radiological findings in the radiographs, which can expedite clinical workflow and reduce observational oversights. Most existing methods attempted this task in either fully supervised settings which demanded costly mass per-abnormality annotations, or weakly supervised settings which still lagged badly behind fully supervised methods in performance. In this work, we propose a co-evolutionary image and report distillation (CEIRD) framework, which approaches semi-supervised abnormality detection in CXR by grounding the visual detection results with text-classified abnormalities from paired radiology reports, and vice versa. Concretely, based on the classical teacher-student pseudo label distillation (TSD) paradigm, we additionally introduce an auxiliary report classification model, whose prediction is used for report-guided pseudo detection label refinement (RPDLR) in the primary vision detection task. Inversely, we also use the prediction of the vision detection model for abnormality-guided pseudo classification label refinement (APCLR) in the auxiliary report classification task, and propose a co-evolution strategy where the vision and report models mutually promote each other with RPDLR and APCLR performed alternatively. To this end, we effectively incorporate the weak supervision by reports into the semi-supervised TSD pipeline. Besides the cross-modal pseudo label refinement, we further propose an intra-image-modal self-adaptive non-maximum suppression, where the pseudo detection labels generated by the teacher vision model are dynamically rectified by high-confidence predictions by the student. Experimental results on the public MIMIC-CXR benchmark demonstrate CEIRD's superior performance to several up-to-date weakly and semi-supervised methods.

Via

Access Paper or Ask Questions