Yuxi Li

Collaborative Weakly Supervised Video Correlation Learning for Procedure-Aware Instructional Video Analysis

Dec 18, 2023

Tianyao He, Huabin Liu, Yuxi Li, Xiao Ma, Cheng Zhong, Yang Zhang, Weiyao Lin

Abstract:Video Correlation Learning (VCL), which aims to analyze the relationships between videos, has been widely studied and applied in various general video tasks. However, applying VCL to instructional videos is still quite challenging due to their intrinsic procedural temporal structure. Specifically, procedural knowledge is critical for accurate correlation analyses on instructional videos. Nevertheless, current procedure-learning methods heavily rely on step-level annotations, which are costly and not scalable. To address this problem, we introduce a weakly supervised framework called Collaborative Procedure Alignment (CPA) for procedure-aware correlation learning on instructional videos. Our framework comprises two core modules: collaborative step mining and frame-to-step alignment. The collaborative step mining module enables simultaneous and consistent step segmentation for paired videos, leveraging the semantic and temporal similarity between frames. Based on the identified steps, the frame-to-step alignment module performs alignment between the frames and steps across videos. The alignment result serves as a measurement of the correlation distance between two videos. We instantiate our framework in two distinct instructional video tasks: sequence verification and action quality assessment. Extensive experiments validate the effectiveness of our approach in providing accurate and interpretable correlation analyses for instructional videos.

* Accepted by AAAI 2024
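
The CPA abstract uses an alignment result as a correlation distance between two videos. As a generic point of reference only, classic dynamic time warping (DTW) turns a frame-level alignment into such a distance. It is not the collaborative step mining or frame-to-step alignment described above. The per-frame feature vectors and the Euclidean frame cost are assumptions made for this sketch.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one per-frame feature vector (assumed representation)


def dtw_distance(video_a: List[Frame], video_b: List[Frame]) -> float:
    """Classic dynamic time warping cost between two frame-feature sequences."""
    n, m = len(video_a), len(video_b)
    # cost[i][j] = cheapest alignment of the first i frames of A with the first j frames of B
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            frame_cost = math.dist(video_a[i - 1], video_b[j - 1])  # Euclidean (assumed)
            cost[i][j] = frame_cost + min(
                cost[i - 1][j],      # A advances while B repeats a frame
                cost[i][j - 1],      # B advances while A repeats a frame
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]
```

Because DTW lets either sequence repeat frames, two videos that go through the same states at different speeds can still have a distance of zero, which is the kind of temporal misalignment a procedure-aware distance has to tolerate.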

Density Matters: Improved Core-set for Active Domain Adaptive Segmentation

Dec 15, 2023

Shizhan Liu, Zhengkai Jiang, Yuxi Li, Jinlong Peng, Yabiao Wang, Weiyao Lin

Abstract:Active domain adaptation has emerged as a solution to balance the expensive annotation cost and the performance of trained models in semantic segmentation. However, existing works usually ignore the correlation between selected samples and their local context in feature space, which leads to inefficient use of the annotation budget. In this work, we revisit the theoretical bound of the classical Core-set method and identify that performance is closely related to the local sample distribution around the selected samples. To estimate the density of local samples efficiently, we introduce a local proxy estimator with Dynamic Masked Convolution and develop a Density-aware Greedy algorithm to optimize the bound. Extensive experiments demonstrate the superiority of our approach. Moreover, with very few labels, our scheme achieves performance comparable to its fully supervised counterpart.
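
The abstract builds on the classical Core-set method, whose usual selection rule is k-center greedy: repeatedly pick the unlabeled sample farthest from its nearest already-chosen sample. The sketch below shows only that classical baseline under simple assumptions (features as plain coordinate tuples, Euclidean distance); the paper's Density-aware Greedy algorithm and Dynamic Masked Convolution estimator are not reproduced.

```python
import math
from typing import List, Sequence

Point = Sequence[float]  # a feature vector (assumed plain tuple/list of floats)


def k_center_greedy(features: List[Point], labeled_idx: List[int], budget: int) -> List[int]:
    """Classical Core-set (k-center greedy) selection of up to `budget` new samples."""
    n = len(features)
    if n == 0:
        return []
    # distance from every sample to its nearest chosen center
    nearest = [math.inf] * n

    def add_center(c: int) -> None:
        for i in range(n):
            d = math.dist(features[i], features[c])
            if d < nearest[i]:
                nearest[i] = d

    for c in labeled_idx:
        add_center(c)

    selected: List[int] = []
    for _ in range(budget):
        # farthest sample from the current center set (ties go to the lowest index)
        idx = max(range(n), key=lambda i: nearest[i])
        if nearest[idx] == 0.0:
            break  # every sample already coincides with a center
        selected.append(idx)
        add_center(idx)
    return selected
```

The classical rule looks only at the largest nearest-center distance, so an isolated outlier can be picked before a dense region; that blind spot to local density is what the abstract argues the bound is sensitive to.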

Projective Parallel Single-Pixel Imaging: 3D Structured Light Scanning Under Global Illumination

Dec 13, 2023

Yuxi Li, Hongzhi Jiang, Huijie Zhao, Xudong Li

Abstract:We present projective parallel single-pixel imaging (pPSI), a 3D photography method that provides a robust and efficient way to analyze light transport behavior and enables the separation of light effects due to global illumination, thereby achieving 3D structured light scanning under global illumination. The light transport behavior is described by the light transport coefficients (LTC), a 4D data set that contains complete information for a projector-camera pair. However, capturing the LTC is generally time-consuming. In pPSI, the 4D LTC are reduced to projection functions, enabling a highly efficient data capture process. We introduce the local maximum constraint, which constrains the locations of candidate correspondence matching points when projections are captured. The local slice extension (LSE) method is introduced to accelerate the capture of projection functions. pPSI is optimized for several situations: the number of projection functions it requires is optimized, and the influence of the capture ratio in LSE on the accuracy of the correspondence matching points is investigated. Discussions and experiments cover two typical kinds of global illumination: inter-reflections and subsurface scattering. The proposed method is validated on several challenging scenarios and outperforms state-of-the-art methods.

* 21 pages, 13 figures

Align, Perturb and Decouple: Toward Better Leverage of Difference Information for RSI Change Detection

May 30, 2023

Supeng Wang, Yuxi Li, Ming Xie, Mingmin Chi, Yabiao Wang, Chengjie Wang, Wenbing Zhu

Abstract:Change detection is a widely adopted technique in remote sensing imagery (RSI) analysis for discovering long-term geomorphic evolution. To highlight areas of semantic change, previous efforts mostly focus on learning representative feature descriptors of a single image, while the difference information is either modeled with simple difference operations or implicitly embedded via feature interactions. Nevertheless, such difference modeling can be noisy, since it suffers from non-semantic changes and lacks explicit guidance from image content or context. In this paper, we revisit the importance of feature difference for change detection in RSI and propose a series of operations to fully exploit the difference information: Alignment, Perturbation and Decoupling (APD). First, alignment leverages contextual similarity to compensate for non-semantic differences in feature space. Next, a difference module trained with semantic-wise perturbation is adopted to learn more generalized change estimators, which in turn bootstraps feature extraction and prediction. Finally, a decoupled dual-decoder structure is designed to predict semantic changes in both content-aware and content-agnostic manners. Extensive experiments on the LEVIR-CD, WHU-CD and DSIFN-CD benchmarks demonstrate that the proposed operations bring significant improvement and achieve competitive results under similar comparative conditions. Code is available at https://github.com/wangsp1999/CD-Research/tree/main/openAPD

* To appear in IJCAI 2023

Hear to Segment: Unmixing the Audio to Guide the Semantic Segmentation

May 12, 2023

Yuhang Ling, Yuxi Li, Zhenye Gan, Jiangning Zhang, Mingmin Chi, Yabiao Wang

Abstract:In this paper, we focus on a recently proposed task called Audio-Visual Segmentation (AVS), which requires establishing fine-grained correspondence between the audio stream and image pixels. However, learning such correspondence faces two key challenges: (1) audio signals inherently exhibit a high degree of information density, as sounds produced by multiple objects are entangled within the same audio stream; (2) audio signals from objects of the same category tend to have similar frequencies, which hampers distinguishing the target object and consequently leads to ambiguous segmentation results. Toward this end, we propose an Audio Unmixing and Semantic Segmentation Network (AUSS), which encourages unmixing complicated audio signals and distinguishing similar sounds. Technically, AUSS unmixes the audio signals into a set of audio queries and lets them interact with visual features through masked attention mechanisms. To encourage these audio queries to capture distinctive features embedded within the audio, two self-supervised losses are also introduced as additional supervision at both the class and mask levels. Extensive experimental results on the AVSBench benchmark show that AUSS sets a new state of the art on both the single-source and multi-source subsets, demonstrating its effectiveness in bridging the gap between audio and vision modalities.

* 11 pages, 7 figures
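
The AUSS abstract says audio queries interact with visual features through masked attention. As a generic illustration only, the sketch below shows scaled dot-product cross-attention in which each query (for example, one audio query) attends only to the visual positions its mask allows. The list-of-lists shapes, the boolean mask convention, and the fall-back to full attention when a mask is empty are assumptions, not details taken from AUSS.

```python
import math
from typing import List

Matrix = List[List[float]]


def masked_attention(queries: Matrix, keys: Matrix, values: Matrix,
                     mask: List[List[bool]]) -> Matrix:
    """Scaled dot-product cross-attention; mask[q][n] is True if query q may attend to position n."""
    if not queries:
        return []
    d = len(queries[0])
    out: Matrix = []
    for q, allowed in zip(queries, mask):
        if not any(allowed):
            allowed = [True] * len(keys)  # empty mask: fall back to full attention (assumed)
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) if ok else -math.inf
            for k, ok in zip(keys, allowed)
        ]
        top = max(scores)  # finite, since at least one position is allowed
        weights = [math.exp(s - top) for s in scores]  # exp(-inf) == 0.0 for masked positions
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) / total
                    for j in range(len(values[0]))])
    return out
```

Masking works by sending disallowed logits to negative infinity before the softmax, so those positions receive exactly zero weight rather than merely a small one.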

Few-shot Action Recognition via Intra- and Inter-Video Information Maximization

May 10, 2023

Huabin Liu, Weiyao Lin, Tieyuan Chen, Yuxi Li, Shuyuan Li, John See

Abstract:Current few-shot action recognition involves two primary sources of information for classification: (1) intra-video information, determined by frame content within a single video clip, and (2) inter-video information, measured by relationships (e.g., feature similarity) among videos. However, existing methods inadequately exploit these two information sources. In terms of intra-video information, current sampling operations for input videos may omit critical action information, reducing the utilization efficiency of video data. For inter-video information, action misalignment among videos makes it challenging to calculate precise relationships. Moreover, how to jointly consider both inter- and intra-video information remains under-explored for few-shot action recognition. To this end, we propose a novel framework, Video Information Maximization (VIM), for few-shot video action recognition. VIM is equipped with an adaptive spatial-temporal video sampler and a spatiotemporal action alignment model to maximize intra- and inter-video information, respectively. The video sampler adaptively selects important frames and amplifies critical spatial regions for each input video based on the task at hand. This preserves and emphasizes informative parts of video clips while eliminating interference at the data level. The alignment model performs temporal and spatial action alignment sequentially at the feature level, leading to more precise measurements of inter-video similarity. Finally, these goals are facilitated by incorporating additional loss terms based on mutual information measurement. Consequently, VIM acts to maximize the distinctiveness of video information from limited video data. Extensive experimental results on public datasets for few-shot action recognition demonstrate the effectiveness and benefits of our framework.

* arXiv admin note: text overlap with arXiv:2207.09759

Learning from Noisy Labels with Decoupled Meta Label Purifier

Feb 17, 2023

Yuanpeng Tu, Boshen Zhang, Yuxi Li, Liang Liu, Jian Li, Yabiao Wang, Chengjie Wang, Cai Rong Zhao

Abstract:Training deep neural networks (DNNs) with noisy labels is challenging, since DNNs can easily memorize inaccurate labels, leading to poor generalization ability. Recently, meta-learning-based label correction strategies have been widely adopted to tackle this problem by identifying and correcting potentially noisy labels with the help of a small set of clean validation data. Although training with purified labels can effectively improve performance, solving the meta-learning problem inevitably involves a nested loop of bi-level optimization between model weights and hyper-parameters (i.e., label distribution). As a compromise, previous methods resort to a coupled learning process with alternating updates. In this paper, we empirically find that such simultaneous optimization over both model weights and label distribution cannot achieve an optimal routine, consequently limiting the representation ability of the backbone and the accuracy of corrected labels. From this observation, a novel multi-stage label purifier named DMLP is proposed. DMLP decouples the label correction process into label-free representation learning and a simple meta label purifier. In this way, DMLP can focus on extracting discriminative features and correcting labels in two distinct stages. DMLP is a plug-and-play label purifier: the purified labels can be directly reused in naive end-to-end network retraining or other robust learning methods, achieving state-of-the-art results on several synthetic and real-world noisy datasets, especially under high noise levels.

Learning with Noisy labels via Self-supervised Adversarial Noisy Masking

Feb 15, 2023

Yuanpeng Tu, Boshen Zhang, Yuxi Li, Liang Liu, Jian Li, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Cai Rong Zhao

Abstract:Collecting large-scale datasets is crucial for training deep models; annotating the data, however, inevitably yields noisy labels, which pose challenges to deep learning algorithms. Previous efforts tend to mitigate this problem by identifying and removing noisy samples or correcting their labels according to statistical properties (e.g., loss values) of the training samples. In this paper, we tackle this problem from a new perspective: delving into the deep feature maps, we empirically find that models trained with clean and mislabeled samples manifest distinguishable activation feature distributions. From this observation, a novel robust training approach termed adversarial noisy masking is proposed. The idea is to regularize deep features with a label-quality-guided masking scheme, which adaptively modulates the input data and label simultaneously, preventing the model from overfitting noisy samples. Further, an auxiliary task is designed to reconstruct the input data; it naturally provides noise-free self-supervised signals that reinforce the generalization ability of deep models. The proposed method is simple and flexible; it is tested on both synthetic and real-world noisy datasets, where it achieves significant improvements over previous state-of-the-art methods.

Self-supervised Likelihood Estimation with Energy Guidance for Anomaly Segmentation in Urban Scenes

Feb 15, 2023

Yuanpeng Tu, Yuxi Li, Boshen Zhang, Liang Liu, Jiangning Zhang, Yabiao Wang, Chengjie Wang, Cai Rong Zhao

Abstract:Robust autonomous driving requires agents to accurately identify unexpected areas in urban scenes. To this end, some critical issues remain open: how to design a suitable metric to measure anomalies, and how to properly generate training samples of anomaly data? Previous efforts usually resort to uncertainty estimation and sample synthesis from classification tasks, which ignore context information and sometimes require auxiliary datasets with fine-grained annotations. In contrast, in this paper, we exploit the strongly context-dependent nature of the segmentation task and design an energy-guided self-supervised framework for anomaly segmentation, which optimizes an anomaly head by maximizing the likelihood of self-generated anomaly pixels. Specifically, we design two estimators for anomaly likelihood estimation: one is a simple task-agnostic binary estimator, and the other models anomaly likelihood as the residual of a task-oriented energy model. Based on the proposed estimators, we further incorporate a likelihood-guided mask refinement process into our framework to extract informative anomaly pixels for model training. We conduct extensive experiments on the challenging Fishyscapes and Road Anomaly benchmarks, demonstrating that, without any auxiliary data or synthetic models, our method still achieves performance competitive with other SOTA schemes.
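
The abstract does not spell out the task-oriented energy model. For orientation only, a widely used energy score for out-of-distribution detection is the negative, temperature-scaled log-sum-exp of the class logits, E(x) = -T · log Σ_c exp(f_c(x) / T); the sketch computes it per pixel from segmentation logits. Treating this as the paper's energy model is an assumption, and the residual formulation mentioned above is not reproduced.

```python
import math
from typing import List, Sequence


def free_energy(logits: Sequence[float], temperature: float = 1.0) -> float:
    """E(x) = -T * logsumexp(f(x) / T); higher energy suggests a more anomalous pixel."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtract the max for numerical stability
    lse = top + math.log(sum(math.exp(z - top) for z in scaled))
    return -temperature * lse


def energy_map(logit_map: List[List[Sequence[float]]],
               temperature: float = 1.0) -> List[List[float]]:
    """Apply free_energy to every pixel of an H x W grid of per-class logits."""
    return [[free_energy(pixel, temperature) for pixel in row] for row in logit_map]
```

A pixel with one dominant in-distribution class logit gets low energy, while a pixel whose logits are all small and flat gets higher energy, which is why the score can rank candidate anomaly pixels without an extra classifier.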

Rethinking the Metric in Few-shot Learning: From an Adaptive Multi-Distance Perspective

Nov 02, 2022

Jinxiang Lai, Siqian Yang, Guannan Jiang, Xi Wang, Yuxi Li, Zihui Jia, Xiaochen Chen, Jun Liu, Bin-Bin Gao, Wei Zhang (+2 more)

Abstract:The few-shot learning problem focuses on recognizing unseen classes given a few labeled images. Recent efforts pay more attention to fine-grained feature embedding, ignoring the relationship among different distance metrics. In this paper, for the first time, we investigate the contributions of different distance metrics and propose an adaptive fusion scheme, bringing significant improvements in few-shot classification. We start from a naive baseline of confidence summation and demonstrate the necessity of exploiting the complementary property of different distance metrics. Having identified a competition problem among the metrics, we build on this baseline and propose an Adaptive Metrics Module (AMM) that decouples metrics fusion into metric-prediction fusion and metric-losses fusion. The former encourages mutual complementarity, while the latter alleviates metric competition via multi-task collaborative learning. Based on AMM, we design a few-shot classification framework, AMTNet, which includes the AMM and the Global Adaptive Loss (GAL) to jointly optimize the few-shot task and an auxiliary self-supervised task, making the embedding features more robust. In experiments, the proposed AMM achieves 2% higher performance than the naive metrics fusion module, and AMTNet outperforms state-of-the-art methods on multiple benchmark datasets.

* Proceedings of the 30th ACM International Conference on Multimedia 2022
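
The abstract starts from a naive baseline of confidence summation over several distance metrics. One plausible reading of that baseline, sketched under assumptions: class prototypes are given, each metric's similarities to the prototypes are turned into softmax confidences, and the confidences are summed before taking the argmax. The choice of negative Euclidean distance and cosine similarity is illustrative, and AMM's metric-prediction and metric-losses fusion are not reproduced.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Metric = Callable[[Vector, Vector], float]  # larger value = more similar


def neg_euclidean(a: Vector, b: Vector) -> float:
    return -math.dist(a, b)


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0  # undefined for zero vectors; treat as neutral
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def summed_confidence(query: Vector, prototypes: List[Vector],
                      metrics: List[Metric]) -> List[float]:
    """Naive fusion: add up each metric's softmax confidence for every class."""
    total = [0.0] * len(prototypes)
    for metric in metrics:
        probs = softmax([metric(query, p) for p in prototypes])
        total = [t + p for t, p in zip(total, probs)]
    return total


def predict(query: Vector, prototypes: List[Vector], metrics: List[Metric]) -> int:
    scores = summed_confidence(query, prototypes, metrics)
    return max(range(len(scores)), key=scores.__getitem__)
```

Every metric contributes with equal weight here, so a metric that is confidently wrong can override the others; that fixed weighting is the limitation an adaptive fusion scheme is meant to address.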
