Bin Feng

Causality-inspired Discriminative Feature Learning in Triple Domains for Gait Recognition

Jul 17, 2024

Haijun Xiong, Bin Feng, Xinggang Wang, Wenyu Liu

Abstract: Gait recognition is a biometric technology that distinguishes individuals by their walking patterns. However, previous methods face challenges when accurately extracting identity features because they often become entangled with non-identity clues. To address this challenge, we propose CLTD, a causality-inspired discriminative feature learning module designed to effectively eliminate the influence of confounders in triple domains, i.e., spatial, temporal, and spectral. Specifically, we utilize the Cross Pixel-wise Attention Generator (CPAG) to generate attention distributions for factual and counterfactual features in spatial and temporal domains. Then, we introduce the Fourier Projection Head (FPH) to project spatial features into the spectral space, which preserves essential information while reducing computational costs. Additionally, we employ an optimization method with contrastive learning to enforce semantic consistency constraints across sequences from the same subject. Our approach has demonstrated significant performance improvements on challenging datasets, proving its effectiveness. Moreover, it can be seamlessly integrated into existing gait recognition methods.

* Accepted by ECCV 2024

LiCAF: LiDAR-Camera Asymmetric Fusion for Gait Recognition

Jun 18, 2024

Yunze Deng, Haijun Xiong, Bin Feng

Abstract: Gait recognition is a biometric technology that identifies individuals by their walking patterns. Given the significant achievements of multimodal fusion in gait recognition, we consider employing LiDAR-camera fusion to obtain robust gait representations. However, existing methods often overlook the intrinsic characteristics of each modality and lack fine-grained fusion and temporal modeling. In this paper, we introduce LiCAF, a novel modality-sensitive network for LiDAR-camera fusion, which employs an asymmetric modeling strategy. Specifically, we propose Asymmetric Cross-modal Channel Attention (ACCA) and Interlaced Cross-modal Temporal Modeling (ICTM) for selecting valuable cross-modal channel information and for powerful temporal modeling. Our method achieves state-of-the-art performance (93.9% in Rank-1 and 98.8% in Rank-5) on the SUSTech1K dataset, demonstrating its effectiveness.

* Accepted by ICIP 2024
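
The Rank-1 and Rank-5 figures quoted in the LiCAF abstract are retrieval accuracies: a probe sequence counts as a hit at rank k if its identity appears among its k nearest gallery entries. The following is a minimal sketch of that metric, not the authors' evaluation code. The function name `rank_k_accuracy`, the Euclidean distance, and the toy 2-D embeddings are assumptions made for illustration only.

```python
import math
from typing import List, Sequence, Tuple

# Each sample is (identity label, embedding vector). Hypothetical toy format.
Sample = Tuple[str, Sequence[float]]


def rank_k_accuracy(probes: List[Sample], gallery: List[Sample], k: int) -> float:
    """Fraction of probes whose identity is among the k closest gallery entries."""
    hits = 0
    for probe_label, probe_vec in probes:
        # Sort the gallery by Euclidean distance to this probe (assumed metric).
        ranked = sorted(gallery, key=lambda item: math.dist(probe_vec, item[1]))
        top_labels = [label for label, _ in ranked[:k]]
        hits += probe_label in top_labels
    return hits / len(probes)


# Toy data: three identities with one gallery embedding each.
gallery = [("A", (0.0, 0.0)), ("B", (1.0, 0.0)), ("C", (0.0, 3.0))]
probes = [("A", (0.1, 0.0)), ("B", (0.4, 0.0)), ("C", (0.0, 2.0))]

rank1 = rank_k_accuracy(probes, gallery, k=1)  # probe B is closer to A, so it misses
rank2 = rank_k_accuracy(probes, gallery, k=2)  # B is found within the top 2
```

In this toy example Rank-1 is 2/3 and Rank-2 is 1.0, which shows why Rank-5 is never lower than Rank-1 on the same data.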

Skim then Focus: Integrating Contextual and Fine-grained Views for Repetitive Action Counting

Jun 13, 2024

Zhengqi Zhao, Xiaohu Huang, Hao Zhou, Kun Yao, Errui Ding, Jingdong Wang, Xinggang Wang, Wenyu Liu, Bin Feng

Abstract: The key to action counting is accurately locating each video's repetitive actions. Instead of estimating the probability of each frame belonging to an action directly, we propose a dual-branch network, i.e., SkimFocusNet, working in a two-step manner. The model draws inspiration from empirical observations indicating that humans typically engage in coarse skimming of entire sequences to grasp the general action pattern initially, followed by a finer, frame-by-frame focus to determine if it aligns with the target action. Specifically, SkimFocusNet incorporates a skim branch and a focus branch. The skim branch scans the global contextual information throughout the sequence to identify potential target action for guidance. Subsequently, the focus branch utilizes the guidance to diligently identify repetitive actions using a long-short adaptive guidance (LSAG) block. Additionally, we have observed that videos in existing datasets often feature only one type of repetitive action, which inadequately represents real-world scenarios. To more accurately describe real-life situations, we establish the Multi-RepCount dataset, which includes videos containing multiple repetitive motions. On Multi-RepCount, our SkimFocusNet can perform specified action counting, that is, count a particular action type by referencing an exemplar video. This capability demonstrates the robustness of our method. Extensive experiments demonstrate that SkimFocusNet achieves state-of-the-art performance with significant improvements. We also conduct a thorough ablation study to evaluate the network components. The source code will be published upon acceptance.

* 13 pages, 9 figures

SmoothQuant+: Accurate and Efficient 4-bit Post-Training Weight Quantization for LLM

Dec 06, 2023

Jiayi Pan, Chengcan Wang, Kaifu Zheng, Yangguang Li, Zhenyu Wang, Bin Feng

Abstract: Large language models (LLMs) have shown remarkable capabilities in various tasks. However, their huge model size and the consequent demand for computational and memory resources also pose challenges to model deployment. Currently, 4-bit post-training quantization (PTQ) has achieved some success in LLMs, reducing the memory footprint by approximately 75% compared to FP16 models, albeit with some accuracy loss. In this paper, we propose SmoothQuant+, an accurate and efficient 4-bit weight-only PTQ method that requires no additional training and achieves lossless accuracy for LLMs for the first time. Based on the fact that the loss of weight quantization is amplified by the activation outliers, SmoothQuant+ smooths the activation outliers by channel before quantization, while adjusting the corresponding weights for mathematical equivalence, and then performs group-wise 4-bit weight quantization for linear layers. We have integrated SmoothQuant+ into the vLLM framework, an advanced high-throughput inference engine specially developed for LLMs, and equipped it with efficient W4A16 CUDA kernels, so that vLLM can seamlessly support SmoothQuant+ 4-bit weight quantization. Our results show that, with SmoothQuant+, the Code Llama-34B model can be quantized and deployed on an A100 40GB GPU, achieving lossless accuracy and a throughput increase of 1.9 to 4.0 times compared to the FP16 model deployed on two A100 40GB GPUs. Moreover, the latency per token is only 68% of the FP16 model deployed on two A100 40GB GPUs. To our knowledge, this is the state-of-the-art 4-bit weight quantization for LLMs.
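
The two steps the abstract describes, per-channel smoothing with a compensating weight adjustment followed by group-wise 4-bit weight quantization, can be illustrated with a small pure-Python sketch. This is not the SmoothQuant+ implementation: the smoothing factor (peak activation magnitude raised to `alpha`), the asymmetric 0..15 code range, and all function names are assumptions chosen for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product: a is (n, k), b is (k, m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def smooth(x: Matrix, w: Matrix, alpha: float = 0.5) -> Tuple[Matrix, Matrix, List[float]]:
    """Divide activation channel j by s[j] and multiply weight row j by s[j]."""
    c_in = len(w)
    s = []
    for j in range(c_in):
        # Assumed smoothing factor: (largest |activation| in channel j) ** alpha.
        peak = max(abs(row[j]) for row in x)
        s.append(peak ** alpha if peak > 0 else 1.0)
    x_s = [[row[j] / s[j] for j in range(c_in)] for row in x]
    w_s = [[s[j] * v for v in w[j]] for j in range(c_in)]
    return x_s, w_s, s


def quantize_group(values: List[float]) -> Tuple[List[int], float, float]:
    """Asymmetric 4-bit quantization of one group: codes in 0..15, scale, offset."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 15 if hi > lo else 1.0
    codes = [min(15, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def quantize_weights(w: Matrix, group_size: int) -> Matrix:
    """Quantize each output column in groups of input channels, then dequantize."""
    c_in, c_out = len(w), len(w[0])
    w_dq = [[0.0] * c_out for _ in range(c_in)]
    for o in range(c_out):
        for start in range(0, c_in, group_size):
            rows = range(start, min(start + group_size, c_in))
            codes, scale, lo = quantize_group([w[j][o] for j in rows])
            for j, c in zip(rows, codes):
                w_dq[j][o] = c * scale + lo
    return w_dq


# Toy layer: activations x are (2, 4), weights w are (4, 2); channel 1 has outliers.
x = [[1.0, -40.0, 0.5, 2.0], [0.5, 35.0, -1.0, 1.5]]
w = [[0.2, -0.1], [0.05, 0.3], [-0.4, 0.1], [0.25, -0.2]]

y_ref = matmul(x, w)
x_s, w_s, s = smooth(x, w)
y_smooth = matmul(x_s, w_s)          # equal to y_ref up to rounding error
w_dq = quantize_weights(w_s, group_size=4)
y_quant = matmul(x_s, w_dq)          # output with 4-bit (dequantized) weights
```

Because `x_s @ w_s` equals `x @ w` up to floating-point rounding, smoothing alone does not change the layer output. The roughly 75% memory saving cited in the abstract follows from storing 4 bits instead of 16 per weight (1 - 4/16 = 0.75), before counting the per-group scale and offset.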

Condition-Adaptive Graph Convolution Learning for Skeleton-Based Gait Recognition

Aug 13, 2023

Xiaohu Huang, Xinggang Wang, Zhidianqiu Jin, Bo Yang, Botao He, Bin Feng, Wenyu Liu

Abstract: Graph convolutional networks have been widely applied in skeleton-based gait recognition. A key challenge in this task is to distinguish the individual walking styles of different subjects across various views. Existing state-of-the-art methods employ uniform convolutions to extract features from diverse sequences and ignore the effects of viewpoint changes. To overcome these limitations, we propose a condition-adaptive graph (CAG) convolution network that can dynamically adapt to the specific attributes of each skeleton sequence and the corresponding view angle. In contrast to using fixed weights for all joints and sequences, we introduce a joint-specific filter learning (JSFL) module in the CAG method, which produces sequence-adaptive filters at the joint level. The adaptive filters capture fine-grained patterns that are unique to each joint, enabling the extraction of diverse spatial-temporal information about body parts. Additionally, we design a view-adaptive topology learning (VATL) module that generates adaptive graph topologies. These graph topologies are used to correlate the joints adaptively according to the specific view conditions. Thus, CAG can simultaneously adjust to various walking styles and viewpoints. Experiments on the two most widely used datasets (i.e., CASIA-B and OU-MVLP) show that CAG surpasses all previous skeleton-based methods. Moreover, the recognition performance can be enhanced by simply combining CAG with appearance-based methods, demonstrating the ability of CAG to provide useful complementary information. The source code will be available at https://github.com/OliverHxh/CAG.

* Accepted by TIP journal

GaitGS: Temporal Feature Learning in Granularity and Span Dimension for Gait Recognition

Jun 01, 2023

Haijun Xiong, Yunze Deng, Xiaohu Huang, Xinggang Wang, Wenyu Liu, Bin Feng

Abstract: Gait recognition is an emerging biometric recognition technology that identifies and verifies individuals based on their walking patterns. However, many current methods are limited in their use of temporal information. In order to fully harness the potential of gait recognition, it is crucial to consider temporal features at various granularities and spans. Hence, in this paper, we propose a novel framework named GaitGS, which aggregates temporal features in the granularity dimension and span dimension simultaneously. Specifically, a Multi-Granularity Feature Extractor (MGFE) is proposed to capture the micro-motion and macro-motion information at the frame level and unit level, respectively. Moreover, we present a Multi-Span Feature Learning (MSFL) module to generate global and local temporal representations. Extensive experiments on three popular gait datasets demonstrate the state-of-the-art performance of our method. Our method achieves Rank-1 accuracies of 92.9% (+0.5%), 52.0% (+1.4%), and 97.5% (+0.8%) on CASIA-B, GREW, and OU-MVLP, respectively. The source code will be released soon.

* 14 pages, 6 figures

Robust Dancer: Long-term 3D Dance Synthesis Using Unpaired Data

Mar 29, 2023

Bin Feng, Tenglong Ao, Zequn Liu, Wei Ju, Libin Liu, Ming Zhang

Abstract: Automatically synthesizing natural-looking dance movements from a piece of music is an increasingly popular yet challenging task. Most existing data-driven approaches require hard-to-get paired training data and fail to generate long sequences of motion due to the error accumulation of their autoregressive structure. We present a novel 3D dance synthesis system that needs only unpaired data for training and can generate realistic long-term motions at the same time. For unpaired data training, we explore the disentanglement of beat and style, and propose a Transformer-based model that does not rely on paired data. For the synthesis of long-term motions, we devise a new long-history attention strategy. It first queries the long-history embedding through an attention computation and then explicitly fuses this embedding into the generation pipeline via a multimodal adaptation gate (MAG). Objective and subjective evaluations show that our results are comparable to strong baseline methods, despite not requiring paired training data, and are robust when inferring long-term music. To the best of our knowledge, we are the first to achieve unpaired data training, an ability that effectively alleviates data limitations. Our code is released at https://github.com/BFeng14/RobustDancer

* Preliminary video demo: https://youtu.be/gJbxG9QlcUU

Graph Contrastive Learning for Skeleton-based Action Recognition

Jan 26, 2023

Xiaohu Huang, Hao Zhou, Bin Feng, Xinggang Wang, Wenyu Liu, Jian Wang, Haocheng Feng, Junyu Han, Errui Ding, Jingdong Wang

Abstract: In the field of skeleton-based action recognition, current top-performing graph convolutional networks (GCNs) exploit intra-sequence context to construct adaptive graphs for feature aggregation. However, we argue that such context is still local since the rich cross-sequence relations have not been explicitly investigated. In this paper, we propose a graph contrastive learning framework for skeleton-based action recognition (SkeletonGCL) to explore the global context across all sequences. Specifically, SkeletonGCL associates graph learning across sequences by enforcing graphs to be class-discriminative, i.e., intra-class compact and inter-class dispersed, which improves the GCN capacity to distinguish various action patterns. In addition, two memory banks are designed to enrich cross-sequence context from two complementary levels, i.e., instance and semantic levels, enabling graph contrastive learning at multiple context scales. Consequently, SkeletonGCL establishes a new training paradigm, and it can be seamlessly incorporated into current GCNs. Without loss of generality, we combine SkeletonGCL with three GCNs (2S-AGCN, CTR-GCN, and InfoGCN), and achieve consistent improvements on NTU60, NTU120, and NW-UCLA benchmarks. The source code will be available at https://github.com/OliverHxh/SkeletonGCL.

* Accepted by ICLR 2023
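
The "intra-class compact and inter-class dispersed" objective in the SkeletonGCL abstract is the usual goal of a contrastive loss. The sketch below uses a generic InfoNCE-style loss on toy 2-D vectors standing in for flattened graph representations. It is not the loss from the paper: the function names, cosine similarity, temperature value, and the way positives and negatives are supplied (rather than drawn from memory banks) are all assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchor: Sequence[float],
             positive: Sequence[float],
             negatives: List[Sequence[float]],
             temperature: float = 0.1) -> float:
    """Generic InfoNCE: low when the anchor is near the positive and far from negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


anchor = (1.0, 0.0)

# Same-class sample is close, other-class samples are far: small loss.
loss_compact = info_nce(anchor, positive=(0.9, 0.1), negatives=[(0.0, 1.0), (-1.0, 0.0)])

# Same-class sample is far while an other-class sample is close: large loss.
loss_mixed = info_nce(anchor, positive=(0.0, 1.0), negatives=[(0.9, 0.1), (-1.0, 0.0)])
```

Minimizing such a loss pulls same-class representations together and pushes different-class ones apart, which is the behavior the abstract attributes to class-discriminative graphs.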

Context-Sensitive Temporal Feature Learning for Gait Recognition

Apr 08, 2022

Xiaohu Huang, Duowang Zhu, Xinggang Wang, Hao Wang, Bo Yang, Botao He, Wenyu Liu, Bin Feng

Abstract: Although gait recognition has drawn increasing research attention recently, it remains challenging to learn discriminative temporal representations, since the silhouette differences are quite subtle in the spatial domain. Inspired by the observation that humans can distinguish the gaits of different subjects by adaptively focusing on temporal clips with different time scales, we propose a context-sensitive temporal feature learning (CSTL) network for gait recognition. CSTL produces temporal features in three scales, and adaptively aggregates them according to the contextual information from local and global perspectives. Specifically, CSTL contains an adaptive temporal aggregation module that sequentially performs local relation modeling and global relation modeling to fuse the multi-scale features. In addition, to remedy the spatial feature corruption caused by temporal operations, CSTL incorporates a salient spatial feature learning (SSFL) module to select groups of discriminative spatial features. In particular, we utilize transformers to implement the global relation modeling and the SSFL module. To the best of our knowledge, this is the first work that adopts a transformer in gait recognition. Extensive experiments conducted on three datasets demonstrate state-of-the-art performance. Concretely, we achieve rank-1 accuracies of 98.7%, 96.2% and 88.7% under normal-walking, bag-carrying and coat-wearing conditions on CASIA-B, 97.5% on OU-MVLP and 50.6% on GREW.

* Submitted to TPAMI

Ptychographic sensor for large-scale lensless microbial monitoring with high spatiotemporal resolution

Dec 15, 2021

Shaowei Jiang, Chengfei Guo, Zichao Bian, Ruihai Wang, Jiakai Zhu, Pengming Song, Patrick Hu, Derek Hu, Zibang Zhang, Kazunori Hoshino (+2 more)

Abstract: Traditional microbial detection methods often rely on the overall properties of microbial cultures and cannot resolve individual growth events at high spatiotemporal resolution. As a result, they require bacteria to grow to confluence before the results can be interpreted. Here, we demonstrate the application of an integrated ptychographic sensor for lensless cytometric analysis of microbial cultures over a large scale and with high spatiotemporal resolution. The reported device can be placed within a regular incubator or used as a standalone incubating unit for long-term microbial monitoring. For longitudinal studies where massive data are acquired at sequential time points, we report a new temporal-similarity constraint to increase the temporal resolution of ptychographic reconstruction by 7-fold. With this strategy, the reported device achieves a centimeter-scale field of view, a half-pitch spatial resolution of 488 nm, and a temporal resolution of 15-second intervals. For the first time, we report the direct observation of bacterial growth in a 15-second interval by tracking the phase wraps of the recovered images, with high phase sensitivity like that in interferometric measurements. We also characterize cell growth via longitudinal dry mass measurement and perform rapid bacterial detection at low concentrations. For drug-screening applications, we demonstrate proof-of-concept antibiotic susceptibility testing and perform single-cell analysis of antibiotic-induced filamentation. The combination of high phase sensitivity, high spatiotemporal resolution, and large field of view is unique among existing microscopy techniques. As a quantitative and miniaturized platform, it can improve studies with microorganisms and other biospecimens in resource-limited settings.

* 18 pages, 6 figures
