Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Bingbing Ni

Mipmap-GS: Let Gaussians Deform with Scale-specific Mipmap for Anti-aliasing Rendering

Aug 12, 2024

Jiameng Li, Yue Shi, Jiezhang Cao, Bingbing Ni, Wenjun Zhang, Kai Zhang, Luc Van Gool

Abstract:3D Gaussian Splatting (3DGS) has attracted great attention in novel view synthesis because of its superior rendering efficiency and high fidelity. However, the trained Gaussians suffer from severe zooming degradation due to non-adjustable representation derived from single-scale training. Though some methods attempt to tackle this problem via post-processing techniques such as selective rendering or filtering techniques towards primitives, the scale-specific information is not involved in Gaussians. In this paper, we propose a unified optimization method to make Gaussians adaptive for arbitrary scales by self-adjusting the primitive properties (e.g., color, shape and size) and distribution (e.g., position). Inspired by the mipmap technique, we design pseudo ground-truth for the target scale and propose a scale-consistency guidance loss to inject scale information into 3D Gaussians. Our method is a plug-in module, applicable for any 3DGS models to solve the zoom-in and zoom-out aliasing. Extensive experiments demonstrate the effectiveness of our method. Notably, our method outperforms 3DGS in PSNR by an average of 9.25 dB for zoom-in and 10.40 dB for zoom-out on the NeRF Synthetic dataset.

* 9 pages

Via

Access Paper or Ask Questions

AU-vMAE: Knowledge-Guide Action Units Detection via Video Masked Autoencoder

Jul 16, 2024

Qiaoqiao Jin, Rui Shi, Yishun Dou, Bingbing Ni

Figure 1 for AU-vMAE: Knowledge-Guide Action Units Detection via Video Masked Autoencoder

Figure 2 for AU-vMAE: Knowledge-Guide Action Units Detection via Video Masked Autoencoder

Figure 3 for AU-vMAE: Knowledge-Guide Action Units Detection via Video Masked Autoencoder

Figure 4 for AU-vMAE: Knowledge-Guide Action Units Detection via Video Masked Autoencoder

Abstract:Current Facial Action Unit (FAU) detection methods generally encounter difficulties due to the scarcity of labeled video training data and the limited number of training face IDs, which renders the trained feature extractor insufficient coverage for modeling the large diversity of inter-person facial structures and movements. To explicitly address the above challenges, we propose a novel video-level pre-training scheme by fully exploring the multi-label property of FAUs in the video as well as the temporal label consistency. At the heart of our design is a pre-trained video feature extractor based on the video-masked autoencoder together with a fine-tuning network that jointly completes the multi-level video FAUs analysis tasks, \emph{i.e.} integrating both video-level and frame-level FAU detections, thus dramatically expanding the supervision set from sparse FAUs annotations to ALL video frames including masked ones. Moreover, we utilize inter-frame and intra-frame AU pair state matrices as prior knowledge to guide network training instead of traditional Graph Neural Networks, for better temporal supervision. Our approach demonstrates substantial enhancement in performance compared to the existing state-of-the-art methods used in BP4D and DISFA FAUs datasets.

Via

Access Paper or Ask Questions

Learning Unsupervised Gaze Representation via Eye Mask Driven Information Bottleneck

Jun 29, 2024

Yangzhou Jiang, Yinxin Lin, Yaoming Wang, Teng Li, Bilian Ke, Bingbing Ni

Figure 1 for Learning Unsupervised Gaze Representation via Eye Mask Driven Information Bottleneck

Figure 2 for Learning Unsupervised Gaze Representation via Eye Mask Driven Information Bottleneck

Figure 3 for Learning Unsupervised Gaze Representation via Eye Mask Driven Information Bottleneck

Figure 4 for Learning Unsupervised Gaze Representation via Eye Mask Driven Information Bottleneck

Abstract:Appearance-based supervised methods with full-face image input have made tremendous advances in recent gaze estimation tasks. However, intensive human annotation requirement inhibits current methods from achieving industrial level accuracy and robustness. Although current unsupervised pre-training frameworks have achieved success in many image recognition tasks, due to the deep coupling between facial and eye features, such frameworks are still deficient in extracting useful gaze features from full-face. To alleviate above limitations, this work proposes a novel unsupervised/self-supervised gaze pre-training framework, which forces the full-face branch to learn a low dimensional gaze embedding without gaze annotations, through collaborative feature contrast and squeeze modules. In the heart of this framework is an alternating eye-attended/unattended masking training scheme, which squeezes gaze-related information from full-face branch into an eye-masked auto-encoder through an injection bottleneck design that successfully encourages the model to pays more attention to gaze direction rather than facial textures only, while still adopting the eye self-reconstruction objective. In the same time, a novel eye/gaze-related information contrastive loss has been designed to further boost the learned representation by forcing the model to focus on eye-centered regions. Extensive experimental results on several gaze benchmarks demonstrate that the proposed scheme achieves superior performances over unsupervised state-of-the-art.

* 12 pages, 6 figures, 7 tables

Via

Access Paper or Ask Questions

AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario

May 28, 2024

Yuhan Li, Hao Zhou, Wenxiang Shang, Ran Lin, Xuanhong Chen, Bingbing Ni

Figure 1 for AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario

Figure 2 for AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario

Figure 3 for AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario

Figure 4 for AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario

Abstract:While image-based virtual try-on has made significant strides, emerging approaches still fall short of delivering high-fidelity and robust fitting images across various scenarios, as their models suffer from issues of ill-fitted garment styles and quality degrading during the training process, not to mention the lack of support for various combinations of attire. Therefore, we first propose a lightweight, scalable, operator known as Hydra Block for attire combinations. This is achieved through a parallel attention mechanism that facilitates the feature injection of multiple garments from conditionally encoded branches into the main network. Secondly, to significantly enhance the model's robustness and expressiveness in real-world scenarios, we evolve its potential across diverse settings by synthesizing the residuals of multiple models, as well as implementing a mask region boost strategy to overcome the instability caused by information leakage in existing models. Equipped with the above design, AnyFit surpasses all baselines on high-resolution benchmarks and real-world data by a large gap, excelling in producing well-fitting garments replete with photorealistic and rich details. Furthermore, AnyFit's impressive performance on high-fidelity virtual try-ons in any scenario from any image, paves a new path for future research within the fashion community.

* Project website: https://colorful-liyu.github.io/anyfit-page/

Via

Access Paper or Ask Questions

Toward Tiny and High-quality Facial Makeup with Data Amplify Learning

Apr 09, 2024

Qiaoqiao Jin, Xuanhong Chen, Meiguang Jin, Ying Chen, Rui Shi, Yucheng Zheng, Yupeng Zhu, Bingbing Ni

Figure 1 for Toward Tiny and High-quality Facial Makeup with Data Amplify Learning

Figure 2 for Toward Tiny and High-quality Facial Makeup with Data Amplify Learning

Figure 3 for Toward Tiny and High-quality Facial Makeup with Data Amplify Learning

Figure 4 for Toward Tiny and High-quality Facial Makeup with Data Amplify Learning

Abstract:Contemporary makeup approaches primarily hinge on unpaired learning paradigms, yet they grapple with the challenges of inaccurate supervision (e.g., face misalignment) and sophisticated facial prompts (including face parsing, and landmark detection). These challenges prohibit low-cost deployment of facial makeup models, especially on mobile devices. To solve above problems, we propose a brand-new learning paradigm, termed "Data Amplify Learning (DAL)," alongside a compact makeup model named "TinyBeauty." The core idea of DAL lies in employing a Diffusion-based Data Amplifier (DDA) to "amplify" limited images for the model training, thereby enabling accurate pixel-to-pixel supervision with merely a handful of annotations. Two pivotal innovations in DDA facilitate the above training approach: (1) A Residual Diffusion Model (RDM) is designed to generate high-fidelity detail and circumvent the detail vanishing problem in the vanilla diffusion models; (2) A Fine-Grained Makeup Module (FGMM) is proposed to achieve precise makeup control and combination while retaining face identity. Coupled with DAL, TinyBeauty necessitates merely 80K parameters to achieve a state-of-the-art performance without intricate face prompts. Meanwhile, TinyBeauty achieves a remarkable inference speed of up to 460 fps on the iPhone 13. Extensive experiments show that DAL can produce highly competitive makeup models using only 5 image pairs.

Via

Access Paper or Ask Questions

Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge

Feb 14, 2024

Jiancheng Yang, Rui Shi, Liang Jin, Xiaoyang Huang, Kaiming Kuang, Donglai Wei, Shixuan Gu, Jianying Liu, Pengfei Liu, Zhizhong Chai(+15 more)

Abstract:Rib fractures are a common and potentially severe injury that can be challenging and labor-intensive to detect in CT scans. While there have been efforts to address this field, the lack of large-scale annotated datasets and evaluation benchmarks has hindered the development and validation of deep learning algorithms. To address this issue, the RibFrac Challenge was introduced, providing a benchmark dataset of over 5,000 rib fractures from 660 CT scans, with voxel-level instance mask annotations and diagnosis labels for four clinical categories (buckle, nondisplaced, displaced, or segmental). The challenge includes two tracks: a detection (instance segmentation) track evaluated by an FROC-style metric and a classification track evaluated by an F1-style metric. During the MICCAI 2020 challenge period, 243 results were evaluated, and seven teams were invited to participate in the challenge summary. The analysis revealed that several top rib fracture detection solutions achieved performance comparable or even better than human experts. Nevertheless, the current rib fracture classification solutions are hardly clinically applicable, which can be an interesting area in the future. As an active benchmark and research resource, the data and online evaluation of the RibFrac Challenge are available at the challenge website. As an independent contribution, we have also extended our previous internal baseline by incorporating recent advancements in large-scale pretrained networks and point-based rib segmentation techniques. The resulting FracNet+ demonstrates competitive performance in rib fracture detection, which lays a foundation for further research and development in AI-assisted rib fracture detection and diagnosis.

* Challenge paper for MICCAI RibFrac Challenge (https://ribfrac.grand-challenge.org/)

Via

Access Paper or Ask Questions

Real-Time Neural BRDF with Spherically Distributed Primitives

Oct 12, 2023

Yishun Dou, Zhong Zheng, Qiaoqiao Jin, Bingbing Ni, Yugang Chen, Junxiang Ke

Abstract:We propose a novel compact and efficient neural BRDF offering highly versatile material representation, yet with very-light memory and neural computation consumption towards achieving real-time rendering. The results in Figure 1, rendered at full HD resolution on a current desktop machine, show that our system achieves real-time rendering with a wide variety of appearances, which is approached by the following two designs. On the one hand, noting that bidirectional reflectance is distributed in a very sparse high-dimensional subspace, we propose to project the BRDF into two low-dimensional components, i.e., two hemisphere feature-grids for incoming and outgoing directions, respectively. On the other hand, learnable neural reflectance primitives are distributed on our highly-tailored spherical surface grid, which offer informative features for each component and alleviate the conventional heavy feature learning network to a much smaller one, leading to very fast evaluation. These primitives are centrally stored in a codebook and can be shared across multiple grids and even across materials, based on the low-cost indices stored in material-specific spherical surface grids. Our neural BRDF, which is agnostic to the material, provides a unified framework that can represent a variety of materials in consistent manner. Comprehensive experimental results on measured BRDF compression, Monte Carlo simulated BRDF acceleration, and extension to spatially varying effect demonstrate the superior quality and generalizability achieved by the proposed scheme.

Via

Access Paper or Ask Questions

SieveNet: Selecting Point-Based Features for Mesh Networks

Aug 24, 2023

Shengchao Yuan, Yishun Dou, Rui Shi, Bingbing Ni, Zhong Zheng

Figure 1 for SieveNet: Selecting Point-Based Features for Mesh Networks

Figure 2 for SieveNet: Selecting Point-Based Features for Mesh Networks

Figure 3 for SieveNet: Selecting Point-Based Features for Mesh Networks

Figure 4 for SieveNet: Selecting Point-Based Features for Mesh Networks

Abstract:Meshes are widely used in 3D computer vision and graphics, but their irregular topology poses challenges in applying them to existing neural network architectures. Recent advances in mesh neural networks turn to remeshing and push the boundary of pioneer methods that solely take the raw meshes as input. Although the remeshing offers a regular topology that significantly facilitates the design of mesh network architectures, features extracted from such remeshed proxies may struggle to retain the underlying geometry faithfully, limiting the subsequent neural network's capacity. To address this issue, we propose SieveNet, a novel paradigm that takes into account both the regular topology and the exact geometry. Specifically, this method utilizes structured mesh topology from remeshing and accurate geometric information from distortion-aware point sampling on the surface of the original mesh. Furthermore, our method eliminates the need for hand-crafted feature engineering and can leverage off-the-shelf network architectures such as the vision transformer. Comprehensive experimental results on classification and segmentation tasks well demonstrate the effectiveness and superiority of our method.

* The project homepage is https://sievenet.github.io/

Via

Access Paper or Ask Questions

FocalDreamer: Text-driven 3D Editing via Focal-fusion Assembly

Aug 22, 2023

Yuhan Li, Yishun Dou, Yue Shi, Yu Lei, Xuanhong Chen, Yi Zhang, Peng Zhou, Bingbing Ni

Figure 1 for FocalDreamer: Text-driven 3D Editing via Focal-fusion Assembly

Figure 2 for FocalDreamer: Text-driven 3D Editing via Focal-fusion Assembly

Figure 3 for FocalDreamer: Text-driven 3D Editing via Focal-fusion Assembly

Figure 4 for FocalDreamer: Text-driven 3D Editing via Focal-fusion Assembly

Abstract:While text-3D editing has made significant strides in leveraging score distillation sampling, emerging approaches still fall short in delivering separable, precise and consistent outcomes that are vital to content creation. In response, we introduce FocalDreamer, a framework that merges base shape with editable parts according to text prompts for fine-grained editing within desired regions. Specifically, equipped with geometry union and dual-path rendering, FocalDreamer assembles independent 3D parts into a complete object, tailored for convenient instance reuse and part-wise control. We propose geometric focal loss and style consistency regularization, which encourage focal fusion and congruent overall appearance. Furthermore, FocalDreamer generates high-fidelity geometry and PBR textures which are compatible with widely-used graphics engines. Extensive experiments have highlighted the superior editing capabilities of FocalDreamer in both quantitative and qualitative evaluations.

* Project website: https://focaldreamer.github.io

Via

Access Paper or Ask Questions

Omni Aggregation Networks for Lightweight Image Super-Resolution

Apr 24, 2023

Hang Wang, Xuanhong Chen, Bingbing Ni, Yutian Liu, Jinfan Liu

Figure 1 for Omni Aggregation Networks for Lightweight Image Super-Resolution

Figure 2 for Omni Aggregation Networks for Lightweight Image Super-Resolution

Figure 3 for Omni Aggregation Networks for Lightweight Image Super-Resolution

Figure 4 for Omni Aggregation Networks for Lightweight Image Super-Resolution

Abstract:While lightweight ViT framework has made tremendous progress in image super-resolution, its uni-dimensional self-attention modeling, as well as homogeneous aggregation scheme, limit its effective receptive field (ERF) to include more comprehensive interactions from both spatial and channel dimensions. To tackle these drawbacks, this work proposes two enhanced components under a new Omni-SR architecture. First, an Omni Self-Attention (OSA) block is proposed based on dense interaction principle, which can simultaneously model pixel-interaction from both spatial and channel dimensions, mining the potential correlations across omni-axis (i.e., spatial and channel). Coupling with mainstream window partitioning strategies, OSA can achieve superior performance with compelling computational budgets. Second, a multi-scale interaction scheme is proposed to mitigate sub-optimal ERF (i.e., premature saturation) in shallow models, which facilitates local propagation and meso-/global-scale interactions, rendering an omni-scale aggregation building block. Extensive experiments demonstrate that Omni-SR achieves record-high performance on lightweight super-resolution benchmarks (e.g., 26.95 dB@Urban100 $\times 4$ with only 792K parameters). Our code is available at \url{https://github.com/Francis0625/Omni-SR}.

* Accepted by CVPR2023. Code is available at \url{https://github.com/Francis0625/Omni-SR}

Via

Access Paper or Ask Questions